6. Connect to a head node to send jobs (sbatch)
gpucluster2.doc.ic.ac.uk and gpucluster3.doc.ic.ac.uk are the head nodes for the GPU cluster, and you submit your compute jobs from one of them. The GPU hosts each contain a high-end graphics card – for example, an Nvidia GeForce GTX Titan Xp or an Nvidia Tesla. You cannot access the GPU hosts directly; instead, you submit your Slurm jobs through a head node and Slurm schedules each job to run on an available GPU.
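Once logged in to a head node (see the connection step below), you can get an overview of the cluster with Slurm's standard sinfo command. The exact partition and node names it prints depend on the local configuration, so treat the output as informational:
sinfo        # list partitions, their state and the nodes behind them
sinfo -N -l  # one line per node, in a longer format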
Please note: the head nodes are not to be used for computation. Please do not SSH in and then run resource-intensive processes on gpucluster2 or gpucluster3 themselves. These servers have only one role:
Allow end-users to submit Slurm jobs to GPU-equipped servers using sbatch.
Note in particular that the head nodes do not have an Nvidia CUDA-capable card in them. This is deliberate. Do not be surprised if you SSH to a head node, set up a virtual environment and, when you run a test there, see an error message similar to the following:
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
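If you want to convince yourself of this, one quick check (assuming the nvidia-smi utility is installed on the GPU hosts) is to run the same GPU query on the head node and then as a submitted job, and compare the results:
nvidia-smi                   # on a head node this fails or reports no devices
sbatch --wrap="nvidia-smi"   # the same command run as a Slurm job on a GPU host;
                             # its output lands in the resulting slurm-<jobid>.out file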
Here is an example of the steps involved in submitting your script as a Slurm job:
Connect to a Slurm submission host (see step 1b for connecting from your own laptop):
ssh gpucluster2.doc.ic.ac.uk
# or ssh gpucluster3.doc.ic.ac.uk
Change to an appropriate directory on the host (this directory may already exist after Step 2):
mkdir -p /vol/bitbucket/${USER}
cd /vol/bitbucket/${USER}
Now try submitting an example job. A simple shell script has been created for this purpose. You can view the file with 'less', 'more', 'nano' or a viewer you prefer. You can use the 'sbatch' command to submit that shell script to run it as a Slurm job on a GPU host:
sbatch /vol/bitbucket/shared/slurmseg.sh
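If the submission is accepted, sbatch prints the new job's ID on the terminal; the number below is only a placeholder:
Submitted batch job 123456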
If you have composed your own script, for example in your bitbucket folder, enter:
cd /vol/bitbucket/${USER}
sbatch /path_to_script/my_script.sh
Replace '/path_to_script/my_script.sh' with the path and name of your actual script.
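For reference, a Slurm batch script is just a shell script whose '#SBATCH' comment lines pass options to the scheduler. The sketch below is illustrative only: the job name and the '--gres=gpu:1' resource request are assumptions rather than site requirements, so check the shared slurmseg.sh script and the FAQ for the options this cluster actually expects.
#!/bin/bash
#SBATCH --job-name=my_job        # label shown in squeue (illustrative)
#SBATCH --gres=gpu:1             # request one GPU (assumed syntax; confirm locally)
#SBATCH --output=slurm-%j.out    # %j is replaced by the Slurm job number
echo "Running on $(hostname)"    # record which GPU host the job landed on
nvidia-smi                       # list the GPU allocated to the job, if the tool is installed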
You can invoke the squeue command to see information on running jobs:
squeue
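On a busy cluster the plain listing can be long. squeue can be restricted to your own jobs, and scancel removes a job you no longer want (replace 123456 with a real job ID taken from the squeue output):
squeue -u ${USER}   # show only your own jobs
scancel 123456      # cancel a job by its job ID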
The output of sbatch will be written to the directory where the command was invoked, eg /vol/bitbucket/${USER}. The file name includes the Slurm job number – for example:
less slurm-XYZ.out
where XYZ is a unique Slurm job number. Visit the FAQ below to find out how to customise the job output name.
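While a job is still running you can follow its output file from the head node as new lines are written (again, XYZ stands for the real job number shown by squeue):
tail -f slurm-XYZ.out   # press Ctrl-C to stop following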