1a. Quick Start (submit from a DoC Lab PC)
Open a Terminal window (Ubuntu/macOS; on Windows 10, use the built-in ssh client in PowerShell) and execute the following commands:
# Log in to a 'head' node/submission host
ssh gpucluster2.doc.ic.ac.uk
# or ssh gpucluster3.doc.ic.ac.uk
# Use sbatch to submit a pre-existing script to a remote GPU node
sbatch /vol/bitbucket/shared/slurmseg.sh
The output will be stored, by default, in whichever directory the sbatch command was launched from (your ~/ home directory if you submit straight after logging in), with the filename slurm-{jobid}.out, where {jobid} is the numeric job ID assigned by Slurm.
If you have a bash script ready, replace /vol/bitbucket/shared/slurmseg.sh with the full path to your own script (a minimal example script is sketched below, after the sample output).
In summary, log in to a head node to submit scripts to execute remotely on a GPU server/node
Example output of a slurm-{jobid}.out file:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06   Driver Version: 470.129.06   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A30          On   | 00000000:02:00.0 Off |                    0 |
| N/A   32C    P0    30W / 165W |      0MiB / 24258MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
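If you are writing your own script, a minimal sketch is shown below, assuming a single-GPU job. The #SBATCH lines use standard Slurm options; the commented-out virtualenv and Python paths are hypothetical placeholders for your own setup and are not taken from the shared example script.
#!/bin/bash
#SBATCH --gres=gpu:1            # request one GPU (standard Slurm generic-resource syntax)
#SBATCH --job-name=mytest       # optional: name shown in the job queue
#SBATCH --output=slurm-%j.out   # optional: %j expands to the job ID (matches the default naming)
nvidia-smi                      # print details of the GPU allocated to this job
# Hypothetical placeholders -- adapt to your own environment and code:
# source /vol/bitbucket/${USER}/myvenv/bin/activate
# python3 /vol/bitbucket/${USER}/myproject/train.py
Save the script somewhere visible to the cluster nodes (for example under /vol/bitbucket, where the shared example script also lives) and submit it from a head node with sbatch /path/to/your_script.sh.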
# Hint: Lab PCs gpu[10-18] have the Slurm client software installed:
ssh gpu10 # (or gpu11, etc., up to gpu18)
nvidia-smi
# this returns the local PC's NVIDIA GPU details
srun nvidia-smi
# submits a job to the remote cluster and reports the NVIDIA GPU details of the next available cluster node
sbatch /vol/bitbucket/shared/slurmseg.sh
# the usual test script to get started
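Once a job has been submitted (with either srun or sbatch), the standard Slurm client commands can be used from the same host to inspect or cancel it; the lines below are generic Slurm usage rather than anything cluster-specific:
squeue -u $USER
# lists your pending and running jobs, with their numeric job IDs
scontrol show job <jobid>
# shows the detailed state of one job; replace <jobid> with the ID reported by sbatch or squeue
scancel <jobid>
# cancels a job you no longer need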