1a. Quick Start (submit from a DoC Lab PC)

Open a Terminal window (Ubuntu/macOS; on Windows 10, use the built-in ssh in PowerShell) and execute the following commands:

# Log in to a 'head' node/submission host
ssh gpucluster2.doc.ic.ac.uk
# or ssh gpucluster3.doc.ic.ac.uk
# Use sbatch to submit a pre-existing script to a remote GPU node
sbatch /vol/bitbucket/shared/slurmseg.sh

The output will be stored, by default, in whichever directory the sbatch command is launched from (your ~/ home directory if you run it straight after logging in), with a filename of the form slurm-{jobid}.out (for example, slurm-20xyz.out).
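
If you would rather choose the output location yourself, sbatch accepts an --output filename pattern, where %j expands to the job ID. A minimal sketch (the /vol/bitbucket path below is only an illustration; use a directory of your own that already exists):

sbatch --output=/vol/bitbucket/${USER}/slurm-%j.out /vol/bitbucket/shared/slurmseg.sh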

If you have a bash script ready, replace /vol/bitbucket/shared/slurmseg.sh with the full path to your own script (a minimal example script is sketched below).
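
For reference, a submission script might look something like the following minimal sketch; the job name, GPU request, and environment-activation path are illustrative assumptions, not cluster-specific requirements:

#!/bin/bash
#SBATCH --gres=gpu:1                 # request one GPU (illustrative)
#SBATCH --job-name=my-first-job      # illustrative job name
#SBATCH --output=slurm-%j.out        # %j expands to the job ID

# Optional: activate your own Python environment (illustrative path)
# source /vol/bitbucket/${USER}/myvenv/bin/activate

# Print which node the job landed on and what GPU it has
hostname
nvidia-smi

Submit it with sbatch and the full path to the script, exactly as in the quick-start command above.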

In summary: log in to a head node and use sbatch to submit scripts, which then execute remotely on a GPU server/node.

Example output of slurm-20xyz.out:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06   Driver Version: 470.129.06   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A30          On   | 00000000:02:00.0 Off |                    0 |
| N/A   32C    P0    30W / 165W |      0MiB / 24258MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
# Hint: Lab PCs gpu[10-18] have the Slurm client software installed:

ssh gpu10                              # or gpu11, etc., up to gpu18

# Show the local PC's own NVIDIA GPU details
nvidia-smi

# Submit a job to the cluster and print nvidia-smi output from the next available GPU node
srun nvidia-smi

# The usual test script to get started
sbatch /vol/bitbucket/shared/slurmseg.sh
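
Once a job has been submitted, you can check whether it is still queued or running, cancel it, and read its output file when it finishes. squeue, scancel, and cat are standard tools; the job ID 20xyz below is just a placeholder:

squeue -u $USER        # list your queued and running jobs
scancel 20xyz          # cancel a job you no longer need
cat slurm-20xyz.out    # read the job's output once it has completed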
