Anatomy of a Job script
A job script commonly consists of two parts:
- Scheduler-specific options that manage resources and configure the job environment
- Job-specific shell commands (configuring software environment, specifying your binary/executable)
Here is a simple example of what a job script looks like:
#!/bin/bash
#SBATCH --job-name="Hello World"
#SBATCH --partition=peregrine-cpu
#SBATCH --qos=cpu_debug
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2G
#SBATCH --time=00:01:00
#SBATCH --output=my-job.out
#SBATCH --error=my-job.err
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --mail-user=<First.Last>@colostate.edu
module load python/anaconda
srun python3 hello_world.py
A short description of each command is given in the table below, and more details on each option follow after the table.
| Command | Description |
|---|---|
| #!/bin/bash | Specifies the Unix shell to be used |
| #SBATCH --job-name="Hello World" | A name for your job |
| #SBATCH --partition=peregrine-cpu | Partition to which the job should be submitted |
| #SBATCH --qos=cpu_debug | QoS type |
| #SBATCH --nodes=1 | Node count |
| #SBATCH --ntasks=1 | Total number of tasks across all nodes |
| #SBATCH --cpus-per-task=1 | CPU-cores per task (>1 for multi-threaded tasks) |
| #SBATCH --mem-per-cpu=2G | Memory per CPU-core |
| #SBATCH --time=00:01:00 | Total run time limit (HH:MM:SS) |
| #SBATCH --output=my-job.out | Output log file |
| #SBATCH --error=my-job.err | Error log file |
| #SBATCH --mail-type=begin | Send email when the job begins |
| #SBATCH --mail-type=end | Send email when the job ends |
| #SBATCH --mail-user=<First.Last>@colostate.edu | Email address to be notified |
| module load python/anaconda | Loads the latest Anaconda module |
| srun python3 hello_world.py | Runs the executable via Slurm |
In the scheduler section (the lines that begin with #SBATCH), one specifies a series of SBATCH directives that set the resource requirements and other parameters of the job. The above example is a short-running CPU job: it is submitted to the peregrine-cpu partition and requests the cpu_debug QoS.
Warning
Specifying a QoS is mandatory.
The script above requests 1 CPU-core (--cpus-per-task=1), 2 GB of memory (--mem-per-cpu=2G), and a wall time of 1 minute (--time=00:01:00).
Next we specify where the output and error messages are written, using the --output= and --error= options. In this example, the output goes to a file named my-job.out and the error messages to my-job.err. If you do not specify these options, SLURM writes both stdout and stderr to a file named slurm-XXXX.out, where XXXX is the job id.
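If you run the same script repeatedly, both files are overwritten on each run. One way to keep logs from different runs separate (a sketch using Slurm's standard filename patterns; pick your own file names) is to include %j, which expands to the job id:

```shell
#SBATCH --output=my-job-%j.out   # %j expands to the job id, e.g. my-job-1234.out
#SBATCH --error=my-job-%j.err    # stderr goes to a matching per-job file
```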
You can specify an email address (--mail-user=) and the events for which you would like to be notified. In the example above, we have asked to be notified at the beginning and at the end of the job.
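The --mail-type option can also be given once with a comma-separated list of events; a sketch (the address is a placeholder):

```shell
#SBATCH --mail-type=BEGIN,END,FAIL             # notify on start, normal end, and failure
#SBATCH --mail-user=<First.Last>@colostate.edu
```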
In the second section, we have the job-specific commands. Any environment modules needed by your software should be loaded at this stage. Just like on other CS machines, we use environment modules to access software available under /usr/local. More information can be found on the Environment Modules page. Here we load the Anaconda Python module to make use of Python. Note that other modules also provide Python.
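Assuming the standard Environment Modules commands are available on the cluster, a typical session for discovering and loading software looks like this (the module names shown are illustrative):

```shell
module avail python         # list available modules matching "python"
module load python/anaconda # load the Anaconda Python module
module list                 # show the modules currently loaded
module purge                # unload all modules (useful at the top of a job script)
```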
And finally, the actual work to be done, which in this example is running a Python script, is specified in the final line. The executable (here the Python interpreter) is usually launched via the srun Slurm command.
Samples
The sub-sections below provide example scripts for serial, multithreaded, MPI, hybrid, GPU, and interactive jobs.
Serial Jobs
Serial jobs use only a single CPU-core.
First, let us write a simple Python program.
# This program prints Hello, world!
print('Hello, World!')
Save it as hello.py.
Now we will write a SLURM script to run our serial Python code as a job:
#!/bin/bash
#SBATCH --job-name="Hello World" # a name for your job
#SBATCH --partition=peregrine-cpu # partition to which job should be submitted
#SBATCH --qos=cpu_debug # qos type
#SBATCH --nodes=1 # node count
#SBATCH --ntasks=1 # total number of tasks across all nodes
#SBATCH --cpus-per-task=1 # cpu-cores per task
#SBATCH --mem-per-cpu=2G # memory per cpu-core
#SBATCH --time=00:01:00 # total run time limit (HH:MM:SS)
module purge
module load python/anaconda
srun python3 hello.py
Save it as helloworld-python.sh and submit using the command
sbatch helloworld-python.sh
The result will be saved in a file named slurm-####.out and should look like
Hello, World!
Multithreaded Jobs
Much modern software, such as MATLAB and NumPy, comes with libraries that can use multiple CPU-cores via shared-memory parallel-programming techniques such as OpenMP or pthreads.
For such applications, one can use the cpus-per-task parameter to tell Slurm to run the job using multiple CPU-cores.
Note that the product of ntasks and cpus-per-task should not be greater than the number of CPU-cores allowed on a partition/QoS.
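As a quick sanity check before submitting, you can compute the total core count a script will request. A minimal sketch with hypothetical numbers:

```shell
# Hypothetical request: --ntasks=2 and --cpus-per-task=4
ntasks=2
cpus_per_task=4
total_cores=$(( ntasks * cpus_per_task ))
echo "total CPU-cores requested: $total_cores"  # this total must fit the partition/QoS limit
```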
Warning
Using larger values of cpus-per-task will not magically speed up your job. Requesting more cores than your application can use wastes resources and might even cause your job to be assigned a lower priority. So, make sure your application uses multithreading libraries or your code has been explicitly written to use multiple threads.
We provide examples for multithreaded:
- MATLAB
- OpenMP
- Python
MATLAB
MATLAB jobs work well as serial (single-threaded) jobs. But if your application/code uses MATLAB’s Parallel Computing Toolbox (e.g., parfor) or MATLAB’s BLAS libraries, then you can script your jobs to run over multiple CPUs.
Warning
At present multi-node MATLAB jobs are not possible. So, your Slurm script should always use #SBATCH --nodes=1.
Here we take an example from the MathWorks website that uses multiple cores in a for loop.
for_loop.m
poolobj = parpool;
fprintf('Number of workers: %g\n', poolobj.NumWorkers);
tic
n = 200;
A = 500;
a = zeros(n);
parfor i = 1:n
a(i) = max(abs(eig(rand(A))));
end
toc
We then use the following SLURM script to run the above MATLAB code via the scheduler.
#!/bin/bash
#
#SBATCH --job-name="Matlab" # a name for your job
#SBATCH --partition=peregrine-cpu # partition to which job should be submitted
#SBATCH --qos=cpu_debug # qos type
#SBATCH --nodes=1 # node count
#SBATCH --ntasks=1 # total number of tasks across all nodes
#SBATCH --cpus-per-task=4 # cpu-cores per task
#SBATCH --mem-per-cpu=4G # memory per cpu-core
module purge
module load matlab
matlab -nodisplay -nosplash -r for_loop
Save the script as matlab.sh and submit it as
sbatch matlab.sh
The output will go to a file slurm-######.out, named after the job id.
It should look like:
< M A T L A B (R) >
Copyright 1984-2022 The MathWorks, Inc.
R2022a (9.12.0.1884302) 64-bit (glnxa64)
February 16, 2022
To get started, type doc.
For product information, visit www.mathworks.com.
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 4).
Number of workers: 4
Elapsed time is 9.388137 seconds.
Notice the time taken to finish the task in the last line. You can change the value of cpus-per-task and see how this time changes!
OpenMP
In this example, we will use a simple OpenMP C++ program and run it via SLURM. Here is the C++ code we will be using:
#include <iostream>
#include <omp.h>
int main(int argc, char* argv[]) {
using namespace std;
#pragma omp parallel
{
int id = omp_get_thread_num();
int numthrds = omp_get_num_threads();
cout << "Hello from thread " << id << " of " << numthrds << endl;
}
return 0;
}
Save the code as omp.cpp.
Now compile it into a binary named omp using g++:
g++ -fopenmp -o omp omp.cpp
We will now run the binary omp using the following SLURM script
#!/bin/bash
#
#SBATCH --job-name="Hello World OMP" # a name for your job
#SBATCH --partition=peregrine-cpu # partition to which job should be submitted
#SBATCH --qos=cpu_debug # qos type
#SBATCH --nodes=1 # node count
#SBATCH --ntasks=1 # total number of tasks across all nodes
#SBATCH --cpus-per-task=8 # cpu-cores per task
#SBATCH --mem-per-cpu=2G # memory per cpu-core
#SBATCH --time=00:01:00 # total run time limit (HH:MM:SS)
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
module purge
./omp
Note how we set cpus-per-task to 8, so we are using 8 CPU cores.
Save the script as omp.sh and submit the job by running
sbatch omp.sh
The output should go to a file named slurm-####.out and should look like
Hello from thread Hello from thread Hello from thread 6 of 8Hello from thread Hello from thread
0 of 8
4 of 8
53 of 8
Hello from thread 1 of 8
Hello from thread 2 of 8
Hello from thread 7 of 8
of 8
Note that the lines may appear jumbled, since all eight threads write to stdout concurrently without synchronization.
Python
In this example we’ll use the numpy library in Python to demonstrate a multi-threaded Python job.
Save the following code as numpy-demo.py
import os
num_threads = int(os.environ['SLURM_CPUS_PER_TASK'])
import mkl
mkl.set_num_threads(num_threads)
N = 2000
num_runs = 5
import numpy as np
np.random.seed(42)
from time import perf_counter
x = np.random.randn(N, N).astype(np.float64)
times = []
for _ in range(num_runs):
t0 = perf_counter()
u, s, vh = np.linalg.svd(x)
elapsed_time = perf_counter() - t0
times.append(elapsed_time)
print("execution time: ", min(times))
print("threads: ", num_threads)
Now save the following SLURM script as numpy-demo.sh
#!/bin/bash
#
#SBATCH --job-name="NumPY Demo" # a name for your job
#SBATCH --partition=peregrine-cpu # partition to which job should be submitted
#SBATCH --qos=cpu_debug # qos type
#SBATCH --nodes=1 # node count
#SBATCH --ntasks=1 # total number of tasks across all nodes
#SBATCH --cpus-per-task=8 # cpu-cores per task
#SBATCH --mem-per-cpu=1G # memory per cpu-core
#SBATCH --time=00:01:00 # total run time limit (HH:MM:SS)
#
module purge
module load python/anaconda
srun python numpy-demo.py
Then submit the job as
sbatch numpy-demo.sh
The output should be in a file named slurm-####.out and should look like:
execution time: 1.5503690890036523
threads: 8
MPI Jobs
MPI (Message Passing Interface) enables distributed-memory parallelism: an MPI-enabled code can use multiple CPU-cores across multiple nodes. Here is the C++ code we will be using:
#include <iostream>
#include <mpi.h>
int main(int argc, char** argv) {
using namespace std;
MPI_Init(&argc, &argv);
int world_size, world_rank;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
// Get the name of the processor
char processor_name[MPI_MAX_PROCESSOR_NAME];
int name_len;
MPI_Get_processor_name(processor_name, &name_len);
// Print off a hello world message
cout << "Process " << world_rank << " of " << world_size
<< " says hello from " << processor_name << endl;
// uncomment next line to make CPU-cores work (infinitely)
// while (true) {};
MPI_Finalize();
return 0;
}
Save the code as mpi.cpp.
Now compile it into a binary named mpi using the MPI compiler:
module load compilers/mpi/openmpi-slurm
mpicxx -o mpi mpi.cpp
We will now run the binary mpi using the following SLURM script
#!/bin/bash
#
#SBATCH --job-name="MPI Demo" # name of your job
#SBATCH --partition=peregrine-cpu # partition to which job should be submitted
#SBATCH --qos=cpu_short # qos type
#SBATCH --nodes=2 # node count
#SBATCH --ntasks=16 # total number of tasks across all nodes
#SBATCH --cpus-per-task=1 # cpu-cores per task
#SBATCH --mem-per-cpu=1G # memory per cpu-core
#SBATCH --time=00:01:00 # total run time limit (HH:MM:SS)
#
module purge
module load compilers/mpi/openmpi-slurm
srun ./mpi
Save the script as mpi.sh and submit the job as
sbatch mpi.sh
The result will be saved in a file named slurm-####.out and should look like
Process 15 of 16 says hello from peregrine1
Process 1 of 16 says hello from peregrine0
Process 2 of 16 says hello from peregrine0
Process 3 of 16 says hello from peregrine0
Process 4 of 16 says hello from peregrine0
Process 5 of 16 says hello from peregrine0
Process 6 of 16 says hello from peregrine0
Process 7 of 16 says hello from peregrine0
Process 8 of 16 says hello from peregrine0
Process 9 of 16 says hello from peregrine0
Process 10 of 16 says hello from peregrine0
Process 11 of 16 says hello from peregrine0
Process 12 of 16 says hello from peregrine0
Process 13 of 16 says hello from peregrine0
Process 0 of 16 says hello from peregrine0
Process 14 of 16 says hello from peregrine1
Hybrid Jobs
One can combine multithreading and multi-node parallelism using a hybrid OpenMP/MPI approach. Let us use the following C++ code, which uses both MPI and OpenMP:
#include <iostream>
#include <mpi.h>
#include <omp.h>
int main(int argc, char** argv) {
using namespace std;
MPI_Init(&argc, &argv);
int world_size, world_rank;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
// Get the name of the processor
char processor_name[MPI_MAX_PROCESSOR_NAME];
int name_len;
MPI_Get_processor_name(processor_name, &name_len);
#pragma omp parallel
{
int id = omp_get_thread_num();
int nthrds = omp_get_num_threads();
cout << "Hello from thread " << id << " of " << nthrds
<< " on MPI process " << world_rank << " of " << world_size
<< " on node " << processor_name << endl;
}
MPI_Finalize();
return 0;
}
Save it as hybrid.cpp and compile it via the command
module load compilers/mpi/openmpi-slurm
mpicxx -fopenmp -o hybrid hybrid.cpp
Below is a SLURM job script for our code:
#!/bin/bash
#
#SBATCH --job-name="Hybrid Demo" # a name for your job
#SBATCH --partition=peregrine-cpu # partition to which job should be submitted
#SBATCH --qos=cpu_debug # qos type
#SBATCH --nodes=2 # node count
#SBATCH --ntasks-per-node=2 # total number of tasks per node
#SBATCH --cpus-per-task=4 # cpu-cores per task
#SBATCH --mem-per-cpu=1G # memory per cpu-core
#SBATCH --time=00:01:00 # total run time limit (HH:MM:SS)
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
module purge
module load compilers/mpi/openmpi-slurm
srun ./hybrid
Notice how we ask for two nodes, 2 tasks per node, and 4 cpus-per-task; our code will now run across two nodes. Save the script as hybrid.sh and submit it as
sbatch hybrid.sh
The result will be saved in a file named slurm-####.out and should look like
Hello from thread Hello from thread 0 of 4 on MPI process 1 of 4 on node peregrine03 of 4 on MPI process 1 of 4 on node peregrine0
Hello from thread 2 of 4 on MPI process 1 of 4 on node peregrine0
Hello from thread Hello from thread 3 of 4 on MPI process 3 of 4 on node peregrine1Hello from thread 1 of 4 on MPI process 3 of 4 on node 2peregrine1 of 4 on MPI process 3 of 4 on node peregrine1
Hello from thread 1 of 4 on MPI process 0 of 4 on node peregrine0
Hello from thread 1 of 4 on MPI process 1 of 4 on node peregrine0
Hello from thread 2 of 4 on MPI process 0 of Hello from thread 4 on node peregrine0
Hello from thread 0 of 4 on MPI process 0 of 4 on node peregrine0
3 of 4 on MPI process 0 of 4 on node peregrine0
Hello from thread Hello from thread 3 of 4 on MPI process 2 of 4 on node peregrine11 of 4 on MPI process 2 of 4 on node peregrine1
Hello from thread 2 of 4 on MPI process 2 of 4 on node peregrine1
Hello from thread 0 of 4 on MPI process 3 of 4 on node peregrine1
Hello from thread 0 of 4 on MPI process 2 of 4 on node peregrine1
GPU Jobs
GPUs are available on the peregrine and kestrel nodes through the peregrine-gpu and kestrel-gpu partitions. There are three types of GPUs on these nodes:
peregrine-gpu
- Nvidia A100 80GB – 6 available
- Nvidia A100 40GB – 4 available
kestrel-gpu
- Nvidia GeForce RTX 3090 24GB – 12 available
How to use GPUs
To use GPUs in your SLURM job:
- Add an additional SBATCH statement, #SBATCH --gres=gpu:<type>:<number_of_gpus>, to your job script.
  - For A100 80GB, use #SBATCH --gres=gpu:a100-sxm4-80gb:1
  - For A100 40GB, use #SBATCH --gres=gpu:nvidia_a100_3g.40gb:1
  - For RTX 3090 24GB, use #SBATCH --gres=gpu:3090:1
- Submit to the peregrine-gpu partition for the A100s, or the kestrel-gpu partition for the 3090s.
- Note that the number at the end of the SBATCH statement is the quantity of GPUs. In the statements above, we have requested 1 GPU.
Warning
Adding the --gres option to a Slurm script for a CPU-only code WILL NOT magically speed up your code.
Only software/code that has been explicitly written to run on GPUs can benefit from GPUs.
Requesting a GPU for a CPU-only code wastes resources and may also lower the priority of your future jobs.
Info
The GPU type must be specified in the SLURM script.
It is not possible to mix and match GPU types in a single job.
Warning
Do not ask for multiple GPUs if your code is only written to use a single GPU.
Doing so wastes resources and may also lower the priority of your future jobs.
CPU-GPU ratio
On the peregrine nodes, the ratio of CPUs to GPUs is 6:1. So, your job can request 6 CPU cores for 1 GPU.
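Following this ratio, a job that requests one GPU can pair it with up to 6 CPU-cores. A sketch of the relevant directives (using the A100 80GB type from the list above):

```shell
#SBATCH --partition=peregrine-gpu
#SBATCH --gres=gpu:a100-sxm4-80gb:1   # 1 GPU
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=6             # up to 6 CPU-cores per GPU on peregrine
```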
Monitor GPU Usage
After you submit your GPU job via the sbatch command, you can monitor the memory usage of the GPU(s) in your job. Use the following command:
sgpu <your-jobid-here>
The above command runs within your job’s resource allocation. Although the resources required for this task are small and should not impact your job's performance, it is recommended to use it on an “as needed” basis, rather than in a script that runs it in a loop.
CUDA
Here is the CUDA code we will be using. It defines two vectors and adds them.
#include "stdio.h"
#include <sys/time.h>
#include <cuda.h>
#define N 1000
__global__ void add(int *a, int *b, int *c)
{
int tID = blockIdx.x;
if (tID < N)
{
c[tID] = a[tID] + b[tID];
}
}
int main()
{
int a[N], b[N], c[N];
int *dev_a, *dev_b, *dev_c;
cudaMalloc((void **) &dev_a, N*sizeof(int));
cudaMalloc((void **) &dev_b, N*sizeof(int));
cudaMalloc((void **) &dev_c, N*sizeof(int));
// Fill Arrays
for (int i = 0; i < N; i++)
{
a[i] = i;
b[i] = 1;
}
cudaMemcpy(dev_a, a, N*sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(dev_b, b, N*sizeof(int), cudaMemcpyHostToDevice);
add<<<N,1>>>(dev_a, dev_b, dev_c);
cudaMemcpy(c, dev_c, N*sizeof(int), cudaMemcpyDeviceToHost);
for (int i = 0; i < N; i++)
{
printf("%d + %d = %d\n", a[i], b[i], c[i]);
}
return 0;
}
Save the code as vector-add.cu.
We will now compile and run the code using the following SLURM script cuda.sh:
#!/bin/bash
#SBATCH --job-name=cuda-add # job name
#SBATCH --partition=peregrine-gpu # partition to which job should be submitted
#SBATCH --qos=gpu_debug # qos type
#SBATCH --nodes=1 # node count
#SBATCH --ntasks=1 # total number of tasks across all nodes
#SBATCH --cpus-per-task=1 # cpu-cores per task
#SBATCH --mem=4G # total memory per node
#SBATCH --gres=gpu:nvidia_a100_3g.40gb:1 # Request 1 GPU (A100 40GB)
#SBATCH --time=00:05:00 # wall time
module load cuda
nvcc vector-add.cu -o vector-add
srun ./vector-add
Submit the job as
sbatch cuda.sh
The result will be saved in a file named slurm-####.out and should look like
0 + 1 = 1
1 + 1 = 2
2 + 1 = 3
3 + 1 = 4
4 + 1 = 5
5 + 1 = 6
6 + 1 = 7
---------
---------
996 + 1 = 997
997 + 1 = 998
998 + 1 = 999
999 + 1 = 1000
MATLAB
MATLAB has some built-in routines that can take advantage of a GPU. The sample code below performs a matrix decomposition using MATLAB GPU functions.
gpu = gpuDevice();
fprintf('Using a %s GPU.\n', gpu.Name);
disp(gpuDevice);
X = gpuArray([1 0 2; -1 5 0; 0 3 -9]);
whos X;
[U,S,V] = svd(X)
fprintf('trace(S): %f\n', trace(S))
quit;
Save the code above as svd_demo.m. (Avoid naming the file svd.m: a script named svd would shadow MATLAB's built-in svd function.)
We will now use the following SLURM script matlab-gpu.sh to run the code:
#!/bin/bash
#SBATCH --job-name="Matlab-GPU-Demo" # job name
#SBATCH --partition=peregrine-gpu # partition to which job should be submitted
#SBATCH --qos=gpu_debug # qos type
#SBATCH --nodes=1 # node count
#SBATCH --ntasks=1 # total number of tasks across all nodes
#SBATCH --cpus-per-task=1 # cpu-cores per task
#SBATCH --mem=4G # total memory per node
#SBATCH --gres=gpu:nvidia_a100_3g.40gb:1 # Request 1 GPU (A100 40GB)
#SBATCH --time=00:05:00 # wall time
module purge
module load matlab
matlab -singleCompThread -nodisplay -nosplash -r svd_demo
Submit the job as
sbatch matlab-gpu.sh
The result will be saved in a file named slurm-####.out and should look like
< M A T L A B (R) >
Copyright 1984-2022 The MathWorks, Inc.
R2022a (9.12.0.1884302) 64-bit (glnxa64)
February 16, 2022
To get started, type doc.
For product information, visit www.mathworks.com.
Using a NVIDIA A100-SXM4-80GB MIG 3g.40gb GPU.
CUDADevice with properties:
Name: 'NVIDIA A100-SXM4-80GB MIG 3g.40gb'
Index: 1
ComputeCapability: '8.0'
SupportsDouble: 1
DriverVersion: 11.7000
ToolkitVersion: 11.2000
-----------------------------------------------------------
-------------------TRUNCATED-------------------------------
-----------------------------------------------------------
V =
0.0403 0.1761 -0.9835
-0.3974 -0.9003 -0.1775
0.9168 -0.3980 -0.0337
trace(S): 15.718392
PyTorch
In this example, we’ll use the PyTorch MNIST example. Get the source code from https://github.com/pytorch/examples/tree/main/mnist and save the Python code as mnist.py
We will now use the following SLURM script pytorch-gpu.sh to run the code:
#!/bin/bash
#SBATCH --job-name="PyTorch-GPU-Demo" # job name
#SBATCH --partition=peregrine-gpu # partition to which job should be submitted
#SBATCH --qos=gpu_debug # qos type
#SBATCH --nodes=1 # node count
#SBATCH --ntasks=1 # total number of tasks across all nodes
#SBATCH --cpus-per-task=1 # cpu-cores per task
#SBATCH --mem=4G # total memory per node
#SBATCH --gres=gpu:nvidia_a100_3g.40gb:1 # Request 1 GPU (A100 40GB)
#SBATCH --time=00:05:00 # wall time
module purge
module load python/anaconda
python mnist.py --epochs=3
Submit the job as
sbatch pytorch-gpu.sh
The result will be saved in a file named slurm-####.out and should look like
Train Epoch: 1 [0/60000 (0%)] Loss: 2.299824
Train Epoch: 1 [640/60000 (1%)] Loss: 1.733667
Train Epoch: 1 [1280/60000 (2%)] Loss: 0.933156
Train Epoch: 1 [1920/60000 (3%)] Loss: 0.623502
Train Epoch: 1 [2560/60000 (4%)] Loss: 0.357575
Train Epoch: 1 [3200/60000 (5%)] Loss: 0.315663
-----------------------------------------------------------
-------------------TRUNCATED-------------------------------
-----------------------------------------------------------
Train Epoch: 3 [55680/60000 (93%)] Loss: 0.009016
Train Epoch: 3 [56320/60000 (94%)] Loss: 0.241464
Train Epoch: 3 [56960/60000 (95%)] Loss: 0.004863
Train Epoch: 3 [57600/60000 (96%)] Loss: 0.004337
Train Epoch: 3 [58240/60000 (97%)] Loss: 0.109445
Train Epoch: 3 [58880/60000 (98%)] Loss: 0.038164
Train Epoch: 3 [59520/60000 (99%)] Loss: 0.014446
Test set: Average loss: 0.0333, Accuracy: 9887/10000 (99%)
TensorFlow
In this example, we’ll use a small example along the lines of https://www.tensorflow.org/tutorials/keras/classification. Save the following Python code as mnist.py
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(10)
])
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10)
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print('Test accuracy:', test_acc)
We will now use the following SLURM script tf-gpu.sh to run the code:
#!/bin/bash
#SBATCH --job-name="TensorFlow-GPU-Demo" # job name
#SBATCH --partition=peregrine-gpu # partition to which job should be submitted
#SBATCH --qos=gpu_debug # qos type
#SBATCH --nodes=1 # node count
#SBATCH --ntasks=1 # total number of tasks across all nodes
#SBATCH --cpus-per-task=1 # cpu-cores per task
#SBATCH --mem=4G # total memory per node
#SBATCH --gres=gpu:nvidia_a100_3g.40gb:1 # Request 1 GPU (A100 40GB)
#SBATCH --time=00:05:00 # wall time
module purge
module load python/anaconda
python mnist.py
Submit the job as
sbatch tf-gpu.sh
The result will be saved in a file named slurm-####.out and should look like
Epoch 1/10
1875/1875 [==============================] - 2s 958us/step - loss: 0.2587 - accuracy: 0.9252
Epoch 2/10
1875/1875 [==============================] - 2s 955us/step - loss: 0.1135 - accuracy: 0.9660
Epoch 3/10
1875/1875 [==============================] - 2s 956us/step - loss: 0.0772 - accuracy: 0.9764
-----------------------------------------------------------
-------------------TRUNCATED-------------------------------
-----------------------------------------------------------
1875/1875 [==============================] - 2s 956us/step - loss: 0.0285 - accuracy: 0.9910
Epoch 8/10
1875/1875 [==============================] - 2s 956us/step - loss: 0.0245 - accuracy: 0.9920
Epoch 9/10
1875/1875 [==============================] - 2s 955us/step - loss: 0.0184 - accuracy: 0.9943
Epoch 10/10
1875/1875 [==============================] - 2s 956us/step - loss: 0.0172 - accuracy: 0.9942
313/313 - 0s - loss: 0.0815 - accuracy: 0.9771 - 330ms/epoch - 1ms/step
Test accuracy: 0.9771000146865845
Interactive Jobs
Certain applications require direct user input via a terminal. For these, one can make use of interactive jobs in SLURM, which make it possible to run applications/commands on compute nodes in a shell. SLURM offers two ways to run interactive jobs: the srun command and the salloc command.
Warning
Interactive jobs are intended for short-running, specific applications/commands.
Do not use interactive jobs for long-running or routine work.
Please use the sbatch command to submit such jobs.
SRUN
Using the srun command, interactive jobs can be run within a scheduled shell.
Here is an example:
srun --nodes=1 --ntasks=1 --mem=4G --time=00:05:00 --pty /bin/bash
Notice how the prompt changes, indicating that a new shell has been spawned on one of the compute nodes:
peregrine0:~$
You can now run your interactive application/command. After you are done, just type exit at the command prompt to quit the shell and delete the SLURM job.
SALLOC
For situations where you would like to come back to your interactive session (after disconnecting from it), you can use SLURM’s salloc command to allocate resources up front and keep the job running. The process looks like this:
- Use salloc to create the resource allocation up front.
- Use srun to connect to it, as many times as needed during the job's time frame.
Run the command below to allocate resources:
salloc --nodes=1 --ntasks=1 --mem=4G --time=00:20:00
Here we are allocating 4GB of memory and one CPU on a node for 20 minutes. The command will display a job id number. Keep a note of it, as you will need it to connect to the interactive shell.
salloc: Granted job allocation 235
salloc: Waiting for resource configuration
salloc: Nodes peregrine0 are ready for job
Notice this time the prompt did not change. Since salloc only allocates resources to your job, it does not start a shell. To connect to an interactive shell on your job, use the srun command and specify the job id (which you noted in the salloc step):
srun --jobid=235 --pty /bin/bash
You will land on a compute node in an interactive shell.
peregrine0:~$
You can exit the shell and connect again later using the srun command with the same job id number. To finally delete your job, use the scancel command.
scancel 235
salloc: Job allocation 235 has been revoked.
Hangup