SLURM (Simple Linux Utility for Resource Management) is an open-source workload manager designed for high-performance computing (HPC) clusters. It is responsible for allocating resources to users for their jobs, managing job queues, and scheduling tasks across the cluster. SLURM allows users to submit, monitor, and control jobs efficiently. Key features include job prioritization, resource allocation, job dependencies, and the ability to run parallel tasks across multiple nodes. By submitting a Slurm job, you essentially request a session, optionally inside a container, with the resources you asked for. There are two modes of submitting jobs to Slurm:

Interactive jobs

Interactive jobs allow users to request resources and immediately access a command-line session on the allocated compute node(s). This is useful for debugging, testing, or running applications that require user interaction. To start an interactive job, run the srun command in the terminal with the resource options you need:

srun --ntasks=<"number of tasks"> \
--cpus-per-task=<"CPUs per task"> \
--gpus-per-task=<"GPUs per task"> \
--mem=<"memory needed"> \
--time=<"time limit (e.g., HH:MM:SS)"> \
--partition=<"partition name"> \
--qos=<"qos name"> \
--nodelist=<"node name or names"> \
--job-name=<"job name"> \
--output=<"output file path"> \
--error=<"error file path"> \
--container-image=<"path to container image"> \
--container-mounts=<"host path1:container path1,host path2:container path2,..."> \
--container-writable \
--container-remap-root \
--container-save=<"path to save container state"> \
--pty bash
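For illustration, a filled-in interactive request might look like the sketch below; the partition, QOS, job name, memory, time, and container image path are hypothetical placeholders, and options you do not need can simply be omitted:

srun --ntasks=1 \
--cpus-per-task=8 \
--gpus-per-task=1 \
--mem=32G \
--time=02:00:00 \
--partition=gpu \
--qos=normal \
--job-name=debug-session \
--container-image=/path/to/image.sqsh \
--container-mounts=/home/$USER:/workspace \
--pty bash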

This brings you to the compute node with an open terminal session. To attach to this open session via VS Code, please look here.

Non-interactive jobs

Non-interactive jobs, also known as batch jobs, are the most common way to run jobs on a cluster. These jobs are submitted using the sbatch command along with a job script that specifies the job's resources, commands to execute, and any other necessary parameters. For instance, a script named my_job.sh might contain resource requests and commands to run your application. You would submit this script with sbatch my_job.sh. SLURM then queues the job and runs it when resources become available, without requiring further interaction from the user. An example structure for my_job.sh is:

#!/bin/bash
#SBATCH --ntasks=<"number of tasks">
#SBATCH --cpus-per-task=<"CPUs per task">
#SBATCH --gpus-per-task=<"GPUs per task">
#SBATCH --mem=<"memory needed">
#SBATCH --time=<"time limit (e.g., HH:MM:SS)">
#SBATCH --partition=<"partition name">
#SBATCH --qos=<"qos name">
#SBATCH --nodelist=<"node name or names">
#SBATCH --job-name=<"job name">
#SBATCH --output=<"output file path">
#SBATCH --error=<"error file path">
#SBATCH --container-image=<"path to container image">
#SBATCH --container-mounts=<"host path1:container path1,host path2:container path2,...">
#SBATCH --container-writable
#SBATCH --container-remap-root
#SBATCH --container-save=<"path to save container state">

<command to run>
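As a concrete sketch, a filled-in my_job.sh might look like the following; the partition, QOS, container image, mount paths, and train.py script are hypothetical placeholders to be replaced with values valid on your cluster:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --gpus-per-task=1
#SBATCH --mem=16G
#SBATCH --time=04:00:00
#SBATCH --partition=gpu
#SBATCH --qos=normal
#SBATCH --job-name=train-model
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
#SBATCH --container-image=/path/to/image.sqsh
#SBATCH --container-mounts=/path/to/project:/workspace

python /workspace/train.py

You would then submit the script with sbatch my_job.sh, check its status with squeue -u $USER, and cancel it with scancel <job id> if necessary. The %x and %j patterns in the output and error paths are replaced by SLURM with the job name and job ID, respectively.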

Here’s a description of each input used in these commands:

  1. --ntasks=<"number of tasks">: Number of tasks (processes) to launch for the job.
  2. --cpus-per-task=<"CPUs per task">: Number of CPU cores allocated to each task.
  3. --gpus-per-task=<"GPUs per task">: Number of GPUs allocated to each task.
  4. --mem=<"memory needed">: Memory required per node (e.g., 16G for 16 gigabytes; a bare number is interpreted as megabytes).
  5. --time=<"time limit (e.g., HH:MM:SS)">: Maximum wall-clock time the job may run; the job is terminated when the limit is reached.
  6. --partition=<"partition name">: Partition (queue) to which the job is submitted.
  7. --qos=<"qos name">: Quality of service to run the job under, which can affect priority and resource limits.
  8. --nodelist=<"node name or names">: Specific node(s) the job must run on, given as a comma-separated list.
  9. --job-name=<"job name">: Human-readable name for the job, shown in queue listings.
  10. --output=<"output file path">: File to which the job's standard output is written.
  11. --error=<"error file path">: File to which the job's standard error is written.