Introduction to using SLURM
This page details how users can launch jobs on the cluster with the Simple Linux Utility for Resource Management (SLURM). Currently, only the Femto, Cyclops, and gust sub-clusters are configured to use SLURM, and the information in this section is limited to those sub-clusters. To request a job allocation from SLURM, at least three pieces of information are required: the number (and type) of nodes, the number of cores per node, and the time required for completion. Depending on the type of parallel computation, other options such as the number of tasks per node or the number of cores per task can be used to customize execution. More options are available to fine-tune the execution of jobs and will be discussed in the relevant sub-sections.
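For example, an interactive allocation specifying these three pieces of information might look like the sketch below. The partition name `femto` is an assumption here; substitute the sub-cluster you actually have access to.

```bash
# Request 2 nodes, 8 cores per node, for 30 minutes of wall time.
# The partition name "femto" is illustrative; use the appropriate sub-cluster.
salloc --partition=femto --nodes=2 --ntasks-per-node=8 --time=00:30:00
```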
For users accustomed to using TORQUE, here are some important differences between TORQUE and SLURM:
1) The SLURM startup environment is different from TORQUE's. By default, SLURM uses the environment and modules that were loaded when the batch script was submitted, whereas TORQUE gave you a fresh startup environment with your default modules loaded. To reproduce the TORQUE behavior in SLURM, add #SBATCH --export=NONE to your batch script (see the example script after this list).
2) SLURM batch jobs start in the directory from which the job was submitted. TORQUE always placed you in your home directory, requiring you to cd to the submission directory.
3) The mvp2run script will not be used on the SLURM sub-clusters. Its main functionality is superseded by 'srun'; see 'srun -h' for help with options for MPI jobs. Most MPI jobs should be fine using 'srun ./a.out' in your batch script. The other function of mvp2run was to check the load on each node before running the job; this can now be done with 'ckload' (see 'ckload -h' and the example below).
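A minimal batch script combining these points might look like the following sketch. The job name, resource requests, and module names are assumptions for illustration; 'srun ./a.out' launches the MPI program as described above.

```bash
#!/bin/bash
#SBATCH --job-name=mpi_example
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --time=01:00:00
#SBATCH --export=NONE            # start with a clean environment, as TORQUE did

# Because --export=NONE discards the submission-time environment, load the
# modules the job needs explicitly (module names here are illustrative).
module load intel
module load impi

# Optionally check the load on the allocated nodes before launching
# (ckload is a site-provided script; see 'ckload -h' for its options).

# Launch the MPI program; srun replaces the old mvp2run wrapper.
srun ./a.out
```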
A quick overview of SLURM commands is given below:
| Command | Description |
|---------|-------------|
| salloc  | Request a job allocation |
| srun    | Request a job allocation and execute commands |
| sbatch  | Submit a batch script for execution |
| scancel | Cancel running jobs |
| squeue  | View queued jobs and job information |
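A typical workflow with these commands is sketched below; the script name and job ID are illustrative.

```bash
sbatch myjob.sh      # submit the batch script; prints "Submitted batch job 123456"
squeue -u $USER      # list your queued and running jobs
scancel 123456       # cancel a job by its ID
```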