Gulf
The gulf cluster of SciClone contains six nodes. Two of these nodes are intended for small jobs and data analysis. The other four nodes contain 2x Nvidia A40 GPUs and can be used like other cluster nodes. The front-end for the cluster is gulf.sciclone.wm.edu and the startup module file is named .cshrc.gulf.
Hardware
| | Front-end (gulf/gu00) | Analysis nodes (gu01-gu02) | GPU nodes (gu03-gu06) |
|---|---|---|---|
| Model | Dell PowerEdge R7525 | Dell PowerEdge R7525 | Dell PowerEdge R7525 |
| Processor(s) | 2 × 16-core AMD EPYC 7313P | 16-core AMD EPYC 7313P | 2 × 16-core AMD EPYC 7313 |
| Clock speed | 3.0 GHz | 3.0 GHz | 3.0 GHz |
| GPU | -- | -- | 2 × Nvidia A40 |
| Memory | 128 GB | 512 GB | 128 GB |
| Network interfaces (application) | HDR IB (gu00-ib) | HDR IB (gu??-ib) | HDR IB (gu??-ib) |
| Network interfaces (system) | 10 GbE (gu00) | 1 GbE (gu??) | 1 GbE (gu??) |
| OS | Rocky Linux 9.3 | Rocky Linux 9.3 | Rocky Linux 9.3 |
Slurm
The Slurm batch system is used on gulf to run jobs. The maximum walltime for all jobs is 72 hours.
To run an interactive job on gulf using 2 cores and 30 GB of memory on one of the analysis nodes for one hour:
salloc -N 1 -n 2 --mem=30G -t 1:00:00
To run an interactive job on gulf using 16 cores and one A40 GPU for 12 hours:
salloc -N 1 -n 16 --gpus=1 -t 12:00:00
NOTE: Specifying memory is not required, but it can help ensure that your job is allocated the amount of memory it needs.
NOTE: Adding "-C anl" to the salloc command ensures that a job which does not need a GPU runs on one of the CPU-only analysis nodes. This should be the default behavior, but including "-C anl" forces the job onto an analysis node.
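For example, the same interactive request pinned explicitly to an analysis node would look like the following sketch (adjust cores, memory, and time to your needs):
salloc -N 1 -n 2 --mem=30G -C anl -t 1:00:00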
The --gpus=N option can be added to your srun command (to run a job directly from the command line) or to your batch script, e.g.:
#SBATCH --gpus=2
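For reference, a minimal GPU batch script might look like the sketch below; the job name and the executable ./gpu_app are placeholders, and the resource amounts should be adjusted to your job:
#!/bin/tcsh
#SBATCH --job-name=gpu_test
#SBATCH --nodes=1 --ntasks-per-node=16
#SBATCH --gpus=2
#SBATCH -t 12:00:00
srun ./gpu_app >& LOG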
Currently, there are no reserved slots for debugging. Please send email to [hpc-help] if you need help obtaining resources on this subcluster.
User Environment
To login, use SSH from any host on the William & Mary or VIMS networks and connect to gulf.sciclone.wm.edu with your HPC username (usually the same as your WMuserid) and W&M password.
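For example, from a host on the W&M or VIMS networks (replace username with your HPC username):
ssh username@gulf.sciclone.wm.edu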
Your home directory on Gulf is the same as everywhere else on SciClone, and all of the usual filesystems (/sciclone/homeXX, /sciclone/dataXX, /sciclone/scrXX, /local/scr, etc.) are available.
SciClone uses Environment Modules (a.k.a. Modules) to automatically configure the user's shell environment across multiple computing platforms, as well as to organize the dozens of different software packages which are available on the system. We support tcsh as the primary shell environment for user accounts and applications.
The file which controls startup modules for gulf is .cshrc.gulf. A copy of this file can be found in /usr/local/etc/templates on gulf.sciclone.wm.edu.
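If you want to start from the default startup file, you can copy the template into your home directory (this assumes the file lives in your home directory, and will overwrite any existing copy) and then check the loaded modules at your next login:
cp /usr/local/etc/templates/.cshrc.gulf ~/.cshrc.gulf
module list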
Compilers and preferred flags:
The preferred compiler module on gulf is intel/compiler-2024.0, the Intel OneAPI compiler suite, which contains Fortran, C, and C++ compilers as well as the MKL libraries. The preferred MPI package is openmpi-ib/intel-2024.0/4.1.6, which contains OpenMPI 4.1.6 compiled for the Intel 2024.0 compiler stack. The intel/mpi-2021.11 module, containing the latest Intel MPI software, is also available and also works on Gulf. All W&M HPC clusters also have GNU compilers available.
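For example, to set up the preferred compiler and MPI stack in an interactive session and build an MPI code (these modules may already be loaded by your .cshrc.gulf; hello.c is a placeholder source file):
module load intel/compiler-2024.0
module load openmpi-ib/intel-2024.0/4.1.6
mpicc -o hello hello.c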
One important thing to point out is that the Intel OneAPI compilers have changed names from ifort, icc, and icpc to ifx, icx, and icpx for Fortran, C, and C++, respectively. Most of the compiler flags are the same as before.
Here are suggested flags for the AMD Zen 3 CPUs in gulf for the Intel OneAPI 2024.0 compiler as well as GCC:
| Compiler | Language | Suggested flags |
|---|---|---|
| Intel | C | icx -O3 -axCORE-AVX2 |
| | C++ | icpx -std=c++11 -O3 -axCORE-AVX2 |
| | Fortran | ifx -O3 -axCORE-AVX2 |
| GNU | C | gcc -march=znver3 -O3 |
| | C++ | g++ -std=c++11 -march=znver3 -O3 |
| | Fortran | gfortran -march=znver3 -O3 |
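As a usage sketch, compiling a hypothetical Fortran source file mycode.f90 with the flags above would look like:
ifx -O3 -axCORE-AVX2 -o mycode mycode.f90
gfortran -march=znver3 -O3 -o mycode mycode.f90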
Preferred filesystems
The Gulf cluster has a new Lustre scratch filesystem mounted as /sciclone/scr-lst/$USER, which should be the preferred filesystem for I/O on this cluster. Each gulf node also has a 1.8 TB SSD /local/scr partition that jobs can use as well.
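A typical pattern is to create a per-job directory on the Lustre scratch filesystem and run from there; the directory name myrun below is a placeholder (tcsh syntax, matching the batch scripts on this page):
mkdir -p /sciclone/scr-lst/$USER/myrun
cd /sciclone/scr-lst/$USER/myrun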
MPI
All W&M HPC clusters have both Intel MPI and OpenMPI installed. Some older clusters also have Mvapich2. All parallel jobs (shared memory, MPI or hybrid) need to be run through the batch system. The standard way to do this is to use the 'srun' command:
#!/bin/tcsh
#SBATCH --job-name=test
#SBATCH --nodes=2 --ntasks-per-node=64
#SBATCH -t 30:00
srun ./a.out >& LOG
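Save the script (for example as test.sh; the filename is arbitrary), then submit and monitor it with the standard Slurm commands:
sbatch test.sh
squeue -u $USER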
Please see our Slurm page for more help.