Submitting jobs
Once you have logged in to a front-end server, there are two ways to submit a job: as an interactive job, or in batch mode.
Interactive Job
An interactive job is the preferred method for prototyping, debugging, and testing code. Be aware that any disruption of the connection (due to inactivity, timeouts, etc.) will terminate the job and free its allocated resources. For this reason, any job expected to last more than an hour should probably use the batch mode described in the next section instead.
To submit an interactive job, use qsub with the -I flag:
qsub -I -l nodes={# of nodes}:{node features/properties}:ppn={# of processors per node} -l walltime={HH:MM:SS}
If you need X11 forwarding (for graphical applications), add -X.
The interactive job starts a command-line shell on the requested node where the job-related commands/scripts can be executed. For example:
l@muse ~> ssh lsh@vortex.sciclone.wm.edu
Password:
...
11 [vortex] qsub -I -l nodes=1:vortex:ppn=1 -l walltime=00:05:00
qsub: waiting for job 5207204 to start
qsub: job 5207204 ready
11 [vx01] sleep 5; echo This is technically a job!
This is technically a job!
12 [vx01] exit
logout
qsub: job 5207204 completed
12 [vortex]
Batch Job
Because of the aforementioned fragility of interactive jobs, and because you may have to wait a significant amount of time for resources to become available when the cluster is busy, most jobs on the cluster are run in batch mode, that is, completely in the background with no user interaction. Since you will not be present to issue commands, this requires submitting a job script to the batch system that contains all the commands you need to run.
The basic format for job submission is
qsub [OPTIONS] SCRIPT
where SCRIPT is the name of a file containing the commands you would issue to run your application, and optionally #PBS directives that you could also specify as OPTIONS to qsub. The most commonly used options are:
-l nodes={# of nodes}:{node type}:ppn={# of processors per node}
the resources required for the job;
-l walltime={HH:MM:SS}
the maximum length of time the job will run;
-N {job name}
the job name;
-j oe
join output and error output, instead of splitting them into separate files;
-m abe
when the user is sent mail about the job: a if the job is aborted by the batch system, b when the job begins, and/or e when the job ends; and
-M user1,user2,...
a comma-separated list of email addresses to be notified.
See man qsub for more options and information. In a script, all #PBS directives must appear before the execution commands: any directives after the first command are ignored.
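For example, the same submission can be written either way; the script name run.sh and the resource values below are illustrative placeholders, not site requirements:

```shell
# All options given on the qsub command line:
qsub -l nodes=1:vortex:ppn=4 -l walltime=1:00:00 -N my_job -j oe run.sh

# Or the equivalent settings embedded in run.sh as #PBS directives,
# so the script can simply be submitted as:
#   qsub run.sh
# where run.sh begins:
#   #!/bin/tcsh
#   #PBS -l nodes=1:vortex:ppn=4
#   #PBS -l walltime=1:00:00
#   #PBS -N my_job
#   #PBS -j oe
```

If an option appears both on the command line and as a directive in the script, the command-line value takes precedence.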
When batch jobs end, output that would have been written to the screen in an interactive job is instead saved to files in $PBS_O_WORKDIR (the directory you were in when you submitted the job, or the directory specified with -w) named Jobname.oJobID (and Jobname.eJobID, if you did not use -j oe). Under tcsh, such output will always contain
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
which is simply tcsh warning you that (because it is running in a batch job) it has no access to a terminal, so you will not be able to use Ctrl-C, Ctrl-Z, etc.
Note that both interactive and batch jobs start in your home directory, regardless of where they were submitted. If you want to run a command in a different directory, first change directory, e.g.
cd $PBS_O_WORKDIR
./my_command
Examples
A serial job requires one node and runs on a single core. In this example, by not specifying nodes=, we allow the job scheduler to assign any processor on any machine in the cluster. If this were not acceptable, we could request a particular node feature/property by adding another #PBS -l line specifying nodes=1:property.
#!/bin/tcsh
#PBS -l walltime=4:00:00
#PBS -N my_serial_job
#PBS -j oe
cd $PBS_O_WORKDIR
/path/to/serial_job
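Assuming the script above is saved as my_serial_job.sh, submission and the resulting output file look like this (the job ID shown is illustrative):

```shell
qsub my_serial_job.sh    # prints the assigned job ID, e.g. 5207210
# After the job completes, because -j oe was used, all screen output
# lands in a single file in the submission directory:
#   my_serial_job.o5207210
```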
An SMP/shared-memory job runs on a single node using several cores, and uses OpenMP or multithreading.
#!/bin/tcsh
#PBS -l nodes=1:x5672:ppn=8
#PBS -l walltime=12:00:00
#PBS -N my_smp_job
#PBS -j oe
cd $PBS_O_WORKDIR
./omp_matrices_addition
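OpenMP programs do not necessarily pick up the ppn request on their own; a common addition (a sketch, assuming the program honors the standard OMP_NUM_THREADS environment variable) is to set the thread count in the script to match ppn before launching:

```shell
# tcsh syntax, inserted just before the program runs:
setenv OMP_NUM_THREADS 8    # match the ppn=8 requested above
./omp_matrices_addition
```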
A parallel/distributed memory job runs on multiple nodes with multiple cores using, in most cases, a parallel communication library such as MVAPICH2/OpenMPI. The parallel job script is executed on the first allocated node after the job begins.
On our systems, all MVAPICH2/OpenMPI jobs should be initiated with mvp2run, a wrapper interface between mpirun_rsh/mpiexec and our batch system. It provides functionality for selecting the desired physical network, checking processor loads on the destination nodes, managing the execution environment, and controlling process mapping and affinity (for MVAPICH2). See mvp2run -h for more information.
#!/bin/tcsh
#PBS -l nodes=7:vortex:ppn=12
#PBS -l walltime=48:00:00
#PBS -N parallel_fem
#PBS -j oe
cd $PBS_O_WORKDIR
mvp2run -D -c 12 -C 0.2 -e GRIDX=500 -e GRIDY=400 /path/to/code/parallel_fem {args}