Node properties and metaclusters
You can find property/feature names for specifying an individual subcluster on the subcluster pages under the Node Types and Subclusters page. The most up-to-date information on a node's properties can be obtained directly from TORQUE with its pbsnodes command.
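For example, to inspect a single node directly (the node name bo01 here is purely illustrative; substitute any node of interest):

pbsnodes bo01

The output typically includes the node's state, its core count (np), and a properties line listing exactly the keywords that can be used in resource requests.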
Generally, nodes are classified with properties in at least the following ways:
- subcluster: usually just the subcluster name in lowercase, e.g. bora for Bora and vortexa for Vortex-α;
- operating system: el6 or el7 for Enterprise Linux (Red Hat or a derivative) 6.x or 7.x, respectively, or rhel6/rhel7 for Red Hat Enterprise Linux specifically;
- processor type: xeon for Intel Xeon processors or opteron for AMD Opteron processors, and more specifically skylake, broadwell, knl, or x5672 for Xeon processors, or abu_dhabi, seoul, or magny_cours for Opteron processors;
- hardware multithreading: noht if Intel's hyper-threading is disabled or nonexistent, and/or nocmt if AMD's clustered multi-threading is disabled or nonexistent; and
- switch/network connectivity: ib01, ib02, ib03, ib04, ib05, or ib06 depending on which InfiniBand switch the node is connected to, and/or opa if the node is connected to SciClone's Omni-Path network.
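To see which nodes carry a given property, most TORQUE releases also let pbsnodes take a :property argument (the el7 property used here is just one example from the list above):

pbsnodes :el7

This lists every node tagged with el7, which is a convenient way to gauge how large a pool a particular node specification would draw from.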
When subclusters share a common processor technology and communication fabric, the distinctions between them can be ignored for certain applications, and they can be treated as a larger, unified "metacluster." For example, the Hurricane and Whirlwind subclusters employ the exact same number and model of Xeon processors and share the same InfiniBand switch. Overlooking the differences in memory capacities, local scratch disks, and the presence of GPUs in Hurricane, these two subclusters could be treated as a single 64-node system, rather than distinct 12-node and 52-node systems.
For job scheduling purposes, choose a node property specification which is common to all of the nodes of interest and uniquely identifies that set of resources. For example, to combine nodes from the Hurricane and Whirlwind subclusters into a 60-node job, you could use something like
qsub -l nodes=60:x5672:ppn=8 ...
which specifies that you want 8 cores on each of 60 compute nodes equipped with Xeon X5672 processors. Even a job needing 52 or fewer nodes, which could be satisfied by just Whirlwind, will be easier to schedule and therefore will likely run sooner with a more inclusive node specification. Alternatively, if you had a multi-threaded single-node job but wanted to allow it to run on any free node with a Xeon "Broadwell" processor, you might say:
qsub -l nodes=1:broadwell:ppn=20 ...
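The same resource requests can also be embedded in a batch script via #PBS directives rather than given on the qsub command line. A minimal sketch for the single-node Broadwell case, using bash and illustrative names and limits:

#!/bin/bash
#PBS -N broadwell_test
#PBS -l nodes=1:broadwell:ppn=20
#PBS -l walltime=2:00:00
#PBS -j oe
# The job name and walltime above are illustrative; adjust them to your workload.
cd $PBS_O_WORKDIR                # start in the directory qsub was invoked from
export OMP_NUM_THREADS=20        # assuming an OpenMP-style threaded application
./my_threaded_app                # hypothetical executable name

Submitting it is then simply qsub followed by the script name, with no additional -l options needed.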
Some applications, especially pre-compiled ones, will only work on a particular operating system release. To run only on RHEL/CentOS 7 nodes, for example, you could specify:
qsub -l nodes=1:el7:ppn=1 ...
Nodes that qualify for the above specification (e.g., both Hima, from 2017, and Rain, from 2007) vary significantly in processor model, processor speed, number of CPU cores per node, and so on, but a generic resource request can still treat these subclusters as one large pool, particularly for serial jobs.
Finally, if an application is completely agnostic with respect to processor type, number of cores, memory capacity, operating system, etc., you could use a very generic node specification to treat the entire SciClone complex as one big cluster, allowing the job to run anywhere:
qsub -l nodes=1:ppn=1 ...
This strategy works particularly well when you are submitting large numbers of serial jobs using a software package that is supported across all of SciClone's computing platforms (e.g., MATLAB, Sage, Octave, GRASS, or NumPy/SciPy). By putting all of the available computing resources at your disposal, you can reduce total turnaround time and maximize throughput for the entire collection of jobs, especially when the system is busy.
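As one hedged sketch of that pattern, a TORQUE job array can spread a parameter sweep across whatever single cores happen to be free anywhere on the complex (the script, array range, and program names below are illustrative):

#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -j oe
cd $PBS_O_WORKDIR
# TORQUE sets PBS_ARRAYID to this array member's index
./my_serial_app input.$PBS_ARRAYID

Submitting it with qsub -t 1-200 sweep.sh (again, sweep.sh is an illustrative name) queues 200 independent serial jobs, each eligible to run on any free node in the complex.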