ICARE Computer Cluster

Overview

The ICARE computer cluster provides computational resources to ICARE users. This page explains how to get an account and how to use the cluster.

Servers

ICARE's cluster runs the Torque/Maui batch system. It is composed of 2 front-end nodes (32-bit and 64-bit) and 3 back-end computing nodes.
The 64-bit cluster is composed of:
  • 1 front-end server (access64.icare.univ-lille1.fr)
  • 2 computing nodes (each has two 6-core Intel Xeon X5660 2.80 GHz processors and 16 GB of memory)
The 32-bit cluster is composed of:
  • 1 front-end server (access32.icare.univ-lille1.fr)
  • 1 computing node (two 4-core Intel Xeon E5430 2.66GHz processors and 8 GB of memory)
The front-end nodes are dedicated to interactive use through an SSH session ("ssh access64.icare.univ-lille1.fr" or "ssh access32.icare.univ-lille1.fr"). From each front-end server, jobs can be submitted to the job scheduler and executed on the computing nodes.

Disk Space

  • Home Directory (~5 TB total)
This space should be used for storing files you want to keep in the long term, such as source code, scripts, etc. Every user has a quota of 50 GB on their home directory. The home directory is backed up nightly.
Note: home directories are shared by all nodes of the cluster, so be aware that any modification in your home directory on access32 also modifies your home directory on access64.
  • Main Storage Space /work_users (13 TB total)
This is the main storage space for large amounts of data. This work space is backed up nightly.
  • Scratch Space /scratch (21 TB total)
The scratch filesystem is intended for temporary storage and should be considered volatile. Older files are subject to being automatically purged. No backup of any kind is performed for this work space.
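To keep an eye on how much space you are using in each of these areas, the standard Linux tools are enough. The sketch below uses du to report the space used by your home directory (against the 50 GB quota) and df to show the overall usage of the shared filesystems; adjust the paths to the directories you actually use.
 [ops@access64 ~]$ du -sh ~
 [ops@access64 ~]$ df -h /work_users /scratch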

Registration

You need to register to get an account. Please fill in the SSH registration form.

Logging in

To use the computer cluster, you have to log in to one of the front-end nodes using ssh. Use either the 64-bit node access64.icare.univ-lille1.fr or the 32-bit node access32.icare.univ-lille1.fr, depending on your needs. The environment variables are set automatically (through the centralized environment commands /usr/ops/env64_rhel6 and /usr/ops/env32_rhel6, respectively).
These machines act as a gateway to the rest of the cluster. No intensive processing is to be run on the front-end nodes; they should be used for interactive needs only. Processing jobs should be run on the computing nodes instead (see below).
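For example, to open an interactive session on the 64-bit front-end (use access32.icare.univ-lille1.fr instead for the 32-bit environment):
 ssh access64.icare.univ-lille1.fr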

Running your jobs

Processing jobs should be submitted to the job scheduler with the ezqsub command so that they run on the computing nodes.
The ezqsub command schedules a job using directives specified on the command line. It is based on the standard qsub command: it converts the command-line options into the corresponding PBS directives and calls qsub. These directives control the resources required by jobs submitted to the cluster.

Usage:
ezqsub [OPTIONS] <command-line>
Args:
 <command-line> full command line containing the executable and its arguments, wrapped in single quotes [REQUIRED]
Options:
 -h, --help            show this help message and exit
 -s SERVER, --server=SERVER
                       Set the server type to 32 or 64 bits. Available values
                       are ['cluster32', 'cluster64'] (default: cluster64)
 -w MAX_WALL_TIME, --max_wall_time=MAX_WALL_TIME
                       Maximum duration of the run, a positive integer for
                       seconds or in the format [[HH:]MM:]SS (default:
                       06:00:00). If the run exceeds this limit, it will be
                       aborted
 -m MAX_RAM, --max_ram=MAX_RAM
                       A positive integer that sets the maximum amount of RAM
                       required for the run (default: 2gb). It can be
                       followed by b, kb, mb, gb to specify the unit. Without
                       a unit, the integer is interpreted as bytes. If the run
                       exceeds this limit, it will be aborted
 -n NODES, --nb_nodes=NODES
                       Number of nodes required for the run (default: 1).
                       [PARALLEL JOBS ONLY] If your job is not parallel,
                       increasing it will not make the run faster but will
                       penalize other users
 -p PPN, --nb_process_per_node=PPN
                       Number of processes per node required for the run
                       (default: 1). [PARALLEL JOBS ONLY] If your job is not
                       parallel, increasing it will not make the run faster
                       but will penalize other users
 -e EMAIL, --email=EMAIL
                       You can optionally specify your email address here. If
                       you do, you will receive a status email when the job
                       starts, ends, or aborts with an error. No emails are
                       sent by default
 -d PBS_SCRIPT_DIR, --pbs_script_directory=PBS_SCRIPT_DIR
                       Directory where the script submitted to the batch
                       system will be stored. It must be available both on
                       the front-end and on the cluster. Default is
                       /work_users/$HOME. WARNING: you must remove it
                       yourself once the process is done
 -o STDOUT_DIR, --std_output_directory=STDOUT_DIR
                       Directory where the standard output file will be
                       stored (stdout and stderr are merged into it). Default
                       is /work_users/$HOME
Submit the job to the scheduler:
 [ops@access64 ~]$ ezqsub -s cluster64 -e John.Doe@gmail.com "matlab -nosplash -nodesktop" 
 PBS script : /work_users/ops/matlab_2014-01-30T09:17:42_FAYyoG.pbs 
 3672.access64.icare.univ-lille1.fr 
Note: if you request too many resources, your job may stay queued indefinitely. In particular, make sure you do not request more nodes, processes per node, or memory than are available on the cluster.
Once your job execution completes, you will find two files in the directory you submitted the job from. They contain stdout (.o) and stderr (.e).
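For a job that needs more than the default resources, a submission might look like the sketch below. The options are those described above; the script name my_processing.sh and its input path are hypothetical placeholders for your own command line.
 [ops@access64 ~]$ ezqsub -w 12:00:00 -m 4gb -e John.Doe@gmail.com 'my_processing.sh /work_users/ops/input_file'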

Commands

  • qstat
qstat shows the status of your jobs and of other users' jobs ('R' means running, 'Q' means queued).
[ops@access64 ~]$ qstat 
Job ID                    Name             User            Time Use S Queue 
------------------------- ---------------- --------------- -------- - ----- 
3672.access64              ...42_FAYyoG.pbs ops             00:00:00 R batch  
  • qdel
The qdel command removes your job from the queue, or cancels it if it is already running. The syntax is "qdel <job ID>", but you can abbreviate the job ID to just its leading number.
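For example, to cancel the job submitted earlier (job ID 3672.access64.icare.univ-lille1.fr):
[ops@access64 ~]$ qdel 3672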
  • pbsnodes
The pbsnodes command lists nodes and their status: "pbsnodes -a" lists the status of all the nodes in the cluster, and "pbsnodes -l" lists the nodes that are not available to the scheduler.
[ops@access64 ~]$ pbsnodes -a 
icare51.icare.univ-lille1.fr 
     state = free 
     np = 18 
     properties = cluster64,bipro,12coeurs,16go,rhels6u2,64b 
     ntype = cluster 
     jobs = 0/3672.access64.icare.univ-lille1.fr 
     status = rectime=1391073620,varattr=,jobs=3672.access64.icare.univ-lille1.fr,state=free,netload=95832296916,gres=,loadave=0.92,ncpus=12,physmem=16316336kb,availmem=31805780kb,totmem=32700328kb,idletime=762071,nusers=4,nsessions=4,sessions=1749 3771 4358 9717,uname=Linux icare51.icare.univ-lille1.fr 2.6.32-220.13.1.el6.x86_64 #1 SMP Thu Mar 29 11:46:40 EDT 2012 x86_64,opsys=linux 
     mom_service_port = 15002 
     mom_manager_port = 15003 
 
icare52.icare.univ-lille1.fr 
     state = free 
     np = 18 
     properties = cluster64,bipro,12coeurs,16go,rhels6u2,64b 
     ntype = cluster 
     status = rectime=1391073621,varattr=,jobs=,state=free,netload=73401909151,gres=,loadave=0.01,ncpus=12,physmem=16316340kb,availmem=31912280kb,totmem=32700332kb,idletime=762971,nusers=3,nsessions=3,sessions=1729 3783 4352,uname=Linux icare52.icare.univ-lille1.fr 2.6.32-220.17.1.el6.x86_64 #1 SMP Thu Apr 26 13:37:13 EDT 2012 x86_64,opsys=linux 
     mom_service_port = 15002 
     mom_manager_port = 15003
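To check only for unavailable nodes, use pbsnodes -l; when every node is up (as in the listing above), it normally prints nothing and simply returns to the prompt:
[ops@access64 ~]$ pbsnodes -l 
[ops@access64 ~]$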

See Also

Submitting Matlab/IDL jobs via Torque/Maui