Parallelism on lightning is obtained by using MPI. All accounts are set up so that MPI uses the very high-performance InfiniPath HTX communication network. To use MPI:
* To compile Fortran 77, Fortran 90, and Fortran 95 MPI programs, use
mpif77, mpif90, or mpif95. To compile C and C++ MPI programs, use
mpicc or mpiCC. Note that Pathscale combines its C and C++ compilers
into one compiler, and likewise combines its Fortran 77, 90, and 95
compilers into one compiler.
* Use http://andrew.ait.iastate.edu/HPC/lightning/lightning_script_writer.html to write
a script to submit to the batch scheduler. Remember that there
are 4 processors per node, so a 12-processor job needs only
3 nodes.
* In the script, use mpirun -np 12 ./a.out
mpirun has been modified to use only the nodes that PBS assigns to your job,
so no one else will use the nodes on which your MPI job is running.
Only one mpirun may appear in the script, as PBS will not allow a second mpirun to start.
* Make sure that the executable (a.out in the example above) resides in one of
the following locations:
/home/user (where 'user' is your user name)
/work/group (where 'group' is your group name; run 'groups' to find it)
/ptmp
All these locations are mounted on each of the compute nodes.
Don't place the executable in the local filesystem (/tmp), as each node has its
own /tmp. Files placed in /tmp on the front end node are not available on
the compute nodes, so mpirun cannot start processes there.
* You can use the disk drive on each compute node by reading from and
writing to $TMPDIR. This is temporary storage that can be used only during
the execution of your program. Only processes running on a node have access
to that node's disk drive. Since the 4 processors on a node share this storage,
you must include the rank of the MPI process in the names of the files it
reads and writes in $TMPDIR. The size of $TMPDIR is about 130 GB.
* By default, a job is terminated if no MPI communication has been
detected for 15 minutes. To turn off this feature, use the mpirun option '-q 0':
mpirun -q 0 -np 2 ./a.out
For more information, see 'man mpirun' (search for the -quiescence-timeout option).
* The -e and -o PBS files are not available until the PBS job finishes, so you
may want to use 'mpirun -np 12 ./a.out >& output_file'. Then you can see
the output from lightning while the job is running. Alternatively, you can use
the qpeek command:
qpeek job# shows STDOUT while the job is running.
qpeek -e job# shows STDERR while the job is running.
* For convenience, an mpirun command can be submitted to the batch queues using
the command bmpirun rather than mpirun, e.g. bmpirun -np 8 ./a.out
Restrictions: no more than 16 processes
              no more than 1 hour
              output does not appear until after the command is complete.
The command runs immediately if enough nodes are free; otherwise, it waits
in the queue. If you use Ctrl-C to exit the command, you also need to
qdel the associated job in the PBS queues.
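
The node arithmetic, the executable-location rule, and the per-rank naming of files in $TMPDIR described above can be sketched as a few shell helpers. This is only an illustration under stated assumptions: the function names (nodes_needed, on_shared_fs, scratch_file) and the '.rankN' file suffix are hypothetical and are not tools installed on lightning. The rank passed to scratch_file is assumed to be supplied by the caller; inside a compiled MPI program it would come from MPI_Comm_rank.

```shell
#!/bin/sh
# Hypothetical helpers; none of these names are provided by lightning itself.

PROCS_PER_NODE=4   # from the text above: 4 processors per node

# Print the number of nodes needed for $1 MPI processes (ceiling division).
nodes_needed() {
    echo $(( ($1 + PROCS_PER_NODE - 1) / PROCS_PER_NODE ))
}

# Succeed if the absolute path $1 lies under a directory that is mounted
# on every compute node (/home, /work, or /ptmp); fail otherwise.
on_shared_fs() {
    case "$1" in
        /home/*|/work/*|/ptmp/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print a scratch file name under $TMPDIR that is unique to MPI rank $1.
# $2 is a base name chosen by the caller; the ".rankN" suffix is an assumption.
scratch_file() {
    echo "$TMPDIR/$2.rank$1"
}
```

For example, 'nodes_needed 12' prints 3, and 'on_shared_fs /tmp/a.out' fails because /tmp is local to each node. on_shared_fs expects an absolute path, so a relative name such as ./a.out should be checked as "$PWD/a.out".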