lightning: Using MPI


Parallelism on lightning is obtained by using MPI. All accounts are set up so that MPI uses the very high-performance InfiniPath HTX communication network. To use MPI:

* To compile Fortran 77, Fortran 90, and Fortran 95 MPI programs, use mpif77,
     mpif90, or mpif95. To compile C and C++ MPI programs, use mpicc or mpiCC.
     Note that PathScale combines its C and C++ compilers into a single
     compiler, and likewise combines its Fortran 77, 90, and 95 compilers
     into a single compiler.
* Use http://andrew.ait.iastate.edu/HPC/lightning/lightning_script_writer.html
     to write a script to submit to the batch scheduler. Remember that each
     node has 4 processors, so a 12-processor job needs only 3 nodes.
* In the script, start the MPI program with:
       mpirun -np 12 ./a.out
     mpirun has been modified to use only the nodes that PBS assigns to the
     job, so no one else will use the nodes on which your MPI job is running.
     Only one mpirun may appear in a script, because PBS will not allow a
     second mpirun to start.
* Make sure that the executable (a.out in the example above) resides in one
     of the following locations:
       /home/user     (where 'user' is your user name)
       /work/group    (where 'group' is your group name; run 'groups' to find it)
       /ptmp
     All of these locations are mounted on every compute node.
     Don't place the executable in a local filesystem such as /tmp, since each
     node has its own /tmp. Files placed in /tmp on the front-end node are not
     available on the compute nodes, so mpirun would not be able to start
     processes there.
* You can use the local disk drive on each compute node by reading from and
     writing to $TMPDIR. This is temporary storage that is available only
     while your program is running, and only the processes running on a node
     can access that node's disk. Because the 4 processors on a node share
     this storage, include the MPI rank of each process in the names of the
     files it reads from and writes to in $TMPDIR, so that processes do not
     overwrite each other's files. $TMPDIR on each node holds about 130 GB.
* By default, the job is terminated if no MPI communication is detected
     within 15 minutes. To turn off this feature, use the mpirun option '-q 0':
       mpirun -q 0 -np 2 ./a.out
     For more information, read 'man mpirun' (search for the
     -quiescence-timeout option).
* The PBS -e and -o files (standard error and standard output) are not
     available until the PBS job finishes, so you may want to use
       mpirun -np 12 ./a.out >& output_file
     ('>&' sends both standard output and standard error to output_file).
     You can then view the output on lightning while the job is running.
     Alternatively, you can use the qpeek command:
       qpeek job#       shows STDOUT while the job is running.
       qpeek -e job#    shows STDERR while the job is running.
* For convenience, an mpirun command can be submitted to the batch queues by
     using the command bmpirun instead of mpirun, e.g.:
       bmpirun -np 8 ./a.out
     Restrictions:  no more than 16 processes
                    no more than 1 hour
                    output does not appear until the command completes
     The command runs immediately if enough nodes are free; otherwise, it waits
     in the queue. If you exit the command with Ctrl-C, you also need to qdel
     the associated job from the PBS queue.
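As a sketch of the node arithmetic and job layout described above (4 processors per node, a single mpirun per script), the shell function below prints a minimal PBS job script. It is hypothetical: make_job_script is not a lightning command, the '#PBS -l nodes=N:ppn=4' and '#PBS -S' directives assume a Torque/PBS-style syntax, and the one-hour walltime is an arbitrary placeholder. The script writer page remains the way to generate a submission script for lightning.

```shell
#!/bin/sh
# Hypothetical helper (not a lightning command): print a minimal PBS job
# script for an MPI run, assuming 4 processors per node as stated above.
# Usage: make_job_script NPROCS EXECUTABLE
make_job_script() {
    nprocs=$1
    exe=$2
    # Round up to whole nodes: 12 processes -> 3 nodes, 13 -> 4 nodes.
    nodes=$(( (nprocs + 3) / 4 ))
    # Assumed Torque/PBS directives; walltime=01:00:00 is a placeholder.
    # '-S /bin/sh' makes the sh-style redirect below work even if the
    # login shell is csh/tcsh. PBS starts jobs in the home directory,
    # so the script changes to the directory qsub was run from.
    cat <<EOF
#!/bin/sh
#PBS -S /bin/sh
#PBS -l nodes=${nodes}:ppn=4
#PBS -l walltime=01:00:00
cd "\$PBS_O_WORKDIR"
mpirun -np ${nprocs} ${exe} > output_file 2>&1
EOF
}

make_job_script 12 ./a.out
```

A possible usage is 'make_job_script 12 ./a.out > job.pbs' followed by 'qsub job.pbs'. Rounding up with (nprocs + 3) / 4 means a 13-process job requests 4 nodes rather than 3.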
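The rule that files in $TMPDIR must carry the MPI rank can be illustrated with the small shell function below. scratch_file is a hypothetical helper, and the 'name.rank' pattern with a zero-padded 4-digit rank is just one possible convention. In a real C or Fortran MPI program the rank would come from MPI_Comm_rank and the same name would be built inside the program; this shell version only shows that the 4 processes sharing a node end up with distinct files.

```shell
#!/bin/sh
# Hypothetical helper: print a per-rank scratch path under $TMPDIR so that
# the processes sharing one node's local disk never use the same file name.
# Usage: scratch_file BASENAME RANK
scratch_file() {
    # %04d zero-pads the rank: rank 3 -> 0003 (an arbitrary convention).
    printf '%s/%s.%04d\n' "$TMPDIR" "$1" "$2"
}

# The 4 ranks that may share one node each get a distinct file name.
for rank in 0 1 2 3; do
    scratch_file data "$rank"
done
```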