hpc-class: Using MPI


Parallelism beyond the 2 CPUs of a single node on hpc-class is obtained by using MPI. There are two ways to run MPI programs: interactively and in batch.


Interactive:
------------
* Compile using mpif77, mpif90, mpicc or mpiCC. These compile scripts
     already supply all MPI libraries and include files.  The executable
     must reside on the fileserver (where your home directory is) rather
     than in a local filesystem like /tmp.
* Issue the command:  mpirun -np 4 ./a.out
     When running interactively, mpirun runs only on the 4 interactive
     nodes (the front-end and three special compute nodes).
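As a sketch only, the fileserver requirement above can be checked before launching. The helper name on_fileserver is hypothetical (it is not an hpc-class command), and it assumes, as stated above, that everything under your home directory is on the fileserver while paths such as /tmp are node-local:

```shell
#!/bin/sh
# Hypothetical helper (not part of hpc-class): succeed only if the
# executable's directory is under $HOME (the fileserver) rather than a
# node-local path such as /tmp.
on_fileserver() {
    # Resolve both directories to absolute, symlink-free paths.
    dir=$(cd "$(dirname "$1")" 2>/dev/null && pwd -P) || return 1
    home=$(cd "$HOME" 2>/dev/null && pwd -P) || return 1
    case "$dir/" in
        "$home"/*) return 0 ;;
        *) echo "on_fileserver: $1 is not under \$HOME" >&2; return 1 ;;
    esac
}
```

Usage:  on_fileserver ./a.out && mpirun -np 4 ./a.out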

Note: When running interactively, you share these 4 nodes (8 CPUs) with
    other users and with any front-end work, so you may see no increase
    in performance from using multiple MPI processes.  Use mpirun on the
    interactive nodes when you
    1) need to run short tests while developing the program, or
    2) need to run interactively for debugging purposes.

    Also, if several people are using mpirun interactively, all of the
    Myrinet GM ports may be in use, and mpirun then prints the error:
       Error: Unable to open a GM port !
    If this happens, either use the batch instructions below or wait
    and try again later.

Alternative:  A script, bmpirun, is available that avoids the "Unable to
              open a GM port !" problem above.  Use bmpirun just as you
              would use mpirun interactively; the job is submitted to
              batch nodes instead of the interactive nodes and runs
              dedicated on those nodes.  When the job finishes, its output
              is printed to your screen.  bmpirun is limited to 15 minutes
              of wall time and to -np 32, and it runs on at most 4 nodes.
E.g.  instead of  mpirun -np 4 ./a.out ,
      use        bmpirun -np 4 ./a.out .

              Don't use bmpirun for timing or performance runs above
              -np 8.  Because bmpirun uses at most 4 nodes (8 CPUs), runs
              with -np greater than 8 place several processes on each CPU,
              so each process runs at less than 100%.  Such runs are not
              useful for performance measurement, but they can be useful
              for debugging or for testing when a large number of
              processors is not available.
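A minimal sketch of combining the two interactive options: try mpirun first and, only if the GM port error appears, resubmit the same command with bmpirun. The function name run_mpi and the MPIRUN/BMPIRUN override variables are hypothetical; the wrapper checks only the -np 32 limit and cannot enforce bmpirun's 15-minute wall-time limit. It also buffers mpirun's output until the run ends, which is acceptable for short tests but not for long runs:

```shell
#!/bin/sh
# Hypothetical wrapper (not part of hpc-class). MPIRUN and BMPIRUN may be
# overridden, e.g. with stub functions for testing.
MPIRUN=${MPIRUN:-mpirun}
BMPIRUN=${BMPIRUN:-bmpirun}

run_mpi() {
    np=$1; shift
    # Capture stdout and stderr so the GM port error can be detected.
    out=$("$MPIRUN" -np "$np" "$@" 2>&1)
    status=$?
    printf '%s\n' "$out"
    case "$out" in
        *"Unable to open a GM port"*)
            if [ "$np" -gt 32 ]; then
                echo "run_mpi: bmpirun is limited to -np 32" >&2
                return 1
            fi
            echo "run_mpi: GM ports busy, resubmitting with $BMPIRUN" >&2
            "$BMPIRUN" -np "$np" "$@"
            return $?
            ;;
    esac
    return $status
}
```

Usage:  run_mpi 4 ./a.out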

Batch:
------
* Compile using mpif77, mpif90, mpicc or mpiCC. These compile scripts
     already supply all MPI libraries and include files.  The executable
     must reside on the fileserver (where your home directory is) rather
     than in a local filesystem like /tmp.
* Use the PBS Script Writer web form to write a script to submit to
     the batch scheduler.  Remember that each node has 2 processors,
     so a 12-processor job needs only 6 nodes.
* In the script, use the command:  mpirun -np 12 ./a.out
     mpirun has been modified to use only the nodes that PBS assigns to
     it, so no one else will use the nodes on which your MPI job is
     running.  Only one mpirun may appear in the script, as PBS will not
     allow a second mpirun to start.
* If you need local temporary space during the job, use $TMPDIR, which
     provides 65 GB of space.
* The -e and -o PBS files are not available until the PBS job finishes,
     so you may want to use  mpirun -np 12 ./a.out >& output_file
     (csh/tcsh syntax; in sh use  > output_file 2>&1 ).  You can then
     watch the output from hpc-class while the job is running.
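For readers who prefer writing the job script by hand, here is a rough sketch of what such a script could contain, printed by a hypothetical helper write_pbs_script (it is not the PBS Script Writer's actual output). It assumes standard PBS directives (-l nodes=N:ppn=2, -l walltime, -S for the shell) and the PBS_O_WORKDIR variable; the one-hour default wall time is an arbitrary placeholder. The node count is the processor count divided by 2, rounded up:

```shell
#!/bin/sh
# Hypothetical helper: print a minimal PBS job script for an MPI run.
# Usage: write_pbs_script NP EXECUTABLE [WALLTIME] > job.pbs
write_pbs_script() {
    np=$1
    exe=$2
    walltime=${3:-01:00:00}      # placeholder default; choose your own
    nodes=$(( (np + 1) / 2 ))    # 2 processors per node, rounded up
    cat <<EOF
#!/bin/sh
#PBS -S /bin/sh
#PBS -l nodes=${nodes}:ppn=2
#PBS -l walltime=${walltime}
# Start in the directory qsub was run from (on the fileserver).
cd "\$PBS_O_WORKDIR"
# Only one mpirun is allowed per job; redirect its output so it can be
# watched from hpc-class while the job runs.
mpirun -np ${np} ${exe} > output_file 2>&1
EOF
}
```

Usage:  write_pbs_script 12 ./a.out > job.pbs  then  qsub job.pbs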

Note: The batch system reserves the number of nodes that you request.
    Your job waits until all the requested nodes are available, then
    it starts, and only your job runs on those nodes.
    You can think of this like a reservation system in a restaurant:
    a large party may wait a long time while smaller parties are seated
    ahead of it, but once any party is seated, no other patrons share
    its table.
    These batch reservations are made from a separate, larger pool
    of 40 nodes.  Use mpirun in the batch system when you
    1) need repeatable performance (say, for timing purposes), or
    2) need more resources than the interactive nodes provide.

All accounts are set up to use MPI with the fast, low-latency Myrinet
communication interface.  MPI over Ethernet is also provided, but you
need to alter your PATH variable so that /shared/mpich-1.2.5-chp4/bin
appears either before or instead of /shared/MYRINET/bin.  Then mpif90,
mpif77, mpicc, mpiCC and mpirun refer to the Ethernet version of MPI.
All executables built for Myrinet must be recompiled before they can
be run with this mpirun.  Otherwise, the instructions for the Ethernet
version are the same as for the Myrinet version of MPI.
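One way to make the PATH change, sketched for sh-style shells only (csh/tcsh users would edit their path variable instead). The function name use_ethernet_mpi is hypothetical; it removes /shared/MYRINET/bin and places /shared/mpich-1.2.5-chp4/bin first, leaving the rest of PATH in order. Source the file with "." so the change persists in your shell:

```shell
#!/bin/sh
# Hypothetical helper: switch this shell from Myrinet MPI to Ethernet MPI.
ETH_MPI=/shared/mpich-1.2.5-chp4/bin
GM_MPI=/shared/MYRINET/bin

use_ethernet_mpi() {
    new=$ETH_MPI
    old_ifs=$IFS
    IFS=:                       # split PATH on colons
    for d in $PATH; do
        case "$d" in
            "$GM_MPI"|"$ETH_MPI") ;;   # drop both; ETH_MPI is already first
            *) new="$new:$d" ;;
        esac
    done
    IFS=$old_ifs
    PATH=$new
    export PATH
}
```

Usage:  use_ethernet_mpi; command -v mpicc   (should now print a path under /shared/mpich-1.2.5-chp4/bin)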