lightning: Managing jobs using the PBS job scheduler
How to create, submit, monitor, and delete jobs using the PBS job scheduler
1) Log in to lightning using the ssh command.
All PBS jobs submitted on lightning will run on the compute nodes of lightning.
2) Make sure that you have the correct PATH by issuing printenv | grep PATH
on lightning. Your PATH should include at least one occurrence of /usr/local/bin.
If this is not the case, send email to hpc-help@iastate.edu.
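The check in step 2 can also be scripted. The sketch below uses a POSIX shell function whose name, has_usr_local_bin, is hypothetical and not a lightning utility. It matches ":$PATH:" against a colon-delimited pattern, so it only succeeds when /usr/local/bin is a whole PATH component. A directory such as /usr/local/bin2 does not count as a match.

```shell
# Succeed if /usr/local/bin is one of the colon-separated entries in $1.
has_usr_local_bin() {
  case ":$1:" in
    *:/usr/local/bin:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check the current PATH and report the result.
if has_usr_local_bin "$PATH"; then
  echo "PATH includes /usr/local/bin"
else
  echo "PATH lacks /usr/local/bin; send email to hpc-help@iastate.edu"
fi
```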
3) Create a PBS script using the PBS scriptwriter and save it in a file named "myscript"
or whatever name you like. Because the architecture of lightning differs from that of hpc1,
hpc2, and hpc3, scripts written for those machines should not be used on lightning. When selecting
the Time needed and the Number of CPU needed, refer to the current queue structure, displayed by issuing
qstat -q
Since each node contains 4 processors, it is recommended that the Number of CPU selected
be a multiple of 4.
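The sketch below illustrates step 3. It writes a small PBS script to "myscript" from a CPU count and a wall-clock limit, and it rounds the CPU count up to a multiple of 4. This is not the PBS scriptwriter's output, and several parts are assumptions:

* The function name make_pbs_script is hypothetical.
* The resource syntax nodes=N:ppn=4 is the Torque/OpenPBS form and has not been confirmed for lightning.
* ./a.out stands for your executable.

Prefer the script produced by the PBS scriptwriter, and check the limits against qstat -q.

```shell
# Hypothetical helper: write a minimal PBS script to ./myscript.
# Usage: make_pbs_script NCPUS WALLTIME   (WALLTIME as HH:MM:SS)
# NCPUS is rounded up to a multiple of 4 (4 processors per node).
# The nodes=N:ppn=4 syntax is the Torque/OpenPBS form (an assumption).
make_pbs_script() {
  nodes=$(( ($1 + 3) / 4 ))   # whole nodes needed for NCPUS
  ncpus=$(( nodes * 4 ))      # CPU count rounded up to a multiple of 4
  # Unquoted here-document: $nodes, $2 and $ncpus expand now, while
  # \$PBS_O_WORKDIR is written literally and expands when the job runs.
  cat > myscript <<EOF
#!/bin/sh
#PBS -l nodes=$nodes:ppn=4
#PBS -l walltime=$2
#PBS -o BATCH_OUTPUT
#PBS -e BATCH_ERRORS
cd "\$PBS_O_WORKDIR"
mpirun -q 0 -np $ncpus ./a.out
EOF
}
```

For example, make_pbs_script 10 01:00:00 requests 3 nodes and runs 12 MPI processes. The file names BATCH_OUTPUT and BATCH_ERRORS follow the scriptwriter's convention described in step 7.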
4) When specifying the mpirun command, note that by default the job will be terminated
if no MPI communication has been detected within 15 minutes. To turn off this feature, use
the "-q 0" mpirun option:
mpirun -q 0 -np 2 ./a.out
For more information, read man mpirun (search for the -quiescence-timeout option).
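Instead of hard-coding the -np value, a job script can count the processor slots that PBS allocated. This sketch assumes that $PBS_NODEFILE holds one line per processor slot, which is the usual Torque/OpenPBS layout. $PBS_NODEFILE is the file that PBS gives a running job listing its allocated hosts. The helper name count_slots is hypothetical.

```shell
# Count the processor slots listed in a PBS node file (one line per slot
# is assumed). tr removes the padding some wc implementations print.
count_slots() {
  wc -l < "$1" | tr -d ' '
}

# Inside the job script, the MPI launch could then read:
#   mpirun -q 0 -np "$(count_slots "$PBS_NODEFILE")" ./a.out
```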
5) qsub myscript
This command submits the PBS script in the file myscript. You may submit several
jobs in succession if they use different output files. Jobs are assigned
to queues based on the resources requested. Job queues limit the number of
simultaneous jobs per user and per group.
PBS scripts must be "visible" to the qsub command, i.e., you must either issue "qsub myscript" from
the directory where "myscript" is located or give the path to the script, as in "qsub /myscript".
Usually people keep PBS scripts in the same directory as the executable and the initial data.
In this case the easiest approach is to submit the job from that directory and, within the script,
cd there by issuing "cd $PBS_O_WORKDIR".
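Step 5 says that several jobs can be submitted in succession if they use different output files. It also says that qsub is easiest to run from the script's own directory. The hedged sketch below combines both points in a hypothetical submit_series function:

* It changes to the script's directory in a subshell, so $PBS_O_WORKDIR points there.
* It gives each run numbered output and error files through qsub's -o and -e options. These options normally take precedence over #PBS -o and #PBS -e lines in the script.

The sketch assumes the script does not itself redirect output to a fixed file name.

```shell
# Hypothetical helper: submit COUNT copies of SCRIPT, each with its own
# output and error files. Usage: submit_series SCRIPT COUNT
submit_series() {
  script=$1
  count=$2
  (
    # Run qsub from the script's directory so that $PBS_O_WORKDIR
    # (set by qsub to the submission directory) points there.
    cd "$(dirname "$script")" || exit 1
    i=1
    while [ "$i" -le "$count" ]; do
      qsub -o "BATCH_OUTPUT.$i" -e "BATCH_ERRORS.$i" "$(basename "$script")"
      i=$((i + 1))
    done
  )
}
```

For example, submit_series ./myscript 3 submits three jobs that write BATCH_OUTPUT.1 through BATCH_OUTPUT.3.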
6) qstat -q
Gives the status of all the queues and the current queue structure.
7) qstat -a
This shows all jobs in the system that are running or waiting in a queue to run.
If you don't see your job, it has completed execution. The script generated by
the PBS scriptwriter puts the job's output in the file BATCH_OUTPUT and its error messages
in the file BATCH_ERRORS. For more information, read the documentation produced by the
PBS scriptwriter.
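A job that no longer appears in qstat has finished, so a script can poll qstat until the job disappears. This minimal sketch assumes that "qstat JOBID" exits with a nonzero status once the job has left the system. That is the Torque/OpenPBS behavior, but some PBS versions keep finished jobs visible, so check this locally. The name wait_for_job is hypothetical, and the optional second argument sets the polling interval in seconds.

```shell
# Hypothetical helper: wait until qstat no longer knows about JOBID.
# Usage: wait_for_job JOBID [SECONDS_BETWEEN_CHECKS]
wait_for_job() {
  while qstat "$1" >/dev/null 2>&1; do
    sleep "${2:-60}"
  done
  echo "job $1 is no longer listed by qstat"
}
```

Once the function returns, the results are in BATCH_OUTPUT and BATCH_ERRORS if the script came from the PBS scriptwriter.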
8) qstat -r
This shows all jobs that are currently running.
9) qdel job#
If your PBS job has job number 15244, issuing qdel 15244 deletes the job from the PBS job queue and,
if the job is running, terminates its execution.
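qsub prints the identifier of the job it creates. That identifier is usually the job number followed by a dot and the server name, for example 15244.<server>. The sketch below keeps the full identifier for later qstat or qdel calls. It uses a hypothetical helper, job_number, to extract the bare number when only that is wanted.

```shell
# Strip everything from the first dot onward, leaving the job number.
job_number() {
  printf '%s\n' "${1%%.*}"
}

# Typical use on lightning (not run here):
#   jobid=$(qsub myscript)
#   qdel "$jobid"    # or: qdel "$(job_number "$jobid")"
```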