Shells MPI

MPI is pretty cool, but nobody used it and it is now broken. If you do want to use it, let us know.

MPI jobs

The server supports submitting MPI jobs to backend worker nodes with a maximum (and default) of 4 worker nodes available to regular users.

Mpiqueue command

The mpiqueue command enqueues your MPI job to be run on the worker nodes. Use the --help argument to see the available options, or the --interactive option to start an interactive session that helps you submit your job. Take care to submit a binary that was written for MPI: it must use the MPI libraries and be compiled with one of the MPI compilers (mpicc, mpic++, mpif77 or mpif90).

Mpirun command

The mpirun command can be used to test MPI programs locally, or to submit an MPI job to your own MPI worker node(s). Although it is possible to run MPI jobs on the ssh server node, it is not a heavy compute node. Please only run short jobs there for testing purposes, not lengthy computations.


For the purposes of this example we'll be looking at a C program that just passes some messages between MPI processes.


 #include <stdio.h>
 #include <stdlib.h>
 #include <mpi.h>
 int main(int argc, char *argv[]) {
   const int MASTER = 0;
   const int TAG_GENERAL = 1;
   int numTasks;
   int rank;
   int dest;
   int rc;
   int count;
   int dataWaitingFlag;
   char inMsg;
   char outMsg;
   MPI_Status Stat;
   // Initialize the MPI stack and pass 'argc' and 'argv' to each slave node
   MPI_Init(&argc, &argv);
   // Gets number of tasks/processes that this program is running on
   MPI_Comm_size(MPI_COMM_WORLD, &numTasks);
   // Gets the rank (process/task number) that this program is running on
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   // If the master node
   if (rank == MASTER) {
     // Send out messages to all the sub-processes
     for (dest = 1; dest < numTasks; dest++) {
       outMsg = rand() % 256;      // Generate random message to send to slave nodes
       // Send a message to the destination
       rc = MPI_Send(&outMsg, 1, MPI_CHAR, dest, TAG_GENERAL, MPI_COMM_WORLD);
       printf("Task %d: Sent message %d to task %d with tag %d\n",
              rank, outMsg, dest, TAG_GENERAL);
     }
   }
   // Else a slave node
   else {
     // Wait until a message is there to be received
     do {
       MPI_Iprobe(MASTER, TAG_GENERAL, MPI_COMM_WORLD, &dataWaitingFlag, MPI_STATUS_IGNORE);
     } while (!dataWaitingFlag);
     // Get the message and put it in 'inMsg'
     rc = MPI_Recv(&inMsg, 1, MPI_CHAR, MASTER, TAG_GENERAL, MPI_COMM_WORLD, &Stat);
     // Get how big the message is and put it in 'count'
     rc = MPI_Get_count(&Stat, MPI_CHAR, &count);
     printf("Task %d: Received %d char(s) (%d) from task %d with tag %d \n",
             rank, count, inMsg, Stat.MPI_SOURCE, Stat.MPI_TAG);
   }
   // Shut down the MPI stack before exiting
   MPI_Finalize();
   return 0;
 }

Preparing, compiling and running your program

First create a directory for your project. We'll name this one testrun.
 mkdir testrun
 cd testrun
Now copy and paste the above example into a file (or write your own!) by using your editor of choice. We'll name our file mpi_test.c.
We will compile our program with the MPI C compiler, mpicc.
 mpicc mpi_test.c -o mpi_test
Now we have our binary file ready to submit to the mpi queue.
If you want to test if your mpi binary is working before you submit it, use the mpirun command. We'll start our binary with two local workers for testing.
 mpirun -n 2 mpi_test
If all is working as it should you will see something like this:
 Task 0: Sent message 103 to task 1 with tag 1
 Task 1: Received 1 char(s) (103) from task 0 with tag 1
Before we can submit our binary to the mpiqueue we need to make sure it's somewhere the user that runs mpi jobs can access it. We'll make a copy of our directory in /tmp and make it world readable.
 cp -R ../testrun /tmp/
 chmod +r -R /tmp/testrun
Now we're ready to submit it to the mpiqueue. We'll have to tell it which directory contains the files for our mpi job (-d argument), and which file to execute (-e argument).
For illustrative purposes we'll also tell it to use 4 worker nodes, even though this is the default.
 mpiqueue -d /tmp/testrun -e mpi_test -n 4
You should now see something like this:
 Working directory: /tmp/testrun/
 Executable name:   mpi_test
 MPI workers:       4
 Queuing job.
 Cleaning environment.
 Copying files.
 Starting mpi workers.
 Task 0: Sent message 103 to task 1 with tag 1
 Task 0: Sent message -58 to task 2 with tag 1
 Task 1: Received 1 char(s) (103) from task 0 with tag 1 
 Task 2: Received 1 char(s) (-58) from task 0 with tag 1 
 Task 0: Sent message 105 to task 3 with tag 1
 Task 3: Received 1 char(s) (105) from task 0 with tag 1 
 Mpi run has ended.
GREAT SUCCESS! It seems our task is running and the worker nodes are communicating.
Don't forget to delete your job from its world-readable location when you've finished.
 rm -rf /tmp/testrun
You can delete the directory as soon as your job starts running, because the mpiqueue script automatically sends a copy to all the worker nodes.
If another job is running when you submit yours, be sure to wait until your job has started before deleting.