Most modern desktop workstations come with quite nice configurations: multiple cores, hyperthreading… To get the best out of such configurations, it is very convenient to set them up as a grid. This makes it possible to queue tasks, a most convenient feature to empower your multitasking skills 🙂
There is good old Sun Grid Engine (SGE), and its ‘qsub’ command. Now that Sun has been bought by Oracle, and Oracle has got rid of SGE, the task-scheduling landscape has become a bit confusing:
- The last version of SGE is still around. Packages for Debian and Ubuntu are still gathering dust in the repositories. For Ubuntu at least, some additional fonts were needed to get the thing working. In the latest vintage (14.04), this trick does not even seem to work any longer. The command-line version still works, however.
- A fork of the latest open-source version of SGE can be found on SourceForge (Open Grid Scheduler). It has to be compiled from source, however, a task that is far from straightforward due to the numerous dependencies.
- Another fork, named Son of Grid Engine (!), is available from the University of Liverpool. ‘deb’ and ‘rpm’ packages are provided.
As an alternative, I will blog here about Slurm. Slurm stands for Simple Linux Utility for Resource Management. Slurm is popular enough that many posts are already dedicated to it across the blogosphere. Here is a short wrap-up:
Packages for Ubuntu and Debian are available:
sudo apt-get install slurm-llnl
You will also need the munge software, also available in the repository:
sudo apt-get install munge
Generating the configuration file
The installation comes with a couple of HTML pages that let you generate the configuration file.
They can be found at:
Just open one of them in your web browser, and start filling in the required fields. Default options are provided. To get information about your particular machine, you can run
This should come in handy for the last part of the option file. Then save the resulting file in
Generate munge key
This is done with the command
For some reason, a permission change is needed to avoid some later warnings:
sudo chmod g-w /var/log
sudo chmod g-w /var/log/munge
The daemons are then started with the commands:
/etc/init.d/slurm-llnl start
/etc/init.d/munge start
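Once both daemons are started, a quick sanity check can save some head-scratching later. The following is only a sketch, assuming the `munge`, `unmunge` and `sinfo` commands installed by the packages above are in your PATH; the `missing` variable is just a helper for this example.

```shell
#!/bin/bash
# Sanity check after starting the daemons (sketch; adapt to your setup).
missing=0
for cmd in munge unmunge sinfo; do
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd not found in PATH"
    missing=1
  fi
done

if [ "$missing" -eq 0 ]; then
  # Encode a credential and decode it locally; unmunge should report a success status.
  munge -n | unmunge
  # List the partitions and the state of the node(s) known to Slurm.
  sinfo
fi
```

If `sinfo` shows your machine as a node in an idle state, the configuration file was most likely picked up correctly.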
Multi-core, multi-node and multi-prog
There we are… now comes the main topic of this post. Slurm has a major limitation: as opposed to SGE, it is not meant to be run on a single machine. One machine, even with several cores, will be considered a single node, as it runs one instance of the slurmd daemon (Note: there seems to exist a mode to circumvent this, which needs to be enabled at compilation time… maybe the topic of a later post).
Fortunately, Slurm has a “multi-prog” mode, which launches several programs in one run. Here I illustrate how it can be used to mimic the behavior of several nodes on a single machine.
In this example, one would like to run 2000 independent analyses, typically one program execution on each of 2000 data sets, corresponding to 2000 genes for instance. This is conveniently achieved using a job array (just like with SGE). In Slurm, job arrays are set up using the line:
With only one node, the array executes one job after the other, without making use of the multiple cores. If we have 20 cores available, we can decide to run 20 genes simultaneously. This is achieved using the multi-prog option, via a special configuration file listing the 20 programs to run. The trick is then to have the job array generate this file for you:
#!/bin/bash
# file slurm_example.sh
# run with sbatch -o slurm-%A_%a.out slurm_example.sh
#SBATCH --job-name=slurm_example
#SBATCH --output=slurm_example.txt
#SBATCH --array=1-2000:20
#SBATCH --ntasks=20

# Create the multi-prog configuration file (-f: no error if it does not exist yet):
rm -f multi.conf
for ((i=0; i < 20; i++)); do
  # Get gene name: line number (SLURM_ARRAY_TASK_ID + i) of genes.txt
  GENE=$(sed "$((SLURM_ARRAY_TASK_ID + i))q;d" genes.txt)
  echo "$GENE"
  # One line per task: task rank, program, argument
  echo "$i myprog $GENE" >> multi.conf
done

srun --multi-prog multi.conf
A few clarifications:
- The file multi.conf is generated on-the-go and contains the 20 current execution lines
- Note the syntax of the job array, with a step of 20
- The --ntasks=20 option is required to say that we will run 20 tasks simultaneously
- In this example, I assume that all 2000 gene names are stored, one per line, in a file named “genes.txt”. The nth gene is retrieved using a sed command.
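To convince yourself of what ends up in multi.conf, the generation loop can be tried outside of Slurm. The sketch below makes a few assumptions for illustration only: the gene names (gene1 to gene40) are toy values, SLURM_ARRAY_TASK_ID is set by hand, and everything is written to a temporary directory. ‘myprog’ is the same placeholder program name as in the script.

```shell
#!/bin/bash
# Dry run of the multi.conf generation, outside of Slurm (illustration only).
workdir=$(mktemp -d)
genes="$workdir/genes.txt"
conf="$workdir/multi.conf"

# Toy input: one (made-up) gene name per line.
for n in $(seq 1 40); do echo "gene$n"; done > "$genes"

# With --array=1-2000:20 the array task IDs are 1, 21, 41, ..., 1981,
# that is, 100 array tasks in total.
n_array_tasks=$(seq 1 20 2000 | wc -l)
echo "number of array tasks: $n_array_tasks"

# Pretend to be the second array task.
SLURM_ARRAY_TASK_ID=21

rm -f "$conf"
for ((i = 0; i < 20; i++)); do
  # sed prints line (SLURM_ARRAY_TASK_ID + i) of the gene list, then quits.
  GENE=$(sed "$((SLURM_ARRAY_TASK_ID + i))q;d" "$genes")
  echo "$i myprog $GENE" >> "$conf"
done

# Show the first lines: task rank, program, argument.
head -n 3 "$conf"
```

For this second array task, the file starts with “0 myprog gene21” and ends with “19 myprog gene40”: each array task handles its own block of 20 consecutive genes.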
This small trick will allow you to make good use of your 20 (or more) cores. Yet there are limitations:
- The next batch of 20 tasks will only be started once the current 20 are finished. This may be a serious limitation in case of unbalanced tasks, with some taking much more time than others (genes of different sizes for instance).
- It is not possible to launch an analysis using 10 cores, and another using 10 other cores simultaneously.
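To put a number on the first limitation, here is a small back-of-the-envelope calculation. The run times are purely hypothetical (19 short genes and one long gene in the same batch); the point is only that the batch occupies all 20 cores until its slowest task finishes.

```shell
#!/bin/bash
# Hypothetical run times (in minutes) of the 20 tasks of one batch.
durations=(5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 60)

max=0
total=0
for d in "${durations[@]}"; do
  if (( d > max )); then max=$d; fi
  total=$(( total + d ))
done

ncores=${#durations[@]}
# The batch holds all cores for as long as its slowest task runs.
reserved=$(( max * ncores ))
idle=$(( reserved - total ))

echo "batch wall time: $max min"
echo "core-minutes reserved: $reserved, used: $total, idle: $idle"
```

With these made-up numbers, the batch lasts 60 minutes while only 155 of the 1200 reserved core-minutes do useful work. Sorting genes.txt so that genes of similar size end up in the same batch is one way to reduce the waste, assuming sizes are known beforehand.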