Chapter 16 Slurm and cluster computing on ROAR

by Jess Grembi

Many of the bioinformatic tools that we use require a large amount of RAM, rely on large databases, and can be run in parallel across many samples. For these types of analyses, we use ROAR Collab, Penn State’s computing cluster. ROAR uses Slurm, an open-source, scalable cluster management and job scheduling system. You will need to request an account, and once you have one, Jess can add you to the lab account to access our storage areas and compute resources. Please refer to the ROAR User Guide to learn about the system and how to use it. Below, we include a few tips specific to how we use ROAR Collab in our lab.

16.1 Getting started

To access ROAR Collab, log in from a terminal using the following syntax, replacing <user> with your PSU alias. You will be prompted to enter your PSU password (the same one you use for your email and other accounts) and to complete two-factor authentication.

ssh <user>@submit.hpc.psu.edu

Once you log in, you can move to your home directory at the command line by entering cd $HOME and list its contents with ls. You can create subdirectories within this directory using the mkdir command. For example, you could make a “code” subdirectory and clone a GitHub repository there using the following code:

cd $HOME
mkdir code
git clone https://github.com/jadebc/covid19-infections.git

16.1.1 Storage

Each user account is provided with 3 different storage assets. The $HOME directory is a good place to store code and small test files (quota: 16 GB per user) and the $WORK directory is a good place to store larger items that you will use regularly (quota: 128 GB per user). Save intermediate processing files and temporary files generated by specific bioinformatic tools to the $SCRATCH directory (quota: unlimited, but files are automatically deleted 30 days after their last modification). You can read more about storage options on ROAR here.
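
As a quick orientation, the sketch below prints where each storage area lives and creates a project folder in $WORK. It assumes the $HOME, $WORK, and $SCRATCH environment variables are set for your account, as described above; my-project is a placeholder name.

# print the path behind each storage area
# (assumes $WORK and $SCRATCH are defined for your account on ROAR Collab)
echo "home:    $HOME"
echo "work:    $WORK"
echo "scratch: $SCRATCH"

# move into your work directory and create a project folder there
cd $WORK
mkdir -p my-project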

The lab has also purchased active storage space (45 TiB, accessible at /storage/group/jag548/default) and archive storage (15 TiB, accessible at /storage/archive/jag548/default). Archive storage is about 1/4 the cost of active storage so raw sequencing files should be moved to the archive storage once you’ve completed the preliminary data processing steps and are fairly certain you will not be needing them again anytime soon.
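
A minimal sketch of that hand-off, assuming a hypothetical raw-fastq/ folder inside a hypothetical malaria-project/ directory on group storage, might look like this:

# create the destination folder on archive storage if it doesn't exist yet
mkdir -p /storage/archive/jag548/default/malaria-project

# copy raw sequencing files from active storage to archive storage
# (malaria-project/ and raw-fastq/ are hypothetical names)
cp -r /storage/group/jag548/default/malaria-project/raw-fastq \
      /storage/archive/jag548/default/malaria-project/

# verify the copy before deleting the originals from active storage
ls /storage/archive/jag548/default/malaria-project/raw-fastq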

You can check the use of your personal storage and the group resources easily by running:

check_quota

16.1.2 Submitting jobs

ROAR has various CPU and GPU compute resources. On the basic cluster, all users have access to the following resources (across all jobs they have submitted):

  • 48 hours of runtime
  • 800 GB of memory
  • 100 cores

There are 3 different hardware partitions, and you need to specify which one you want your jobs to run on:

  • basic cores: 4 GB/core
  • standard cores: 8 GB/core
  • high-memory cores: 20 GB/core

The requirements of the task or job being run should dictate which partition you use.

The lab’s credit account, jag548_cr_credit, is active. To use it, specify it in your job submission, either by selecting it from the drop-down options on the portal or by adding the directive --account=jag548_cr_credit. You will also need to specify the desired hardware partition for your job to launch correctly; the options are basic, standard, and himem. A sketch of a submission-script header using these directives is shown below.
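
For illustration, here is a minimal sketch of a submission-script header that combines the account and partition directives. The job name, memory, time, and script name are placeholders; a complete example appears in the Running big jobs section.

#!/bin/bash

#SBATCH --account=jag548_cr_credit  # lab credit account
#SBATCH --partition=standard        # basic, standard, or himem
#SBATCH --job-name=test_job         # placeholder job name
#SBATCH --mem=16G                   # placeholder memory request
#SBATCH --time=01:00:00             # placeholder run time of 1 hour

# placeholder command; replace with the script you want to run
Rscript my-script.R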

16.1.3 One-Time System Set-Up

To keep installed packages consistent across different nodes, you will need to explicitly set the path to your R library directory.

Open your ~/.Renviron file (vi ~/.Renviron) and append the following line:

Note: Once you open the file using vi [file_name], press i to enter insert mode and make your edits. When you finish, press Esc to exit insert mode and type :wq to save and close the file.

R_LIBS=~/R/x86_64-pc-linux-gnu-library/4.0.2

Alternatively, run an R script with the following code on ROAR Collab:

r_environ_file_path = file.path(Sys.getenv("HOME"), ".Renviron")
if (!file.exists(r_environ_file_path)) file.create(r_environ_file_path)

cat("\nR_LIBS=~/R/x86_64-pc-linux-gnu-library/4.0.2",
    file = r_environ_file_path, sep = "\n", append = TRUE)

To load packages that rely on compiled C++ code, you’ll need to set the correct compiler options in your R environment.

Open the Makevars file on ROAR Collab (vi ~/.R/Makevars) and append the following lines:

CXX14FLAGS=-O3 -march=native -mtune=native -fPIC
CXX14=g++

Alternatively, create an R script with the following code, and run it on ROAR Collab:

dotR = file.path(Sys.getenv("HOME"), ".R")
if (!file.exists(dotR)) dir.create(dotR)

M = file.path(dotR, "Makevars")
if (!file.exists(M)) file.create(M)

cat("\nCXX14FLAGS=-O3 -march=native -mtune=native -fPIC",
    "CXX14=g++",
    file = M, sep = "\n", append = TRUE)

16.2 Moving files to ROAR Collab

It is easy to lose track of what was transferred where, so it’s best to create a bash script that records the file transfer process for a given project. See example code below:

# note: the following steps should be run from your local machine
# (not after ssh-ing into ROAR Collab)

# securely transfer folders from Box to ROAR Collab work directory
# note: the -r option is for folders and is not needed for files
scp -r "Box/malaria-project/folder-1/" USERNAME@submit.hpc.psu.edu:/work/

# securely transfer folders from Box to your ROAR Collab scratch directory
scp -r "Box/malaria-project/folder-2/" USERNAME@submit.hpc.psu.edu:/scratch/

# securely transfer folders from Box to our shared storage directory
scp -r "Box/malaria-project/folder-3/" USERNAME@submit.hpc.psu.edu:/storage/group/jag548/
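
If a transfer is large or gets interrupted, rsync is a common alternative to scp because it can be re-run safely and skips files that are already up to date. A minimal sketch using the same hypothetical folder names as above:

# rsync the same folder to shared group storage; safe to re-run if interrupted
# -a preserves file attributes, -v prints progress, -z compresses data in transit
rsync -avz "Box/malaria-project/folder-1/" \
      USERNAME@submit.hpc.psu.edu:/storage/group/jag548/folder-1/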

16.3 Installing packages on Sherlock

When you begin working on Sherlock, you will most likely encounter problems with installing packages. To install packages, log in to Sherlock on the command line and open a development node using the command sdev. Do not attempt to do this in the RStudio Server (see the next section), as you will have to re-do it for every new session you open.

ssh USERNAME@login.sherlock.stanford.edu

sdev

There is a package installation file written explicitly for Sherlock that you should run before testing any code or sourcing the configuration file; you should only have to install packages once. Sherlock requires that you specify the repository the package is downloaded from, and you may also need to add an extra argument to install.packages to prevent the packages from locking after installation:

install.packages("<PACKAGE NAME>", repos = "http://cran.us.r-project.org",
                 INSTALL_opts = "--no-lock")

For some R packages to work on Sherlock, you need to load specific software modules before running R, and these must be loaded each time you want to use the package. For example, spatial and random effects analyses may need the modules below. The same modules must also be loaded on the command line before opening R in order for package installation to work.

module --force purge # remove any previously loaded modules, including math and devel
module load math
module load math gmp/6.1.2
module load devel
module load gcc/10
module load system
module load json-glib/1.4.4
module load curl/7.81.0
module load physics
module load physics udunits geos
module load physics gdal/2.2.1 # for R/4.0.2
module load physics proj/4.9.3 # for R/4.0.2
module load pandoc/2.7.3

module load R/4.0.2

R # Open R in the Shell window to install individual packages or test code
Rscript install-packages-sherlock.R # Alternatively, run the entire package installation script in the Shell window

Figuring out the issues with some packages will require trial and error. If you are still encountering problems installing a package, you may have to install other dependencies manually by reading through the error messages. If you try to install a dependency from CRAN and it isn’t working, it may be available as a module instead. You can search for it using the module spider command:

module spider <DEPENDENCY NAME>

You can also reach out to the Sherlock support team for help by email, and they hold regular office hours.

16.4 Testing your code

Both of the following ways to test code on Sherlock are best suited to small changes, such as editing file paths and making sure the packages and source files load. You should write and test the functionality of your script locally, and only test on Sherlock once the major bugs are worked out.

16.4.1 The command line

There are two main ways to explore and test code on Sherlock. The first is best for users who are comfortable working on the command line and editing code in base R. Even if you are not comfortable yet, this is probably the better way to learn, because the same commands work on Sherlock and on other clusters that use Slurm.

Typically, you will want to test your scripts initially by starting a development node using the command sdev. This allocates a small amount of computing resources for 1 hour. You can then access R from the command line using the following code.

# open development node
sdev

# Load all the modules required by the packages you are using
module load <MODULE NAME>

# Load R (default version)*
module load R 

# initiate R in command line
R

*Note: for collaboration purposes, it’s best for everyone to work with one version of R. Check which version is being used for the project you are working on; some packages only work with certain versions of R, so it’s best to keep it consistent. You can list the available R modules as shown below.
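
To check which R versions are available as modules before loading one, you can query the module system (module spider is a standard Lmod command; the exact versions listed will depend on the cluster):

# list all available R modules/versions
module spider R

# load the specific version the project uses (4.0.2 is the version used elsewhere in this guide)
module load R/4.0.2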

16.4.2 The ROAR Collab OnDemand Dashboard

The second way to test and edit code is to use the ROAR Collab OnDemand Dashboard, accessed through the Open OnDemand web portal in your browser (see the ROAR User Guide for the current link). You will be prompted to authenticate with your PSU credentials. This is the best option for people who are not comfortable accessing and editing code in base R in a shell application.

You can test your code via the RStudio Server on the cluster. To access it, log in to the Dashboard, click Interactive Apps in the menu bar, and choose RStudio Server. As with a development node, you have to set various parameters for your session: choose a version of R and set the time (maximum 2 hours). You can play with the other configurations, but this is likely unnecessary, as you should not need much computing power to test small amounts of code. Keep in mind that the more computing power you request, the lower priority your request becomes. You will then wait for the resources to become available and can click “Launch” when they are (if you don’t change the CPU or GPU settings, this usually takes less than 2 minutes). The screen that opens will look very similar to RStudio on your local machine.

Do NOT use the RStudio Server’s Terminal to install packages, set up your R environment, or do anything else needed to configure the cluster, because you will likely need to re-do it for every session/project. The Dashboard/RStudio Server is best if you are more comfortable testing and editing in RStudio rather than in base R from a shell application.

16.4.3 Filepaths & configuration on ROAR Collab

In most cases, you will want to test that the file paths work correctly on ROAR Collab. You will likely need to add code to the configuration file in the project repository that specifies cluster-specific file paths. Here is an example from a project that ran on Stanford’s Sherlock cluster; the same pattern applies on ROAR Collab with the appropriate hostname check:

# set sherlock-specific file paths
if(Sys.getenv("LMOD_SYSHOST")=="sherlock"){
  
  sherlock_path = paste0(Sys.getenv("HOME"), "/malaria-project/")
  
  data_path = paste0(sherlock_path, "data/")
  results_path = paste0(sherlock_path, "results/")
}

16.5 Storage & group storage access

16.5.1 Individual storage

There are multiple places to store your files on Sherlock. Each user has their own $HOME directory as well as a $SCRATCH directory. These are directories that can be accessed via the command line once you’ve logged in to Sherlock:

cd $HOME 
cd /home/users/USERNAME # Alternatively, use the full path

cd $SCRATCH
cd /scratch/users/USERNAME # Full path

You can also navigate to these using the File Explorer on Sherlock OnDemand.

$HOME has a volume quota of 15 GB. $SCRATCH has a volume quota of 100 TB, but files there are deleted 90 days after their last modification. Thus, use $SCRATCH for test files, exploratory analyses, and temporary storage; use $HOME for long-term storage of important files and more finalized analyses.

You can read more about storage options on Sherlock here.

16.5.2 Group storage

The lab also has a $GROUP_HOME and a $GROUP_SCRATCH to store files for collaborative use. $GROUP_HOME has a volume quota of 1 TB and infinite retention time, whereas $GROUP_SCRATCH has a volume quota of 100 TB and the same 90-day retention limit. You can access these via the command line or navigate to them using the File Explorer:

cd $GROUP_HOME
cd /home/groups/jadebc

cd $GROUP_SCRATCH
cd /scratch/groups/jadebc

However, saving files to group storage can be tricky. You can try using the scp command from the “Moving files” section above to see if you have permission to add files to group directories. Read the next section to ensure any directories you create have the right permissions.

16.5.3 Folder permissions

Generally, when we put folders in $GROUP_HOME or $GROUP_SCRATCH, it is so that we can collaborate on an analysis within the research group, so multiple people need to be able to access the folders. If you create a new folder in $GROUP_HOME or $GROUP_SCRATCH, please check the folder’s permissions to ensure that other group members are able to access its contents. To check the permissions of a folder, navigate to the level above it, and enter ls -l. You will see output like this:

drwxrwxrwx 2 jadebc jadebc  2204 Jun 17 13:12 myfolder

Please review this website to learn how to interpret the code on the left side of this output. The website also tells you how to change folder permissions. In order to ensure that all users and group members are able to access a folder’s contents, you can use the following command:

chmod ugo+rwx FOLDER_NAME
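
If the folder already contains files and subdirectories, you may need to apply the change recursively and then confirm it took effect; a short sketch:

# apply read/write permissions recursively; capital X adds execute
# only to directories (and files that are already executable)
chmod -R ugo+rwX FOLDER_NAME

# confirm the permissions on the folder itself
ls -ld FOLDER_NAME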

16.6 Running big jobs

Once your test scripts run successfully, you can submit an sbatch script for larger jobs. These are text files with a .sh suffix. Use a text editor like Sublime to create such a script. Documentation on sbatch options is available here. Here is an example of an sbatch script with the following options:

  • job-name=run_inc: Job name that will show up in the Sherlock system
  • begin=now: Requests to start the job as soon as the requested resources are available
  • dependency=singleton: Jobs can begin after all previously launched jobs with the same name and user have ended.
  • mail-type=ALL: Receive all types of email notification (e.g., when job starts, fails, ends)
  • cpus-per-task=16: Request 16 processors per task. The default is one processor per task.
  • mem=64G: Request 64 GB memory per node.
  • output=00-run_inc_log.out: Create a log file called 00-run_inc_log.out that contains information about the Slurm session
  • time=47:59:00: Set maximum run time to 47 hours and 59 minutes. If you don’t include this option, Sherlock will automatically exit scripts after 2 hours of run time.

The file analysis.out will contain the log file for the R script analysis.R.

#!/bin/bash

#SBATCH --job-name=run_inc
#SBATCH --begin=now
#SBATCH --dependency=singleton
#SBATCH --mail-type=ALL
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
#SBATCH --output=00-run_inc_log.out
#SBATCH --time=47:59:00

cd $HOME/malaria-code-repo/2-analysis/

module purge 

# load R version 4.0.2 (required for certain packages)
module load R/4.0.2

# load gcc, a C++ compiler (required for certain packages)
module load gcc/10

# load software required for spatial analyses in R
module load physics gdal
module load physics proj

R CMD BATCH --no-save analysis.R analysis.out

To submit this job, save the code in the chunk above in a script called myjob.sh and then enter the following command into terminal:

sbatch myjob.sh 

To check on the status of your job, enter the following code into terminal:

squeue -u $USER
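
A few other standard Slurm commands are useful once a job is queued or finished (the job ID below is a placeholder; seff is a contributed utility that may or may not be installed on the cluster):

# detailed accounting for a running or finished job
sacct -j 12345678 --format=JobID,JobName,Elapsed,MaxRSS,State

# cancel a job you no longer need
scancel 12345678

# CPU and memory efficiency summary after a job completes (if seff is available)
seff 12345678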