Slurm Installation

Initial Setup

First, we need to set up the slurm user and group on each node, with the home directory on the shared drive.

Create the home directory on the shared drive (this must be done only once, from a single node, and can be skipped if it was already done in the earlier steps when installing munge):

sudo mkdir -p -m777 /clusterfs/var/lib

Then, on each node create the group and user:

export SLURMUSER=1004
sudo groupadd -g $SLURMUSER slurm
sudo useradd -m -c "SLURM" -d /clusterfs/var/lib/slurm -u $SLURMUSER -g slurm -s /sbin/nologin slurm

(On every node after the first, ignore the warnings that the directory already exists and that no files will be copied into it from the skel directory. If installing slurm through the package manager on each individual node instead, drop the "/clusterfs" prefix from the home directory in the last command.)

The number on the first line must be the same on all nodes (per a warning on the SLURM installation site), so if a group already exists with that GID or a user already exists with that UID, remove the newly created ones (with sudo groupdel slurm and sudo userdel slurm) and re-create them with a different number. To avoid this, run grep $SLURMUSER /etc/group and grep $SLURMUSER /etc/passwd (both after the export command in the first block) on each node and confirm there is no output.
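The availability check can be wrapped in a small helper (a sketch; id_free is not part of any standard tooling). Using getent instead of grepping the files directly also consults any other NSS sources the node is configured with:

```shell
# Succeeds (exit 0) only if neither a group with GID $1 nor a user with
# UID $1 exists on this node, i.e. the number is safe to use for slurm.
id_free() {
    ! getent group "$1" > /dev/null && ! getent passwd "$1" > /dev/null
}

# Example: id_free 1004 || echo "pick a different SLURMUSER number"
```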

Install from Source

Install some additional build dependencies if not already installed (check with dpkg -l <package-name>, and install with sudo apt-get install <package-name>):

  • libdbus-1-dev
  • linux-headers-$(uname -r)

Locate the download URL for the most recent slurm release at https://www.schedmd.com/download-slurm/.

On the head node, enter the scratch directory, download, and extract the slurm installer file:

cd /clusterfs/scratch
wget https://download.schedmd.com/slurm/slurm-23.11.6.tar.bz2
tar xjf slurm-23.11.6.tar.bz2

Configure and install:

cd slurm-23.11.6
./configure --prefix=/clusterfs/usr/local/slurm --with-munge=/clusterfs/usr --enable-load-env-no-login --with-systemdsystemunitdir=/clusterfs/usr/lib/systemd/system
make -j$(nproc)
sudo make install
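Before going further, it is worth confirming that the client binaries actually landed under the prefix. A small helper (illustrative, not part of slurm):

```shell
# Succeeds only if every named binary exists and is executable under
# <prefix>/bin. Usage: check_install /clusterfs/usr/local/slurm sinfo srun
check_install() {
    prefix=$1; shift
    for bin in "$@"; do
        [ -x "${prefix}/bin/${bin}" ] || { echo "missing: ${bin}"; return 1; }
    done
}

# Example: check_install /clusterfs/usr/local/slurm sinfo srun scontrol sbatch
```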

Edit service files:

In /clusterfs/usr/lib/systemd/system/slurmd.service, uncomment the line near the top for ConditionPathExists.

Edit configuration files:

Examples of the necessary configuration files are given below (sample files also ship in the etc/ directory of the extracted source tree).

Create the directory to hold them:

sudo mkdir -m777 /clusterfs/usr/local/slurm/etc

Create /clusterfs/usr/local/slurm/etc/slurm.conf (the contents below were generated with Slurm's configurator.html tool, which can be opened in a web browser):

# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=bramble
SlurmctldHost=node00
#SlurmctldHost=
AuthType=auth/munge
CredType=cred/munge
#
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=/clusterfs/usr/local/slurm/etc/epilog
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=67043328
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=lua
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=10000
#MaxStepCount=40000
#MaxTasksPerNode=512
#MpiDefault=
#MpiParams=ports=#-#
PluginDir=/clusterfs/usr/local/slurm/lib/slurm:/clusterfs/scratch/slurm-23.11.6/src/plugins
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/cgroup
#Prolog=/clusterfs/usr/local/slurm/etc/prolog
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
RebootProgram=/sbin/reboot
ReturnToService=1
SlurmctldPidFile=/clusterfs/var/run/slurmctld.%n.pid
SlurmctldPort=6817
SlurmdPidFile=/clusterfs/var/run/slurmd.%n.pid
SlurmdPort=6818
SlurmdSpoolDir=/clusterfs/var/spool/slurmd.%n
SlurmUser=slurm
#SlurmdUser=root
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/clusterfs/var/spool/slurmctld.state
#SwitchType=
#TaskEpilog=
#TaskPlugin=task/affinity,task/cgroup
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
TreeWidth=16
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SelectType=select/cons_tres
#
#
# JOB PRIORITY
#PriorityFlags=
#PriorityType=priority/multifactor
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
AccountingStorageHost=node00
#AccountingStoragePass=
AccountingStoragePort=6819
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageUser=slurm
AccountingStoreFlags=job_comment,job_env,job_extra,job_script
#JobCompHost=node00
#JobCompLoc=/clusterfs/log/slurm/jobcompdb
#JobCompParams=
#JobCompPass=
#JobCompPort=
#JobCompType=jobcomp/mysql
#JobCompUser=slurm
#JobContainerType=
JobAcctGatherFrequency=30
#JobAcctGatherType=jobacct_gather/cgroup
SlurmctldDebug=info
SlurmctldLogFile=/clusterfs/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/clusterfs/var/log/slurmd.%h.log
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#DebugFlags=
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
ResumeTimeout=600
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
# can also add in the NodeAddr parameter in the line below as well, if the IP addresses are fixed.
NodeName=node[00-03] CPUs=4 State=UNKNOWN
PartitionName=bramble Nodes=ALL Default=YES MaxTime=INFINITE State=UP
#
# FRONTEND NODES
#AllowGroups=
#AllowUsers=
#DenyGroups=
#DenyUsers=
# FrontEndName=node00 FrontEndAddr=node00 State=UNKNOWN
#Port=
#Reason=
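Since this file must agree across all nodes, a quick way to spot-check individual values (e.g. that ClusterName or SlurmctldHost is what you expect) is a one-line extractor (a sketch; assumes POSIX sed):

```shell
# Print the value of a key from a slurm.conf-style file, skipping
# commented-out lines. Usage: conf_get /path/to/slurm.conf ClusterName
conf_get() {
    sed -n "s/^[[:space:]]*$2=//p" "$1" | head -n 1
}

# Example: conf_get /clusterfs/usr/local/slurm/etc/slurm.conf ClusterName
```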

Set up the database for slurm:

First, we need to enter the mysql shell in order to create the database user:

mysql -u root -p

Enter the password to access the mysql shell.

Then we can create the user (replace 'password' with a real secret):

CREATE USER 'slurm'@'localhost' IDENTIFIED BY 'password';

Now grant privileges:

GRANT ALL PRIVILEGES ON slurm_acct_db.* TO 'slurm'@'localhost' WITH GRANT OPTION;

Now create the database:

CREATE DATABASE slurm_acct_db;

Exit the shell with exit;.
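The three statements above can also be collected into one script and run non-interactively (a sketch; the IF NOT EXISTS clauses make it safe to re-run, and 'password' is a placeholder to replace):

```sql
-- Same provisioning as the interactive steps above (replace 'password').
CREATE USER IF NOT EXISTS 'slurm'@'localhost' IDENTIFIED BY 'password';
CREATE DATABASE IF NOT EXISTS slurm_acct_db;
GRANT ALL PRIVILEGES ON slurm_acct_db.* TO 'slurm'@'localhost' WITH GRANT OPTION;
```

Feed it to the server with mysql -u root -p < slurm_db.sql (the file name is arbitrary).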

Slurmdbd configuration file:

Create /clusterfs/usr/local/slurm/etc/slurmdbd.conf (a minimal example; change the database username and password to match what was created above):

# Example slurmdbd.conf file.
# See the slurmdbd.conf man page for more information.
#
AuthType=auth/munge
DbdHost=node00
DbdPort=6819
SlurmUser=slurm
#
PluginDir=/clusterfs/usr/local/slurm/lib/slurm
DebugLevel=info
LogFile=/clusterfs/var/log/slurmdbd.log
PidFile=/clusterfs/var/run/slurmdbd.pid
#
StorageType=accounting_storage/mysql
StorageHost=localhost
StoragePort=3306
StorageUser=slurm
StoragePass=password
StorageLoc=slurm_acct_db

Change the permissions of this file to 600 and its ownership to the slurm user (slurmdbd refuses to start if the file is readable by others):

sudo chmod 600 /clusterfs/usr/local/slurm/etc/slurmdbd.conf
sudo chown slurm /clusterfs/usr/local/slurm/etc/slurmdbd.conf
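Since a wrong mode here blocks slurmdbd from starting, it can pay to verify the result. A small check (illustrative; assumes GNU coreutils stat):

```shell
# Succeeds only if the file has mode 600 and the expected owner.
# Usage: check_conf /clusterfs/usr/local/slurm/etc/slurmdbd.conf slurm
check_conf() {
    [ "$(stat -c '%a %U' "$1")" = "600 $2" ]
}
```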

Link the service files into /lib/systemd/system (on nodes other than the head node, it is sufficient to link only the first of the service files below):

sudo ln -s /clusterfs/usr/lib/systemd/system/slurmd.service /lib/systemd/system/slurmd.service
sudo ln -s /clusterfs/usr/lib/systemd/system/slurmdbd.service /lib/systemd/system/slurmdbd.service
sudo ln -s /clusterfs/usr/lib/systemd/system/slurmctld.service /lib/systemd/system/slurmctld.service
sudo ln -s /clusterfs/usr/lib/systemd/system/sackd.service /lib/systemd/system/sackd.service

Add to PATH

Add /clusterfs/usr/local/slurm/bin/ to PATH (in /etc/profile file, as we did for munge), or add links to all those files into /clusterfs/usr/bin/:

for file in /clusterfs/usr/local/slurm/bin/*; do
    sudo ln -s "${file}" /clusterfs/usr/bin/
done

Test

Start the slurm daemons on all nodes (the head node needs slurmd, slurmctld, and slurmdbd; all other nodes need only slurmd) with sudo systemctl start <daemon>.

Check that the nodes all appear in the slurm control: sinfo -N -r -l (the -r flag shows only nodes responsive to slurm, so omit it to also include nodes that are down; omit the -l flag to see the short description table instead of the long one; omitting the -N flag groups the responses instead of printing one line per node).

If the nodes are all up, the command srun -N4 /bin/hostname should print each node's hostname. Run it as a shared user, though (e.g. sudo su - $NEWUSERNAME -c "srun -N4 /bin/hostname"); otherwise it will look for the regular user's home directory from the login node, which won't exist on the other nodes. See the next section for creating a shared user.
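Once interactive srun works, batch submission can be tested the same way. A minimal job script (a sketch; save it as, say, hello.sh in the shared user's home directory and submit it with sbatch hello.sh):

```shell
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --nodes=4
#SBATCH --output=hello-%j.out

# Runs one task per node; output lands in hello-<jobid>.out
# in the submission directory.
srun /bin/hostname
```

Watch the job with squeue, then inspect the output file once it completes.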

Create Shared Users

Create one or more users on each node with shared home directories (i.e., on /clusterfs/home/<user>) so they can run slurm commands:

export NEWUSER=1111
export NEWUSERNAME=newuser
sudo useradd -m -c "Shared user $NEWUSERNAME" -U -d /clusterfs/home/$NEWUSERNAME -u $NEWUSER -s /bin/bash $NEWUSERNAME
sudo passwd $NEWUSERNAME

(Again, ignore warnings on all nodes other than the first about the home directory already existing and not copying files into it.)

Add Services to Startup Script Files

Add the start-service commands to the startup script files. In the systemd service file (named /lib/systemd/system/bramblehead.service), add:

[Service]
Type=forking
ExecStart=/scripts/bramble-head.sh
RemainAfterExit=yes

[Install]
#RequiredBy=munge.service
WantedBy=multi-user.target

In the daemon-reload script, add:

Additional Notes

To see details about a particular node (including a reason why it may be in a down state), run the following:

scontrol show node <node's hostname>

If you ever have to reboot a node without doing it through slurm, run the following to tell slurm to bring the node back from the “down” state:

sudo /clusterfs/usr/bin/scontrol update state=resume NodeName=<node's hostname>

The proper way to reboot a node is with scontrol reboot <node's hostname> (or ALL, or a list <NodeList>; this assumes RebootProgram is set in the slurm.conf).

Appendix

Some interesting outputs from sudo make install:

libtool: finish: PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/sbin" ldconfig -n /clusterfs/usr/local/slurm/lib
----------------------------------------------------------------------
Libraries have been installed in:
   /clusterfs/usr/local/slurm/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the '-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the 'LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the 'LD_RUN_PATH' environment variable
     during linking
   - use the '-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to '/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
libtool: finish: PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/sbin" ldconfig -n /clusterfs/usr/local/slurm/lib/slurm
----------------------------------------------------------------------
Libraries have been installed in:
   /clusterfs/usr/local/slurm/lib/slurm

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the '-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the 'LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the 'LD_RUN_PATH' environment variable
     during linking
   - use the '-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to '/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.