From e8d884a627b5d218a579283927497e3eeab9db07 Mon Sep 17 00:00:00 2001
From: Rodrigo Arias Mallo
Date: Wed, 7 Oct 2020 18:34:08 +0200
Subject: [PATCH] Document the execution pipeline

---
 garlic/doc/Makefile     |   9 ++
 garlic/doc/execution.ms | 203 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 212 insertions(+)
 create mode 100644 garlic/doc/Makefile
 create mode 100644 garlic/doc/execution.ms

diff --git a/garlic/doc/Makefile b/garlic/doc/Makefile
new file mode 100644
index 0000000..f768139
--- /dev/null
+++ b/garlic/doc/Makefile
@@ -0,0 +1,9 @@
+all: execution.pdf execution.txt
+
+%.pdf: %.ms
+	groff -ms -tbl -Tpdf $^ > $@
+	#pdfms $^ 2>&1 >$@ | sed 's/^troff: //g'
+	killall -HUP mupdf
+
+%.txt: %.ms
+	groff -ms -tbl -Tutf8 $^ > $@
diff --git a/garlic/doc/execution.ms b/garlic/doc/execution.ms
new file mode 100644
index 0000000..b38f81c
--- /dev/null
+++ b/garlic/doc/execution.ms
@@ -0,0 +1,203 @@
+.TL
+Garlic execution
+.AU
+Rodrigo Arias Mallo
+.AI
+Barcelona Supercomputing Center
+.AB
+.LP
+This document covers the execution of experiments in the Garlic
+benchmark, which are performed under strict conditions. The various
+stages of the execution are documented so that the experimenter can get
+a global overview of how the benchmark runs under the hood.
+During the execution of the experiments, the results are
+stored in a file which will be used in later processing steps.
+.AE
+.\"#####################################################################
+.nr GROWPS 3
+.nr PSINCR 1.5p
+.\".nr PD 0.5m
+.nr PI 2m
+\".2C
+.\"#####################################################################
+.NH 1
+Introduction
+.LP
+Every experiment in the Garlic
+benchmark is controlled by one
+.I nix
+file.
+An experiment consists of several shell scripts which are executed
+sequentially and perform the tasks needed to set up the
+.I "execution environment" ,
+finally launching the actual program that is being analyzed.
+The scripts that prepare the environment and the program itself are
+called the
+.I stages
+of the execution, which altogether form the
+.I "execution pipeline"
+or simply the
+.I pipeline .
+The experimenter must know in detail all the stages involved in the
+pipeline, as they can have a great impact on the result of the
+execution.
+.PP
+The experiments depend heavily on the cluster where they run, as the
+results are strongly affected by the machine. The software used for the
+benchmark is carefully configured for the hardware used in the
+execution. In particular, the experiments are designed to run on the
+MareNostrum 4 cluster with the SLURM workload manager. In the future we
+plan to add support for other clusters, so that the experiments can be
+executed on other machines.
+.\"#####################################################################
+.NH 1
+Isolation
+.LP
+The benchmark is designed so that both the compilation of every software
+package and the execution of the experiment are performed under strict
+conditions. Therefore, we can guarantee that two executions of the same
+experiment actually run the same program in the same environment.
+.PP
+All the software used by an experiment is included in the
+.I "nix store"
+which is, by convention, located in the
+.CW /nix
+directory. Unfortunately, it is common for libraries to try to load
+software from other paths like
+.CW /usr
+or
+.CW /lib .
+It is also common for configuration files to be loaded from
+.CW /etc
+and from the home directory of the user that runs the experiment.
+Additionally, some environment variables are recognized by the libraries
+used in the experiment and change their behavior. As we cannot
+control the software and configuration files in those directories, we
+cannot guarantee that the execution behaves as intended.
+.PP
+In order to avoid this problem, we create a secure
+.I sandbox
+where only the files in the nix store are available (with some
+exceptions). Therefore, even if the libraries try to access any path
+outside the nix store, they will find that the files are not there
+anymore.
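+.PP
+As an illustration only, the effect of the sandbox is roughly the same
+as running the next stage under a bind-mounting tool such as
+.I bwrap(1) ,
+where only the nix store and a few required paths are visible. The
+actual
+.I isolate
+stage may rely on a different mechanism, and the
+.CW stage.sh
+name below is hypothetical:
+.DS I
+.CW "# sketch only, not the actual isolate implementation"
+.CW "bwrap --ro-bind /nix /nix --dev /dev --proc /proc sh stage.sh"
+.DE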
+.\"#####################################################################
+.NH 1
+Execution stages
+.LP
+There are several predefined stages which form the
+.I standard
+execution pipeline. The standard pipeline is divided into two main
+parts: 1) connecting to the target machine and submitting a job to
+SLURM, and 2) executing the job itself.
+.NH 2
+Job submission
+.LP
+Three stages are involved in the job submission. The
+.I trebuchet
+stage connects via
+.I ssh
+to the target machine and executes the next stage there. Once on the
+target machine, the
+.I isolate
+stage is executed to enter the sandbox. Finally, the
+.I sbatch
+stage runs the
+.I sbatch(1)
+program with a job script that simply executes the next stage. The
+sbatch program reads the
+.CW /etc/slurm/slurm.conf
+file from outside the sandbox, so we must explicitly make this file
+available inside the sandbox, as well as the
+.I munge
+socket, which is used for authentication.
+.PP
+The rationale behind running sbatch from the sandbox is that options
+provided in environment variables override the options set in the job
+script. By running sbatch from the sandbox, where potentially dangerous
+environment variables have been removed, we avoid this problem.
+.NH 2
+Setting up the environment
+.LP
+Once the job has been selected for execution, the SLURM daemon allocates
+the resources and then selects one of the nodes to run the job script
+(the job script itself is not executed in parallel). Additionally, the
+job script is executed by a child process, forked from one of the SLURM
+processes, which is outside the sandbox. Therefore, we first run the
+.I isolate
+stage
+to enter the sandbox again.
+.PP
+The next stage is called
+.I control
+and determines whether enough data has been generated by the experiment
+or whether it should continue repeating the execution. At the current
+time, it is only implemented as a simple loop that runs the next stage a
+fixed number of times.
+.PP
+The following stage is
+.I srun ,
+which usually launches several copies of the next stage to run in
+parallel (when using more than one task), creating one process per task.
+The set of CPUs available to each process is controlled by the
+.I --cpu-bind
+parameter, which is crucial to set correctly and is documented in the
+.I srun(1)
+manual. Appending the
+.I verbose
+value to the cpu bind option causes srun to print the affinity assigned
+to each task, so that it can be reviewed in the execution log.
+.PP
+The mechanism by which srun executes multiple processes is the same as
+the one used by sbatch: the processes are forked from a SLURM daemon
+running in the compute nodes. Therefore, the execution begins outside
+the sandbox. The next stage is
+.I isolate ,
+which enters the sandbox again in every task (from this point on, all
+stages run in parallel).
+.PP
+At this point in the execution, we are ready to run the actual program
+that is the subject of the experiment. Usually, the program requires
+some options to be passed on the command line. The
+.I argv
+stage sets the arguments and, optionally, some environment variables,
+and then executes the last stage, the
+.I program .
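+.PP
+As a rough illustration only, the last stages could reduce to something
+like the following; the program name, its options and the environment
+variable are hypothetical, and the real stages are generated scripts
+rather than commands typed by hand:
+.DS I
+.CW "srun --cpu-bind=cores,verbose ./isolate"
+.CW "# then, inside the sandbox, the argv stage runs:"
+.CW "export OMP_NUM_THREADS=48"
+.CW "exec ./program --size 1024"
+.DE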
+.NH 2
+Stage overview
+.LP
+The standard execution pipeline contains the stages listed in Table 1,
+in order of execution. Additional stages can be placed before the argv
+stage to modify the execution; debugging programs and other wrappers are
+usually included there (a sketch of such a stage is shown after the
+table).
+.KF
+.TS
+center;
+lB cB cB cB
+l c c c.
+_
+Stage	Target	Safe	Copies
+_
+trebuchet	no	no	no
+isolate	yes	no	no
+sbatch	yes	yes	no
+isolate	yes	no	no
+control	yes	yes	no
+srun	yes	yes	no
+isolate	yes	no	yes
+argv	yes	yes	yes
+program	yes	yes	yes
+_
+.TE
+.QP
+.B "Table 1" :
+The stages of a standard execution pipeline. The
+.B target
+column states whether the stage runs in the target cluster;
+.B safe
+whether the stage runs inside the sandbox, and
+.B copies
+whether several instances of the stage run in parallel.
+.QE
+.KE
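+.PP
+As an example, a debugging stage placed before
+.I argv
+could wrap the rest of the pipeline with a tool such as valgrind. The
+script below is only a sketch and does not correspond to any stage
+shipped with the benchmark; the
+.CW next-stage
+name is hypothetical:
+.DS I
+.CW "#!/bin/sh"
+.CW "# sketch of an extra debugging stage, not a real one"
+.CW "exec valgrind ./next-stage"
+.DE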