Commit Graph

66 Commits

Author SHA1 Message Date
Rodrigo Arias Mallo
64f077c4f6 stages: prepend the stage name to messages 2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
7c94997023 control: add trap for bad exit 2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
bde54c69c5 sbatch: store queued status 2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
422d359b48 script: stop on error by default 2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
71c06d02da stages: add baywatch stage to check the exit code
This workaround stage prevents srun from returning 0 to the upper stages
when a signal happens after MPI_Finalize. It writes the return code to a
file named .srun.rc.$rank and later checks that it exists and contains a 0.

When the program is killed, it exits with a non-zero code and the error
is propagated to the baywatch stage, which aborts immediately without
creating the rc file.
2021-04-16 09:29:26 +02:00
Rodrigo Arias Mallo
b0af9b8608 srun: add postSrun hook 2021-04-12 17:41:59 +02:00
Rodrigo Arias Mallo
87fa3bb336 sbatch: assert types to avoid silent parse errors 2021-03-19 16:37:31 +01:00
Rodrigo Arias Mallo
051a74b85d srun: allow commands to run before srun 2021-02-26 17:00:09 +01:00
Rodrigo Arias Mallo
8a77900201 srun: don't expand variables on install 2021-02-26 16:59:29 +01:00
Rodrigo Arias Mallo
ebcbf91fbe exec: allow manual specification of program path 2021-02-23 15:22:18 +01:00
Rodrigo Arias Mallo
e5561b8735 control: save total execution time 2021-02-08 14:14:08 +01:00
Rodrigo Arias Mallo
2b9c3da911 Add script stage 2021-01-12 18:19:49 +01:00
Rodrigo Arias Mallo
aeac1a6068 exec: Force newlines
Allow single line commands like pre="true"
2021-01-11 19:15:37 +01:00
Rodrigo Arias Mallo
130fe39c8e exec: Abort on error
We need to exit on the first error, as otherwise we cannot track a bad
execution when no exec is done (when post is not empty).
2021-01-11 18:29:30 +01:00
Rodrigo Arias Mallo
7d4db6b6de control: Exit on error
This prevents srun from silently returning with an error, without
actually queueing the job of a run.
2020-12-07 16:33:40 +01:00
Rodrigo Arias Mallo
1bdeca9e7d unit: Remove dangerous slash from index names 2020-12-03 16:33:48 +01:00
Rodrigo Arias Mallo
c858f521bf isolate: add $TMPDIR in the namespace 2020-12-03 13:22:10 +01:00
Rodrigo Arias Mallo
da4bbf8533 isolate: only load some files from /etc 2020-12-03 12:04:51 +01:00
Rodrigo Arias Mallo
f87d830218 isolate: preserve TERM 2020-12-02 13:06:55 +01:00
Rodrigo Arias Mallo
3d352fee19 isolate: allow argument passing 2020-12-02 13:06:35 +01:00
Rodrigo Arias Mallo
1f841649f8 exec: add support for nixPrefix 2020-12-02 11:57:40 +01:00
Rodrigo Arias Mallo
a147a396d9 trebuchet: add the experiment as attribute 2020-11-20 15:35:36 +01:00
Rodrigo Arias Mallo
8bc5656461 tools: recursive getExperiment
It allows getExperimentStage to be called from any stage above the
experiment.
2020-11-20 15:34:14 +01:00
Rodrigo Arias Mallo
d192a59fdc control: Export the run iteration 2020-11-20 15:32:41 +01:00
Rodrigo Arias Mallo
734d494d96 stdexp: Allow extra mounts 2020-11-20 15:30:47 +01:00
David Alvarez
0c438d4dac Setup for test experiment 2020-11-20 13:57:12 +01:00
Rodrigo Arias Mallo
e8f649327a exec: Avoid variable expansion at build
All bash variables passed in env, pre or post are now expanded at
execution time.
2020-11-20 13:54:45 +01:00
Rodrigo Arias Mallo
e1e34ddf75 exec: add pre and post code to allow cleanup tasks 2020-11-17 16:09:38 +01:00
Rodrigo Arias Mallo
641e752bd5 Add a trace message at unit evaluation 2020-11-17 11:12:12 +01:00
Rodrigo Arias Mallo
317409f6ac Move index and out inside the user directory 2020-11-03 19:10:00 +01:00
Rodrigo Arias Mallo
5e2797bcde Create index files for the experiments 2020-11-03 19:10:00 +01:00
Rodrigo Arias Mallo
efd7df068e Print full experiment path 2020-11-03 19:10:00 +01:00
Rodrigo Arias Mallo
3bd4e61f3f WIP: Testing with automatic fetching 2020-11-03 19:09:59 +01:00
Rodrigo Arias Mallo
59346fa97e control: Add status file 2020-11-03 19:09:59 +01:00
Rodrigo Arias Mallo
4beb069627 WIP: postprocessing pipeline
Now each run is executed in an independent folder
2020-11-03 19:09:59 +01:00
Rodrigo Arias Mallo
2680dcb66f Don't nest the unit results
The experiment directory now contains symlinks to the units, keeping the
old structure. The unit results are directly placed in the garlic out
directory.
2020-11-03 19:09:58 +01:00
Rodrigo Arias Mallo
c3659d316d Add perf stage 2020-11-03 19:09:58 +01:00
Rodrigo Arias Mallo
80ccd1240a Less verbose execution 2020-10-14 16:29:22 +02:00
Rodrigo Arias Mallo
9d8f7d9074 Print the experiment being run 2020-10-14 16:28:27 +02:00
Rodrigo Arias Mallo
c7d2e2d866 Write the unit config in a file 2020-10-14 16:27:47 +02:00
Rodrigo Arias Mallo
7a37913b4e Set the ssh host from the machine config 2020-10-13 14:30:03 +02:00
Rodrigo Arias Mallo
a38ff31cca Introduce the runexp stage 2020-10-13 13:00:59 +02:00
Rodrigo Arias Mallo
6ab448b10a Fix trebuchet description 2020-10-09 20:28:00 +02:00
Rodrigo Arias Mallo
4de20d3aa5 Remove old stages and update some 2020-10-09 20:12:52 +02:00
Rodrigo Arias Mallo
27bc977590 Remove strace from isolate stage 2020-10-09 19:50:28 +02:00
Rodrigo Arias Mallo
332b738889 Move apps into garlic/apps 2020-10-09 16:42:06 +02:00
Rodrigo Arias Mallo
a576be8031 WIP stage redesign 2020-10-09 16:42:06 +02:00
Rodrigo Arias Mallo
654e243735 Include an index in the trebuchet 2020-10-09 16:42:06 +02:00
Rodrigo Arias Mallo
45afe7d391 Simplify experiment stage 2020-10-09 16:42:06 +02:00
Rodrigo Arias Mallo
d599b8c52f New naming convention 2020-10-09 16:42:06 +02:00
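
The exit-code check described in commit 71c06d02da (the baywatch stage) can be illustrated with a small sketch. This is not the project's implementation, which is not shown in this log; it is a minimal Python illustration that assumes only what the commit message states: a file named .srun.rc.$rank is written with the return code, and a later step checks that the file exists and contains a 0. The function names record_exit_code and check_exit_codes, and the workdir parameter, are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def record_exit_code(cmd, rank, workdir):
    """Run cmd and, only if it succeeds, write its return code to
    <workdir>/.srun.rc.<rank>. On failure no file is created.
    (Hypothetical helper; the name and signature are assumptions.)"""
    rc = subprocess.run(cmd).returncode
    if rc != 0:
        # The program failed or was killed: abort without an rc file.
        return rc
    (Path(workdir) / f".srun.rc.{rank}").write_text(f"{rc}\n")
    return rc


def check_exit_codes(ranks, workdir):
    """Return True only if every rank left an rc file containing 0.
    (Hypothetical helper; the name and signature are assumptions.)"""
    for rank in ranks:
        rc_file = Path(workdir) / f".srun.rc.{rank}"
        if not rc_file.is_file():
            return False
        if rc_file.read_text().strip() != "0":
            return False
    return True


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as workdir:
        # Rank 0 runs a command that succeeds, so its rc file is written.
        record_exit_code([sys.executable, "-c", "pass"], 0, workdir)
        # A missing or non-zero rc file for any rank fails the check.
        sys.exit(0 if check_exit_codes([0], workdir) else 1)
```

The point of the design, as the commit message describes it, is that a missing rc file counts as a failure, so a launcher that returns 0 after a late signal cannot hide the error from the upper stages.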