Rodrigo Arias Mallo
3e197da8a3
hpcg: update figures and remove old ones
2021-04-19 16:05:10 +02:00
Rodrigo Arias Mallo
866d4561d3
hpcg: remove old experiments
2021-04-19 16:01:11 +02:00
Rodrigo Arias Mallo
9a88319153
hpcg: add granularity experiment
2021-04-19 16:00:55 +02:00
Rodrigo Arias Mallo
a96839d11a
hpcg: merge weak scaling and add size experiment
...
The scaling.nix file defines both the strong and weak scaling
experiments, selected with the "enableStrong" parameter.
2021-04-19 15:57:31 +02:00
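
The effect of a single switch such as "enableStrong" can be sketched as follows. This is only an illustration under simplifying assumptions: the contents of scaling.nix are not shown in this log, the problem is treated as one-dimensional, and the names `problem_sizes`, `size` and `ntasks_list` are hypothetical.

```python
def problem_sizes(ntasks_list, size, enable_strong):
    """Return (ntasks, size_per_task, total_size) for each task count.

    Strong scaling: the total size is fixed, so the share of each task
    shrinks as tasks are added.
    Weak scaling: the size per task is fixed, so the total size grows.
    """
    rows = []
    for ntasks in ntasks_list:
        if enable_strong:
            total = size
            per_task = size // ntasks
        else:
            per_task = size
            total = size * ntasks
        rows.append((ntasks, per_task, total))
    return rows


if __name__ == "__main__":
    for strong in (True, False):
        print("strong" if strong else "weak", problem_sizes([1, 2, 4], 64, strong))
```

Keeping both modes behind one flag means the rest of the experiment definition can be shared, which matches the intent stated in the commit message.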
Rodrigo Arias Mallo
a71ae9c2c6
hpcg: avoid mismatching names for gen units
2021-04-16 16:15:16 +02:00
Rodrigo Arias Mallo
d490ef2694
hpcg: remove unused extrae.xml file
2021-04-16 16:14:48 +02:00
Rodrigo Arias Mallo
b4e37a15a9
hpcg: refactor ss and gen using a common file
...
- The file gen.nix now provides an experiment for each unit, to reduce
the evaluation time.
- The pipeline is specified in the common.nix file only.
- The input dataset path is no longer symlinked, but is specified in the
"--load" argument.
- The size parameter is renamed from "n" to "sizePerTask".
2021-04-16 11:51:34 +02:00
Rodrigo Arias Mallo
9bb570af7f
tools: add floatTruncate function
2021-04-16 11:49:37 +02:00
Raúl Peñacoba
4d629fe8f7
hpcg: remove old comments
2021-04-16 09:32:28 +02:00
Raúl Peñacoba
f5c8d0cb88
hpcg: choose a smaller strong scaling problem size
2021-04-16 09:32:28 +02:00
Raúl Peñacoba
cb6577b439
hpcg: add strongscaling
...
HPCG rounds a problem size axis when its value is < 16
2021-04-16 09:32:28 +02:00
Raúl Peñacoba
b60a46b683
hpcg: add weakscaling over some nblocks to check which axis is better
2021-04-16 09:32:28 +02:00
Raúl Peñacoba
1a6075a2b1
hpcg: add first granularity/scalability exps for tampi+isend+oss+task
...
- oss.nix runs valid hpcg layouts whereas slices.nix does not
2021-04-16 09:32:28 +02:00
Rodrigo Arias Mallo
12ff1fd506
garlicd: send logs to the builder
2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
732b0c0e9c
garlic tool: improve unit status information
2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
64f077c4f6
stages: prepend the stage name to messages
2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
7c94997023
control: add trap for bad exit
2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
fb0dee4b61
exp: move exit1 experiment to slurm
2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
bde54c69c5
sbatch: store queued status
2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
2151e20bd6
exp: add exit1 experiment
...
Tests bad unit exits
2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
886d16bcc6
garlic tool: add jq as dependency
...
So we can parse the experiment configuration in JSON
2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
5c0f179830
stdexp: rename "name" to "clusterName"
2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
422d359b48
script: stop on error by default
2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
60248ab06b
article: remove unused figures
2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
1cb63b464d
osu: adjust figures for publication
2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
821b4f0d15
rplot: patch scales and fontconfig
2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
0cf35decc5
osu: add mtu and eager experiments
2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
26e3a86c78
garlic tool: check the presence of all the units
...
This check prevents a user from removing units between the
execution of the experiment and the fetch.
2021-04-16 09:29:33 +02:00
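
A presence check of this kind can be sketched as follows. The actual garlic tool is not shown in this log, so the layout (one directory per unit under an output directory) and the names `missing_units`, `expected_units` and `out_dir` are assumptions made for illustration only.

```python
import os
import tempfile


def missing_units(expected_units, out_dir):
    """Return the sorted names of units that have no directory in out_dir."""
    return sorted(
        unit for unit in expected_units
        if not os.path.isdir(os.path.join(out_dir, unit))
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        # Hypothetical unit names: only "unit-a" was produced.
        os.mkdir(os.path.join(out_dir, "unit-a"))
        missing = missing_units(["unit-a", "unit-b"], out_dir)
        if missing:
            print("missing units:", ", ".join(missing))
```

Failing early when the list is non-empty avoids fetching a partial set of results.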
Rodrigo Arias Mallo
b96c39e0ba
noise: add srun signal bug to the list
2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
f842f1e01d
slurm: add sigsegv experiment
...
Ensure that we can catch a sigsegv signal before and after the
MPI_Finalize call.
2021-04-16 09:29:33 +02:00
Rodrigo Arias Mallo
71c06d02da
stages: add baywatch stage to check the exit code
...
This workaround stage prevents srun from returning 0 to the upper stages
when a signal arrives after MPI_Finalize. It writes the return code to a
file named .srun.rc.$rank and later checks that the file exists and
contains a 0. When the program is killed, it exits with a non-zero code
and the error is propagated to the baywatch stage, which aborts
immediately without creating the rc file.
2021-04-16 09:29:26 +02:00
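
The rc-file check described in this message can be sketched as follows. This is not the actual stage, which is not shown in this log: only the file name pattern .srun.rc.$rank comes from the message, while the function names and the `workdir` parameter are hypothetical.

```python
import os
import tempfile


def rc_path(workdir, rank):
    """Path of the per-rank return code file, named .srun.rc.<rank>."""
    return os.path.join(workdir, f".srun.rc.{rank}")


def record_exit_code(workdir, rank, code):
    """Write the return code of the program for the given rank."""
    with open(rc_path(workdir, rank), "w") as f:
        f.write(f"{code}\n")


def check_exit_code(workdir, rank):
    """Return True only if the rc file exists and contains a 0.

    A missing file means the program died before its code was recorded,
    so it is treated as a failure regardless of what srun returned.
    """
    try:
        with open(rc_path(workdir, rank)) as f:
            return f.read().strip() == "0"
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        record_exit_code(workdir, 0, 0)       # rank 0 finished cleanly
        print(check_exit_code(workdir, 0))    # file present with 0
        print(check_exit_code(workdir, 1))    # rank 1 never wrote a file
```

The point of the design is that success must be asserted explicitly by the file, so a crash that srun reports as 0 still fails the check.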
Rodrigo Arias Mallo
604cfd90a3
test: add sigsegv after MPI_Finalize test
...
The current srun version used in MN4 returns 0 if the program crashes
after MPI_Finalize, as shown by this test.
2021-04-16 09:28:02 +02:00
Rodrigo Arias Mallo
07253c3fa0
fwi: update figure index
2021-04-14 17:18:46 +02:00
Rodrigo Arias Mallo
eab323a13a
fwi: update io figure
2021-04-14 17:18:24 +02:00
Rodrigo Arias Mallo
8ce2a68cd7
fwi: update strong scaling figure script
2021-04-14 17:16:12 +02:00
Rodrigo Arias Mallo
99c6196734
fwi: update granularity figure
2021-04-14 17:05:09 +02:00
Rodrigo Arias Mallo
dd75a840ce
fwi: use enableIO instead of ioFreq
2021-04-12 20:09:17 +02:00
Rodrigo Arias Mallo
e49e3b087f
fwi: rename big io experiment
2021-04-12 19:49:31 +02:00
Rodrigo Arias Mallo
59040d9355
fwi: fix inverted resources
2021-04-12 19:31:35 +02:00
Rodrigo Arias Mallo
6422741cb7
fwi: merge io experiments into one file
...
The enableExtended parameter controls whether the experiment runs with
multiple nodes or only one.
2021-04-12 19:27:45 +02:00
Rodrigo Arias Mallo
99beac9b23
fwi: generate the model in every node
...
As we are using local storage, we need a copy of the input in every
node. The current method is to run the generator only in the rank that
has CPU 0 assigned in its mask.
2021-04-12 19:01:10 +02:00
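
The selection rule can be sketched as follows. It assumes that the CPU masks of the ranks on a node are disjoint and that one of them includes CPU 0, so exactly one rank per node runs the generator. The function names are hypothetical, and `os.sched_getaffinity` is only available on some platforms such as Linux.

```python
import os


def should_generate(cpu_mask):
    """Return True if this rank is the one that must generate the input."""
    return 0 in cpu_mask


def current_cpu_mask():
    """CPU affinity mask of the current process, or an empty set if the
    platform does not expose it (in which case nothing is generated)."""
    if hasattr(os, "sched_getaffinity"):
        return set(os.sched_getaffinity(0))
    return set()


if __name__ == "__main__":
    if should_generate(current_cpu_mask()):
        print("this rank generates the model")
    else:
        print("this rank waits for the model")
```

Using the CPU mask rather than the global rank number keeps the rule valid for any number of ranks per node.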
Rodrigo Arias Mallo
58dc277d3d
fwi: refactor ss-io with common.nix
...
Also, keep the names short and consistent.
2021-04-12 17:57:46 +02:00
Rodrigo Arias Mallo
47b326c646
fwi: generate the input at runtime
2021-04-12 17:46:07 +02:00
Rodrigo Arias Mallo
419e7f95cc
fwi: avoid input generation
...
The ModelGenerator is now included in the fwi-params, so that the input
can be generated at runtime.
2021-04-12 17:43:30 +02:00
Rodrigo Arias Mallo
b0af9b8608
srun: add postSrun hook
2021-04-12 17:41:59 +02:00
Rodrigo Arias Mallo
4afda7dbfb
fwi: use common.nix in sync_io experiment
2021-04-12 16:27:18 +02:00
Rodrigo Arias Mallo
02a103565c
fwi: use common.nix in reuse experiment
2021-04-12 15:48:59 +02:00
Rodrigo Arias Mallo
788dd13ebd
fwi: merge mpi pure experiment
...
The getResources function is used to assign the proper CPU binding
depending on the version. However, additional constraints are required to
ensure that we have enough points in Y.
By default, the mpi+send+seq branch is disabled.
2021-04-12 15:37:39 +02:00
Rodrigo Arias Mallo
41665bc6fc
fwi: refactor config generation into common.nix
2021-04-12 15:01:25 +02:00
Rodrigo Arias Mallo
9aa07993b2
fwi: refactor ss and granularity experiments
...
A common.nix file contains the shared stages
2021-04-12 14:41:26 +02:00