Parameter values for filelog perflogs
Feel free to tell me that you don't want to support this use case if it's not useful for your sites.
We have a unit test that executes on each IB NIC/HCA on a node. This is done through a parameterized test:
hca = parameter(["mlx5_0", "mlx5_1", "mlx5_2", "mlx5_3", "mlx5_6", "mlx5_7", "mlx5_8", "mlx5_9"])
With the naming scheme in 3.12.0, we could simply access the historical performance for one HCA given that the path was predictable:
$ cat ./logs/node0042/perflogs/ib_write_bw_loopback_mlx5_6.log
2022-08-07T09:54:13|reframe 3.12.0|ib_write_bw_loopback %hca=mlx5_6 ....
2022-08-08T10:28:11|reframe 3.12.0|ib_write_bw_loopback %hca=mlx5_6 ....
With 833c6ea3582bdcf4cceb6dc7a1aa667d0ca029e2, it's now a little more complex because the output file name is no longer predictable: we would have to grep across all logs for this test to find the right HCA.
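As an illustration of that workaround, here is a minimal sketch of the "grep across all logs" step. The helper name `find_records` is hypothetical, and it assumes the perflog layout shown in the examples in this issue: files named `<test>_<N>.log` whose lines contain the display name with `%param=value`.

```python
import re
from pathlib import Path


def find_records(perflog_dir, test_name, param, value):
    """Return (file name, line) pairs whose record carries %param=value.

    Hypothetical helper: assumes files are named <test_name>_*.log and that
    each record contains the display name, e.g. "... %hca=mlx5_6 ...".
    """
    # The lookahead keeps "%hca=mlx5_1" from also matching "%hca=mlx5_10".
    pattern = re.compile(
        "%" + re.escape(param) + "=" + re.escape(value) + r"(?![\w.-])"
    )
    matches = []
    for path in sorted(Path(perflog_dir).glob(f"{test_name}_*.log")):
        with path.open() as fh:
            for line in fh:
                if pattern.search(line):
                    matches.append((path.name, line.rstrip("\n")))
    return matches
```

For example, `find_records("./logs/node0042/perflogs", "ib_write_bw_loopback", "hca", "mlx5_6")` would collect the records for one HCA regardless of which numbered file they ended up in. It works, but it has to read every log file for the test instead of opening one predictable path.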
I understand the limitations of the old naming scheme mentioned in https://github.com/reframe-hpc/reframe/blob/833c6ea3582bdcf4cceb6dc7a1aa667d0ca029e2/docs/manpage.rst#test-naming-scheme, but perhaps we could have a way to modify the name of the filelog output file (this doesn't seem possible today: https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#the-filelog-log-handler)? A new log record attribute could then make it possible to include parameter names in the file name.
Another potential issue with the current naming is that the results of individual tests might get mixed up if the test changes. For instance, with my parameter list above, let's say we have the following file:
$ cat perflogs/ib_write_bw_loopback_7.log
2022-08-08T11:11:32|reframe 4.0.0-dev.0|ib_write_bw_loopback %hca=mlx5_9 ...
2022-08-08T11:16:40|reframe 4.0.0-dev.0|ib_write_bw_loopback %hca=mlx5_9 ...
If I now add "mlx5_4" to the list of parameters, this file will store the results of two HCAs (mlx5_9 and mlx5_8) because the tests get renumbered:
$ cat perflogs/ib_write_bw_loopback_7.log
2022-08-08T11:11:32|reframe 4.0.0-dev.0|ib_write_bw_loopback %hca=mlx5_9 ...
2022-08-08T11:16:40|reframe 4.0.0-dev.0|ib_write_bw_loopback %hca=mlx5_9 ...
2022-08-08T11:20:03|reframe 4.0.0-dev.0|ib_write_bw_loopback %hca=mlx5_8 ...
With the previous naming, it would have simply created a new file ib_write_bw_loopback_mlx5_4.log instead.
Yes, indeed, that's a limitation of using the unique_name in file name components, and we need to find a way around it. One obvious solution is to name the various directories using a short hash of the display name. Again, this is not human-readable, but it is at least predictable as long as the parameter name does not change.
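To make the idea concrete, here is a rough sketch of what hash-based naming could look like. This is not ReFrame's implementation: the function name `perflog_basename`, the choice of SHA-256, and the 8-character truncation are all assumptions made for illustration.

```python
import hashlib


def perflog_basename(display_name, length=8):
    """Derive a predictable file or directory stem from a display name.

    Hypothetical sketch: the hash function and the truncation length are
    assumptions, not what ReFrame actually does.
    """
    # The test name is everything before the first space; the
    # "%param=value" parts of the display name follow it.
    test_name = display_name.split(" ", 1)[0]
    digest = hashlib.sha256(display_name.encode("utf-8")).hexdigest()
    return f"{test_name}_{digest[:length]}"
```

Because the stem depends only on the display name (for example `ib_write_bw_loopback %hca=mlx5_9`), adding `mlx5_4` to the parameter list would not change the stem of any existing variant, so results from different HCAs would not end up in the same file. The trade-off is the one mentioned above: the name has to be recomputed from the display name rather than read off directly.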