Parameters values for filelog perflogs

Open flx42 opened this issue 3 years ago • 2 comments

Feel free to tell me that you don't want to support this use case if it's not useful for your sites.

We have a unit test that runs on each IB NIC/HCA of a node, implemented as a parameterized test:

hca = parameter(["mlx5_0", "mlx5_1", "mlx5_2", "mlx5_3", "mlx5_6", "mlx5_7", "mlx5_8", "mlx5_9"])

With the naming scheme in 3.12.0, we could simply access the historical performance for one HCA given that the path was predictable:

$ cat ./logs/node0042/perflogs/ib_write_bw_loopback_mlx5_6.log
2022-08-07T09:54:13|reframe 3.12.0|ib_write_bw_loopback %hca=mlx5_6 ....
2022-08-08T10:28:11|reframe 3.12.0|ib_write_bw_loopback %hca=mlx5_6 ....

Since 833c6ea3582bdcf4cceb6dc7a1aa667d0ca029e2, this is a little more complex: the output file name is no longer predictable, so we would have to grep across all logs for this test to find the right HCA.

I understand the limitations of the old naming scheme mentioned in https://github.com/reframe-hpc/reframe/blob/833c6ea3582bdcf4cceb6dc7a1aa667d0ca029e2/docs/manpage.rst#test-naming-scheme, but perhaps we could have a way to modify the name of the filelog output file (this doesn't seem possible today: https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#the-filelog-log-handler)? On top of that, a new log record attribute could make it possible to include parameter names in the file name.

flx42, Aug 08 '22 17:08
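
The "grep across all logs" workaround mentioned above could look roughly like the following. This is only a sketch: the helper name `find_records` is hypothetical, and it assumes the perflogs are plain text files named `<test_name>_<something>.log` whose records contain the `%param=value` token shown in the examples.

```python
import re
from pathlib import Path


def find_records(perflog_dir, test_name, param, value):
    """Return (file name, line) pairs whose record mentions %param=value.

    Hypothetical helper: the file layout and record format are assumed
    from the examples in this thread, not taken from ReFrame itself.
    """
    # Match e.g. "%hca=mlx5_6" but not "%hca=mlx5_60".
    pattern = re.compile(rf"%{re.escape(param)}={re.escape(value)}(?!\w)")
    matches = []
    for path in sorted(Path(perflog_dir).glob(f"{test_name}_*.log")):
        with path.open() as fp:
            for line in fp:
                if pattern.search(line):
                    matches.append((path.name, line.rstrip("\n")))
    return matches


# Example call (the path is taken from the thread):
# find_records("./logs/node0042/perflogs", "ib_write_bw_loopback", "hca", "mlx5_6")
```

It works, but every lookup has to scan all of the test's log files, which is the extra complexity compared to opening one predictable path.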

Another potential issue with the current naming is that results of different test variants might get mixed up in one file if the test changes. For instance, with my parameter list above, let's say we have the following file:

$ cat perflogs/ib_write_bw_loopback_7.log 
2022-08-08T11:11:32|reframe 4.0.0-dev.0|ib_write_bw_loopback %hca=mlx5_9 ...
2022-08-08T11:16:40|reframe 4.0.0-dev.0|ib_write_bw_loopback %hca=mlx5_9 ...

If I now add "mlx5_4" to the list of parameters, this file will store the results of 2 HCAs (mlx5_9 and mlx5_8) as the tests got renumbered:

$ cat perflogs/ib_write_bw_loopback_7.log 
2022-08-08T11:11:32|reframe 4.0.0-dev.0|ib_write_bw_loopback %hca=mlx5_9 ...
2022-08-08T11:16:40|reframe 4.0.0-dev.0|ib_write_bw_loopback %hca=mlx5_9 ...
2022-08-08T11:20:03|reframe 4.0.0-dev.0|ib_write_bw_loopback %hca=mlx5_8 ...

With the previous naming, it would have simply created a new file ib_write_bw_loopback_mlx5_4.log instead.

flx42, Aug 08 '22 18:08
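
The renumbering effect described above can be reproduced with a few lines of Python. This sketch assumes that the numeric suffix of the log file is simply the position of the value in the parameter list (which matches the example, but is not confirmed here) and that "mlx5_4" is inserted in sorted position rather than appended. The helper `numbered_log_names` is hypothetical.

```python
def numbered_log_names(test_name, params):
    """Map each positionally numbered log file to the parameter value it holds."""
    return {f"{test_name}_{i}.log": value for i, value in enumerate(params)}


before = ["mlx5_0", "mlx5_1", "mlx5_2", "mlx5_3",
          "mlx5_6", "mlx5_7", "mlx5_8", "mlx5_9"]
# Assumption: the new HCA is inserted in sorted position, not appended.
after = before[:4] + ["mlx5_4"] + before[4:]

old = numbered_log_names("ib_write_bw_loopback", before)
new = numbered_log_names("ib_write_bw_loopback", after)

# Files that silently switch to a different HCA after the edit.
remapped = {name: (old[name], new[name])
            for name in old if old[name] != new[name]}

print(remapped["ib_write_bw_loopback_7.log"])  # ('mlx5_9', 'mlx5_8')
```

Under these assumptions, four existing files (suffixes 4 to 7) end up holding records from two different HCAs.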

Yes, indeed, that's a limitation of using the unique_name in file name components, and we need to find a way around it. One obvious solution is to use a short hash of the display name to name the various directories. Again, this is not human-readable, but it is at least predictable as long as the parameter name does not change.

vkarak, Aug 09 '22 07:08
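
The short-hash idea suggested above could be sketched as follows. This is an illustration only, not ReFrame's implementation: the function `hashed_log_name`, the choice of SHA-256, and the 8-character length are all assumptions made for the example.

```python
import hashlib


def hashed_log_name(display_name, length=8):
    """Derive a short, stable name component from a test's display name.

    Hypothetical helper: hash algorithm and length are arbitrary choices.
    """
    digest = hashlib.sha256(display_name.encode("utf-8")).hexdigest()
    return digest[:length]


# The same display name always yields the same component, regardless of
# how many other parameter values exist or in which order they are listed.
name_6 = hashed_log_name("ib_write_bw_loopback %hca=mlx5_6")
name_9 = hashed_log_name("ib_write_bw_loopback %hca=mlx5_9")
print(name_6, name_9)
```

Because the name depends only on the display name, adding "mlx5_4" to the parameter list would not change the file used for any existing HCA. The trade-off is the one already noted: the resulting names are not human-readable.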