Single host reportgen not working?
Hi there,
This is similar to https://github.com/mlcommons/storage/issues/22.
I followed all the instructions here but still got an error about the directory structure.
The full error message with the command is
bernardhan@bernardhan-high-perf:/mnt/disks/ssd-array/storage$ ./benchmark.sh reportgen --results-dir /mnt/disks/ssd-array/storage/results/unet3d/2023-07-11-04-19-09
2023-07-11 04:38:13 Error: Directory structure /mnt/disks/ssd-array/storage/results/unet3d/2023-07-11-04-19-09/summary.json is not correct. It has be in format result_dir/run(1..n)/host(1..n)/summary.json
And the file structure is
bernardhan@bernardhan-high-perf:/mnt/disks/ssd-array/storage$ ls /mnt/disks/ssd-array/storage/results/unet3d/2023-07-11-04-19-09
0_output.json 2_output.json 4_output.json 6_output.json configs per_epoch_stats.json
1_output.json 3_output.json 5_output.json 7_output.json dlio.log summary.json
I've also tried running the same ./benchmark.sh run command multiple times, but each run seems to overwrite the previous results rather than putting them in a separate folder. Should we manually arrange the results in the required format, or should the script do this for us? Thanks!
Additionally, for the multi-host setting, do we just run these benchmark scripts on the different hosts and aggregate the results before "reportgen"? Or does the script support this setting? I didn't see it in the source code, so I wanted to confirm.
@Magichan33 Hi, I followed the latest instructions and hit the same issue. Have you found a way to resolve it?
@YafeiWangAlice No, I didn't resolve it. I only needed the summary and per-epoch stats, so I didn't need the aggregated "benchmark report" anyway.
Since this repo wraps around https://github.com/argonne-lcf/dlio_benchmark/tree/main, I found the report generated by dlio_postprocessor to be promising too. You can check it out there; I hope it helps.
Are you looking for a single-host, multi-run report? You can follow the structure given in https://github.com/johnugeorge/mlperf-storage-sample-results
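For anyone arranging results by hand, here is a minimal Python sketch that copies flat output directories into the result_dir/run(1..n)/host(1..n)/summary.json layout named in the error message. The function name arrange_runs and the literal folder names run1, run2, ... and host1 are assumptions inferred from that message, not something confirmed by the benchmark scripts, so check the sample results repository for the exact naming before relying on it.

```python
import shutil
from pathlib import Path


def arrange_runs(flat_run_dirs, report_root, host="host1"):
    """Copy each flat output directory into report_root/run<i>/<host>/.

    Hypothetical helper: the "run<i>" and "host1" names are guesses based on
    the pattern result_dir/run(1..n)/host(1..n)/summary.json in the error text.
    """
    report_root = Path(report_root)
    created = []
    for i, src in enumerate(flat_run_dirs, start=1):
        src = Path(src)
        # Each source directory must already contain a summary.json.
        if not (src / "summary.json").is_file():
            raise FileNotFoundError(f"{src} has no summary.json")
        dest = report_root / f"run{i}" / host
        # copytree creates the intermediate run<i>/ directory and raises
        # FileExistsError if dest exists, so earlier copies are not overwritten.
        shutil.copytree(src, dest)
        created.append(dest)
    return created
```

Because repeated runs appear to write into the same folder, each run's output would need to be copied or moved aside before starting the next one. The saved directories can then be passed as a list to arrange_runs, and the resulting report_root given to reportgen via --results-dir.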