Single host reportgen not working?

Open bernardhan33 opened this issue 1 year ago • 5 comments

Hi there,

This is similar to https://github.com/mlcommons/storage/issues/22.

I followed all the instructions here but still got an error about the directory structure.

The command and the full error message are:

bernardhan@bernardhan-high-perf:/mnt/disks/ssd-array/storage$ ./benchmark.sh reportgen --results-dir /mnt/disks/ssd-array/storage/results/unet3d/2023-07-11-04-19-09
2023-07-11 04:38:13 Error: Directory structure /mnt/disks/ssd-array/storage/results/unet3d/2023-07-11-04-19-09/summary.json is not correct. It has be in format result_dir/run(1..n)/host(1..n)/summary.json

The contents of the results directory are:

bernardhan@bernardhan-high-perf:/mnt/disks/ssd-array/storage$ ls /mnt/disks/ssd-array/storage/results/unet3d/2023-07-11-04-19-09
0_output.json  2_output.json  4_output.json  6_output.json  configs   per_epoch_stats.json
1_output.json  3_output.json  5_output.json  7_output.json  dlio.log  summary.json

I've also tried running the same ./benchmark.sh run command multiple times, but the results seem to be overwritten rather than the previous ones being kept in a separate folder. Should we manually arrange the results in the required format, or should the script do this for us? Thanks!

bernardhan33, Jul 11 '23 04:07

Additionally, for the multi-host setting, do we just run these benchmark scripts on the different hosts and aggregate the results ourselves before running "reportgen"? Or does the script support this setting directly? I didn't see it in the source code here, so I wanted to confirm.

bernardhan33, Jul 11 '23 05:07

@Magichan33 Hi, I followed the latest instructions and hit the same issue. Have you found a way to resolve it?

YafeiWangAlice, Nov 29 '23 03:11

@YafeiWangAlice No, I didn't resolve it. I only needed the summary and per-epoch stats, so I didn't need the aggregated "benchmark report" anyway.

Since this repo wraps https://github.com/argonne-lcf/dlio_benchmark/tree/main, I found the report generated by dlio_postprocessor to be promising too. You can check it out there; I hope it helps.

bernardhan33, Nov 29 '23 05:11

Are you looking for a single-host, multi-run report? You can follow the structure given in https://github.com/johnugeorge/mlperf-storage-sample-results

johnugeorge, Dec 03 '23 10:12
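
For anyone who lands on this thread with the same error: the message asks for summary.json to sit at result_dir/run(1..n)/host(1..n)/summary.json, while `./benchmark.sh run` wrote it directly into the timestamped results directory. Below is a minimal sketch of copying a flat result into that nested layout by hand. The helper `stage_summary` is hypothetical and not part of the benchmark, and the exact run and host directory names that reportgen accepts are an assumption here; check the sample-results repository linked above for the real naming before relying on it.

```python
import shutil
from pathlib import Path


def stage_summary(source_dir, results_dir, run_name, host_name):
    """Copy source_dir/summary.json to results_dir/run_name/host_name/summary.json.

    Hypothetical helper: run_name and host_name are supplied by the caller
    because the directory names reportgen expects are not confirmed here.
    """
    source = Path(source_dir) / "summary.json"
    if not source.is_file():
        raise FileNotFoundError(f"no summary.json found in {source_dir}")

    # Build the nested run/host directory, creating parents as needed.
    target_dir = Path(results_dir) / run_name / host_name
    target_dir.mkdir(parents=True, exist_ok=True)

    # copy2 keeps file timestamps; the original results are left untouched.
    return Path(shutil.copy2(source, target_dir / "summary.json"))
```

For the run in the original post, `source_dir` would be the timestamped directory (`.../results/unet3d/2023-07-11-04-19-09`) and `results_dir` a fresh directory that is then passed to `reportgen --results-dir`. Calling the helper once per run (and, for multi-host results gathered manually, once per host) builds up the nested tree; whether reportgen needs other files besides summary.json is not confirmed in this thread.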