storage
MLPerf™ Storage Benchmark Suite
In the Storage WG meeting I made a suggestion for having the benchmark check the code. This is an example of how that could work. It will add a 'git'...
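A minimal sketch of how a benchmark could check its own code, assuming the check is based on `git status --porcelain` (the snippet above is truncated, so this is an illustration rather than the proposal's actual mechanism). The function names `parse_porcelain` and `code_is_unmodified` are hypothetical.

```python
import subprocess


def parse_porcelain(output: str) -> list[str]:
    """Return the paths listed in `git status --porcelain` output.

    Each non-empty line is two status characters, a space, then the path.
    Renames appear as "old -> new" and are returned as that whole string.
    """
    paths = []
    for line in output.splitlines():
        if line.strip():
            paths.append(line[3:])
    return paths


def code_is_unmodified(repo_dir: str) -> bool:
    """Return True if the working tree at repo_dir has no local changes.

    Assumes `git` is on PATH and repo_dir is inside a git checkout.
    """
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return not parse_porcelain(result.stdout)
```

A run could call `code_is_unmodified(".")` before starting and record the result alongside the output of `git rev-parse HEAD`, so that a reviewer can tell which commit produced a submission and whether it was modified locally.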
The datagen example `./benchmark.sh datagen --hosts 10.117.61.121,10.117.61.165 --workload unet3d --accelerator-type h100 --num-parallel 8 --param dataset.num_files_train=1200 --param dataset.data_folder=unet3d_data` returns an invalid `--hosts` error. Without the `--hosts` option, datagen works correctly (looks...
Checkpointing is a critical piece of AI/ML and there are frequently systems deployed solely to hold checkpoints. I have a basic PoC for running a checkpoint here: https://github.com/wvaske/checkpoint_bench/blob/main/do_checkpoints.py What features...
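For discussion, a minimal sketch of what timing a single checkpoint write might look like. It is not taken from the linked PoC: the function name `write_checkpoint`, the chunk size, and the use of random bytes in place of real model state are all assumptions made for illustration.

```python
import os
import time


def write_checkpoint(path: str, size_bytes: int, chunk_bytes: int = 1 << 20) -> float:
    """Write size_bytes of random data to path and return elapsed seconds.

    The file is flushed and fsync'd before the timer stops, so the time
    includes pushing the data to the storage device rather than only
    into the page cache.
    """
    chunk = os.urandom(chunk_bytes)  # stand-in for serialized model state
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < size_bytes:
            n = min(chunk_bytes, size_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())
    return time.perf_counter() - start


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "ckpt.bin")
        size = 64 * 1024 * 1024  # 64 MiB, an arbitrary example size
        seconds = write_checkpoint(target, size)
        print(f"wrote {size} bytes in {seconds:.3f} s")
```

A real checkpoint benchmark would likely also cover reads (restore), multiple writers, and realistic file layouts; this sketch measures only a single sequential write.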
I get the following stack trace while executing datagen, but after that datagen continues normally. It does not finish, though. I left it running overnight; by morning it has...
I'm trying to run a single node benchmark with resnet-50 and 32 accelerators on v1.0 tag. ``` ubuntu@ip-xxx-xxx-xxx-xxx:/mnt/training_volume/benchmark/storage$ ./benchmark.sh run --hosts xxx.xxx.xxx.xxx --workload resnet50 --accelerator-type h100 --num-accelerators 32 --results-dir run2...
Hey folks! I saw that someone asked the same question yesterday on the mailing list, but nobody has answered, so I thought I'd bring it here since I'm running into the...
The benchmark script is being migrated from Bash to Python for better integration with the results-checking scripts.
- Update to latest version of DLIO
- Started updating rules document
- Separate...
We should consider sorting the definitions in section 3 alphabetically.
Phase 1 of the power benchmarking additions involves requiring submitters to include additional documentation about their system related to power supply unit ratings and power supply unit topology. Definitions and...
The current section "1.1 Timeline" in the submission guidelines document has content from last year's benchmark. This needs to be updated to include the dates for the upcoming 2.0 benchmark.