
MLPerf™ Storage Benchmark Suite

Results 72 storage issues

Running only the read part: `mlpstorage checkpointing run --hosts 10.57.205.101,10.57.205.102 --model llama3-70b --client-host-memory-in-gb 220 --num-processes 8 --checkpoint-folder /mnt/host_checkpointing --results-dir checkpoint_test_4_hosts_llama3-70b --num-checkpoints-read 1 --num-checkpoints-write 0 --allow-run-as-root` The test seems to be succeeding,...

Using 16 or 32 hosts, each with 256G memory, the test starts and each host has dlio_benchmark processes running, but the test itself makes no progress. 8b and 70b models runs...

Assigning a different accelerator count per host, where the total number of accelerators is not divisible by the host count (e.g., 9 accelerators, 2 hosts, 5 & 4 accelerators on each...

code
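To make the uneven case in the snippet above concrete, here is a minimal sketch of one way to spread a total accelerator count across hosts when it does not divide evenly. It is only an illustration of the arithmetic (9 accelerators on 2 hosts giving 5 and 4); the function name `split_accelerators` and the host names are hypothetical, and nothing here is taken from how `mlpstorage` or DLIO actually assign processes.

```python
from typing import Dict, List


def split_accelerators(total: int, hosts: List[str]) -> Dict[str, int]:
    """Hypothetical helper: spread `total` accelerators over `hosts`.

    Every host gets the integer quotient; the first `remainder` hosts
    get one extra, so per-host counts differ by at most one.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    if total < 0:
        raise ValueError("total must be non-negative")

    base, remainder = divmod(total, len(hosts))
    return {
        host: base + (1 if index < remainder else 0)
        for index, host in enumerate(hosts)
    }


if __name__ == "__main__":
    # Host names are placeholders, not values from the issue.
    print(split_accelerators(9, ["host-a", "host-b"]))
```

With 9 accelerators and 2 hosts, the quotient is 4 and the remainder is 1, so the first host receives 5 and the second 4, matching the split described in the issue.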

Since we are not merging the fix into v2.0 and agreed to allow using the fix if anyone needs it, we need to document it as an allowed change in...

rule

In the last meeting, we agreed to allow using the fix for issue #157 (https://github.com/mlcommons/storage/issues/157) as part of closed category submissions. We need to add that as part of the submission rules as...

rule

Following up on the concern raised in issue #177, I noticed that although the Submission Guidelines have been updated, the `mlperf_storage_report.json` file still cannot be generated as expected. When...

I'm getting these warnings when running mlpstorage ```bash WARNING: All log messages before absl::InitializeLog() is called are written to STDERR E0000 00:00:1750688708.806475 179026 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting...

code

Seeing DLIO hang during the first epoch when running certain accelerator counts. Running with 6x a100 or 7x a100 will cause the test to hang after printing the summary of...

code

See the log below. It should be set as OPEN ``` 2025-06-20 09:55:15|STATUS: Benchmark results directory: ./results/eagle/n2x8/checkpointing/llama3-8b/20250620_095514 2025-06-20 09:55:15|INFO: Found benchmark run: checkpointing_run_llama3-8b_20250620_095514 2025-06-20 09:55:15|STATUS: Verifying benchmark run for checkpointing_run_llama3-8b_20250620_095514...

code

I want to request that, for single-system power measurement, BMC/IPMI-reported power, or power reported in-band via ACPI or other methods, can be used. This is much more detailed and useful than...

code