Workload resnet50: no I/O is generated with the DALI data loader, even though the benchmark reports that it is running
Steps:
- git clone -b v1.0-rc1 --recurse-submodules https://github.com/mlcommons/storage.git
- pip3 install -r dlio_benchmark/requirements.txt
- ./benchmark.sh datagen --workload resnet50 --accelerator-type h100 --num-parallel 8 --param dataset.num_files_train=1200 --param dataset.data_folder=/mnt/1/ifs/data/rosnet50_05_04_2024_x02
- ./benchmark.sh run --hosts HOST --workload resnet50 --accelerator-type h100 --num-accelerators 2 --results-dir resultsdir-$(date +"%d-%m-%Y") --param dataset.num_files_train=1200 --param dataset.data_folder=/mnt/1/ifs/data/rosnet50_05_04_2024_x02
The run shows progress, but no I/O reaches the NAS.
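One way to confirm the missing I/O from the client side, without a packet capture, is to compare the NFS byte counters before and during the run. The sketch below assumes the NAS is mounted over NFS on a Linux client and that the fifth value on the `bytes:` line of /proc/self/mountstats is the number of bytes read from the server. The function name `nfs_read_bytes` and the mount point `/mnt/1` are hypothetical and need to be adjusted to the actual setup.

```shell
# Print the "bytes read from server" counter for one NFS mount point.
# Assumption: Linux NFS client; the "bytes:" line in mountstats holds
# eight counters and the fifth one is the server read byte count.
nfs_read_bytes() {
  mount_point="$1"
  stats_file="${2:-/proc/self/mountstats}"  # optional override, useful for testing
  awk -v mp="$mount_point" '
    $1 == "device" { in_mount = ($5 == mp) }        # "device X mounted on MP ..."
    in_mount && $1 == "bytes:" { print $6; exit }   # $6 is the fifth counter
  ' "$stats_file"
}

# Example usage (mount point is a placeholder):
#   before=$(nfs_read_bytes /mnt/1)
#   sleep 60   # while the benchmark is running
#   after=$(nfs_read_bytes /mnt/1)
#   echo "bytes read from NAS: $((after - before))"
```

If the difference stays at or near zero while the benchmark reports progress, the training data is not being read from the NAS.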
If the config in ./storage-conf/workload/resnet50_h100.yaml is changed as follows:
< framework: pytorch
---
> framework: tensorflow
< data_loader: dali
---
> data_loader: tensorflow
then I/O is generated and can be captured on the wire.
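For anyone who wants to apply the same workaround, the change shown in the diff can be scripted. This is a minimal sketch, not an official fix: the function name `use_tensorflow_loader` is hypothetical, and it assumes the two keys appear in the file exactly as `framework: pytorch` and `data_loader: dali`, at any indentation. A `.bak` copy of the original file is kept.

```shell
# Switch a workload config from the PyTorch/DALI settings to the
# TensorFlow framework and data loader, preserving indentation.
use_tensorflow_loader() {
  config_file="$1"
  sed -i.bak \
    -e 's/^\([[:space:]]*framework:[[:space:]]*\)pytorch[[:space:]]*$/\1tensorflow/' \
    -e 's/^\([[:space:]]*data_loader:[[:space:]]*\)dali[[:space:]]*$/\1tensorflow/' \
    "$config_file"
}

# Example usage:
#   use_tensorflow_loader ./storage-conf/workload/resnet50_h100.yaml
```

Restoring the `.bak` file returns the workload to the original DALI configuration.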
# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
We are aware of this issue and are in the process of addressing it in the DLIO code.