Workload resnet50: no I/O is generated with the DALI data loader while the benchmark reports that it is running

Open. alexander272272 opened this issue 10 months ago • 1 comment

Steps to reproduce:

  • git clone -b v1.0-rc1 --recurse-submodules https://github.com/mlcommons/storage.git
  • pip3 install -r dlio_benchmark/requirements.txt
  • ./benchmark.sh datagen --workload resnet50 --accelerator-type h100 --num-parallel 8 --param dataset.num_files_train=1200 --param dataset.data_folder=/mnt/1/ifs/data/rosnet50_05_04_2024_x02
  • ./benchmark.sh run --hosts HOST --workload resnet50 --accelerator-type h100 --num-accelerators 2 --results-dir resultsdir-$(date +"%d-%m-%Y") --param dataset.num_files_train=1200 --param dataset.data_folder=/mnt/1/ifs/data/rosnet50_05_04_2024_x02

The benchmark shows progress, but no I/O reaches the NAS.
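
One way to cross-check this from the client side, in addition to capturing traffic on the wire, is to sample the per-process read counters that Linux exposes in /proc/<pid>/io while the benchmark is running. The following is a minimal sketch, not part of the benchmark: the helper names are hypothetical, the pid must be that of the worker process that actually performs the reads (the launcher may spawn children), and reading another process's counters requires the same user or root. Note that rchar also counts reads served from the page cache and may miss memory-mapped access, so a zero delta is a hint rather than proof.

```python
import time
from pathlib import Path
from typing import Dict, Iterable


def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse the 'key: value' lines of a /proc/<pid>/io file into a dict."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def io_delta(before: Dict[str, int], after: Dict[str, int],
             keys: Iterable[str] = ("rchar", "read_bytes")) -> Dict[str, int]:
    """Return how much each selected counter grew between two snapshots."""
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}


def sample_read_delta(pid: int, interval_s: float = 5.0) -> Dict[str, int]:
    """Snapshot /proc/<pid>/io twice, interval_s apart, and return the growth."""
    path = Path(f"/proc/{pid}/io")
    before = parse_proc_io(path.read_text())
    time.sleep(interval_s)
    after = parse_proc_io(path.read_text())
    return io_delta(before, after)
```

Calling sample_read_delta(pid) on a worker during the "run" step and getting zeros for both counters would be consistent with the symptom described here; non-zero rchar with zero traffic on the wire would instead point at data being served from the client cache.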

If the config ./storage-conf/workload/resnet50_h100.yaml is changed as follows (lines marked < are the original values, lines marked > the modified ones):
< framework: pytorch
---
> framework: tensorflow


<  data_loader: dali
---
>  data_loader: tensorflow

With this change, I/O is generated and can be captured on the wire.
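
The two-line config change above can also be applied programmatically, which helps when switching back and forth between the two loaders while investigating. This is a minimal sketch under stated assumptions: it edits the YAML as plain text so that it needs no extra packages, the function name is hypothetical, and it only rewrites lines whose value is exactly pytorch (for framework) or dali (for data_loader), leaving everything else untouched. It is a workaround for testing, not a fix for the underlying DLIO issue.

```python
import re

# key -> (value to look for, value to write instead), as in the diff above
REPLACEMENTS = {
    "framework": ("pytorch", "tensorflow"),
    "data_loader": ("dali", "tensorflow"),
}


def switch_to_tensorflow(yaml_text: str) -> str:
    """Return yaml_text with the framework and data_loader values swapped."""
    out = []
    for line in yaml_text.splitlines(keepends=True):
        for key, (old, new) in REPLACEMENTS.items():
            # Keep the original indentation and line ending; change only the value.
            pattern = rf"^(\s*{key}:\s*){old}(\s*)$"
            line = re.sub(pattern, rf"\g<1>{new}\g<2>", line)
        out.append(line)
    return "".join(out)
```

To use it, read ./storage-conf/workload/resnet50_h100.yaml into a string, pass it through switch_to_tensorflow, and write the result back (keeping a copy of the original file so the DALI configuration can be restored later).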

# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

alexander272272, Apr 08 '24 16:04

We are aware of that. We are in the process of addressing the issue in the DLIO code.

zhenghh04, Apr 08 '24 20:04