
script aborts with 521 Killed

Open howudodat opened this issue 1 year ago • 3 comments

Running:

cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1 --model=llama2-70b-99 --implementation=reference --framework=pytorch --category=datacenter --scenario=Offline --execution_mode=test --device=cpu --docker --quiet --test_query_count=50

results in several hours of silence, after which this error is produced:

git clone  --recurse-submodules https://huggingface.co/meta-llama/Llama-2-70b-chat-hf --depth 5 repo

Cloning into 'repo'...
Username for 'https://huggingface.co': howudodat
Password for 'https://howudodat@huggingface.co': 
remote: Enumerating objects: 58, done.
remote: Counting objects: 100% (58/58), done.
remote: Compressing objects: 100% (56/56), done.
remote: Total 58 (delta 9), reused 42 (delta 2), pack-reused 0 (from 0)
Unpacking objects: 100% (58/58), 511.53 KiB | 5.12 MiB/s, done.
Username for 'https://huggingface.co': howudodat
Password for 'https://howudodat@huggingface.co': 
/home/cmuser/CM/repos/mlcommons@cm4mlops/script/get-git-repo/run.sh: line 51:   521 Killed                  ${CM_GIT_CLONE_CMD}

CM error: Portable CM script failed (name = get-git-repo, return code = 256)

Any ideas?

howudodat · Aug 14 '24 21:08