How to prevent caching?
I am using cm to download the MLPerf DLRM model (~100 GB). However, I want to specify the final location of this download. By default, it lands in a 'cache' directory with a pseudo-random key in the file path, so I cannot predict the final location beforehand. Ideally, I want to simply specify the output directory or prevent caching so that the model lands in the local directory.
However, despite searching the documentation in this repo for a way to do this (and trying '--no-cache'), the model continues to be cached. Any guidance here?
Hi @keithachorn-intel, we'll add the --no-cache option soon. In the meantime, you can use the --to=<download path> option to change the location of the model download. Please let us know if this works for you.
https://github.com/GATEOverflow/cm4mlops/blob/mlperf-inference/script/get-ml-model-dlrm-terabyte/_cm.json#L21
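For example, the invocation would look roughly like this (a sketch only: the tags are assumed from the get-ml-model-dlrm-terabyte script linked above, and the destination path is illustrative):

```bash
# Sketch: fetch the DLRM model and place it under /data/models/dlrm
# instead of the default CM cache (tags assumed, path illustrative).
cm run script --tags=get,ml-model,dlrm,terabyte --to=/data/models/dlrm
```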
@anandhu-eng we can follow up on our discussion about --no-cache.
Sure @arjunsuresh 🤝
I am returning to this thread for a separate download attempt.
This is the package I'm trying to download: https://github.com/mlcommons/cm4mlops/tree/mlperf-inference/script/get-ml-model-llama2
It appears to download fully to the cache, but I cannot get it to land in the intended directory. I've tried the following (sketched concretely after the list):
- Setting the '--to' flag
- Setting the '--outdirname' flag
- Setting these environment variables: LLAMA2_CHECKPOINT_PATH and CM_ML_MODEL_PATH
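The attempts looked roughly like this (a sketch: tags and paths are illustrative, not the exact commands):

```bash
# Flags passed to the cm invocation (neither relocated the model):
cm run script --tags=get,ml-model,llama2 --to=/data/models/llama2
cm run script --tags=get,ml-model,llama2 --outdirname=/data/models/llama2

# Environment variables exported before re-running (also no effect):
export LLAMA2_CHECKPOINT_PATH=/data/models/llama2
export CM_ML_MODEL_PATH=/data/models/llama2
```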
None appeared effective at setting the final model download location. Any suggestions?
@keithachorn-intel
Based on your previous request, we now have --outdirname, which works uniformly across all scripts. The earlier --to option only applied to scripts that implemented it.
Also, we now support the MLPerf automations via MLCFlow in the MLPerf Automations repository, so I'm not sure whether this option works in the cm4mlops repository, to which we no longer have access.
For the llama2-70b checkpoint from MLCommons (for submission) you can do:

```bash
pip install mlc-scripts
mlcr get,ml-model,llama2,_70b --outdirname=<myout_dir>
```

For the 7b model:

```bash
mlcr get,ml-model,llama2,_7b --outdirname=<myout_dir>
```

For the llama2-70b checkpoint from Hugging Face you can do:

```bash
mlcr get,ml-model,llama2,_hf,_70b --outdirname=<myout_dir>
```
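Once the run finishes, the checkpoint should sit directly under the directory you passed, so a quick listing is enough to confirm the location:

```bash
# Replace <myout_dir> with the directory passed to --outdirname above.
ls <myout_dir>
```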
Hi @arjunsuresh, thank you for the quick reply. I did try '--outdirname' (mentioned above), but it only worked for the dataset script, not the model script. However, your 'mlcr' commands did work for my needs. Thank you.
You're welcome, @keithachorn-intel, glad that it worked. Sorry, there was an issue with the model variants if you were downloading from MLCommons rather than Hugging Face; I've just fixed it. Please see the updated commands above.
I'm glad the issue is resolved, @keithachorn-intel! I will go ahead and close this ticket. Please don't hesitate to reach out if you have any further questions!