
How to prevent caching?

keithachorn-intel opened this issue

I am using cm to download the MLPerf DLRM model (~100 GB). However, I want to specify the final location of this model. By default, it resides in a 'cache' directory with a pseudo-random key in the file path, so I cannot predict the final location beforehand. Ideally, I want to simply specify the output directory or prevent caching so that the model lands in the local directory.

However, despite searching the documentation in this repo (and trying '--no-cache'), the model continues to be cached. Any guidance here?

keithachorn-intel avatar Jun 24 '24 20:06 keithachorn-intel

Hi @keithachorn-intel, we'll add the --no-cache option soon. In the meantime, you can use the --to=<download path> option to change the location of the model download. Please let us know if this works for you.

https://github.com/GATEOverflow/cm4mlops/blob/mlperf-inference/script/get-ml-model-dlrm-terabyte/_cm.json#L21
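
As a sketch, an invocation could look like this (the tag list is an assumption based on the linked get-ml-model-dlrm-terabyte script, and the path is a placeholder; check the script's _cm.json for the actual tags):

# assumed tags for the DLRM terabyte model script; --to sets the download location
cm run script --tags=get,ml-model,dlrm,terabyte --to=$HOME/models/dlrm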

@anandhu-eng we can follow up on our discussion about --no-cache

arjunsuresh avatar Jun 26 '24 00:06 arjunsuresh

Sure @arjunsuresh 🤝

anandhu-eng avatar Jun 26 '24 05:06 anandhu-eng

I am returning to this thread for a separate download attempt.

This is the package I'm trying to download: https://github.com/mlcommons/cm4mlops/tree/mlperf-inference/script/get-ml-model-llama2

It appears to download fully to the cache, but I cannot get it to land in the intended directory. I've tried:

  • Setting the '--to' flag
  • Setting the '--outdirname' flag
  • Setting these environment variables: LLAMA2_CHECKPOINT_PATH and CM_ML_MODEL_PATH

None of these set the final model download location. Any suggestions?

keithachorn-intel avatar Feb 13 '25 21:02 keithachorn-intel

@keithachorn-intel

Based on your previous request, we now have --outdirname, which is uniform across all scripts. The previous --to option only applied to the scripts that implemented it.

Also, we now support the MLPerf automations via MLCFlow in the MLPerf Automations repository, so we are not sure whether this option works in the cm4mlops repository, which we no longer have access to.

For the llama2-70b checkpoint from MLCommons (for submission), you can do:

pip install mlc-scripts
mlcr get,ml-model,llama2,_70b --outdirname=<myout_dir>

For the 7b model:

mlcr get,ml-model,llama2,_7b --outdirname=<myout_dir>

For the llama2-70b checkpoint from Hugging Face, you can do:

mlcr get,ml-model,llama2,_hf,_70b --outdirname=<myout_dir>
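
The 7b Hugging Face variant should follow the same pattern (assuming the _hf and _7b variant tags combine the same way as above):

mlcr get,ml-model,llama2,_hf,_7b --outdirname=<myout_dir>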

arjunsuresh avatar Feb 13 '25 23:02 arjunsuresh

Hi @arjunsuresh. Thank you for the quick reply. I did try '--outdirname' (mentioned above), but it only worked for the dataset script, not the model script. However, your 'mlcr' command did work for my needs. Thank you.

keithachorn-intel avatar Feb 14 '25 08:02 keithachorn-intel

You're welcome @keithachorn-intel, glad that it worked. Sorry, there was an issue with the model variants if you were downloading from MLCommons rather than Hugging Face; it has just been fixed. Please see the updated commands above.

arjunsuresh avatar Feb 14 '25 13:02 arjunsuresh

I’m glad the issue is resolved, @keithachorn-intel ! I will go ahead and close this ticket. Please don’t hesitate to reach out if you have any further questions!

gfursin avatar Feb 14 '25 15:02 gfursin