Don't redownload files by default

Open rasbt opened this issue 1 year ago • 3 comments

Executing the download command twice will redownload a model:

litgpt download --repo_id microsoft/phi-2
litgpt download --repo_id microsoft/phi-2

We could check for existing model files and skip the download if they are already present. We could then add a --force option to redownload the model files in case they were corrupted (basically a shorthand for rm -rf checkpoints/... & litgpt download).
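A minimal sketch of that check, for illustration only. The helper names (`missing_files`, `should_download`) and the `force` parameter are hypothetical and not part of litgpt; the sketch assumes the list of expected file names is already known before the download starts.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(checkpoint_dir: Path, expected: Iterable[str]) -> List[str]:
    """Return the expected file names that are not present in checkpoint_dir."""
    return [name for name in expected if not (checkpoint_dir / name).is_file()]


def should_download(checkpoint_dir: Path, expected: Iterable[str], force: bool = False) -> bool:
    """Decide whether to download.

    Hypothetical policy: download if any expected file is missing,
    or unconditionally when force is set.
    """
    if force:
        return True
    return bool(missing_files(checkpoint_dir, expected))
```

An existence check like this cannot tell whether a file is corrupted, which is why a force option would still be needed as an escape hatch.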

rasbt avatar Mar 19 '24 23:03 rasbt

This is feasible now that we get the list of bins or safetensors before running the download: https://github.com/Lightning-AI/litgpt/blob/main/litgpt/scripts/download.py#L54

What would you do if:

  • The bins/safetensors exist but the lit_model.pth doesn't
  • The bins/safetensors do not exist but the lit_model.pth does

carmocca avatar Mar 20 '24 01:03 carmocca

The bins/safetensors exist but the lit_model.pth doesn't

I'd say

  1. Don't download but just print a message: "Skipping download. Files already exist. Override with --force_download true"
  2. Then run the conversion

The bins/safetensors do not exist but the lit_model.pth does

Here, I'd say

  1. Download the files
  2. Don't do the conversion, perhaps. Maybe print a message like "Downloaded files but skipping conversion since lit_model.pth already exists. To force the conversion, run lit convert --checkpoint_dir <this dir>."
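The two cases above could be sketched as a small decision function. This is only an illustration: `Plan` and `plan_actions` are hypothetical names, not litgpt code. The thread does not cover the "both exist" and "neither exists" cases, nor whether a forced download should also reconvert, so the handling of those is an assumption.

```python
from typing import NamedTuple, Optional

SKIP_DOWNLOAD_MSG = (
    "Skipping download. Files already exist. Override with --force_download true"
)
SKIP_CONVERT_MSG = (
    "Downloaded files but skipping conversion since lit_model.pth already exists. "
    "To force the conversion, run lit convert --checkpoint_dir <this dir>."
)


class Plan(NamedTuple):
    download: bool
    convert: bool
    message: Optional[str]


def plan_actions(has_weights: bool, has_lit_model: bool, force_download: bool = False) -> Plan:
    """Map the state of the checkpoint directory to the actions to take."""
    # Download when the bins/safetensors are missing, or when forced.
    download = force_download or not has_weights
    # Convert when lit_model.pth is missing.
    # Assumption (not from the thread): a forced download also reconverts.
    convert = force_download or not has_lit_model

    if not download:
        message = SKIP_DOWNLOAD_MSG
    elif not convert:
        message = SKIP_CONVERT_MSG
    else:
        message = None
    return Plan(download, convert, message)
```

With this sketch, `plan_actions(True, False)` skips the download and runs the conversion, while `plan_actions(False, True)` downloads and skips the conversion.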

rasbt avatar Mar 21 '24 00:03 rasbt

Sounds good to me!

carmocca avatar Mar 21 '24 01:03 carmocca