Don't redownload files by default
Executing the download command twice will redownload a model:
litgpt download --repo_id microsoft/phi-2
litgpt download --repo_id microsoft/phi-2
We could check for existing model files and skip the download in that case. We could then add a --force option to redownload the model files in case they were corrupted (basically a shorthand for rm -rf checkpoints/... && litgpt download).
This is feasible now that we get the list of bins or safetensors before running the download: https://github.com/Lightning-AI/litgpt/blob/main/litgpt/scripts/download.py#L54
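A minimal sketch of that check, assuming the script already holds the list of expected .bin/.safetensors filenames and a local checkpoint directory. `files_to_download` and its `force` parameter are hypothetical names, not the current litgpt API. With `force=True` it mimics the rm -rf + redownload shorthand by clearing the directory first.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def files_to_download(checkpoint_dir: Path, expected: Iterable[str], force: bool = False) -> List[str]:
    """Return the expected weight files that still need to be downloaded.

    Hypothetical helper: `expected` stands in for the list of .bin/.safetensors
    filenames the download script obtains before fetching anything.
    """
    expected = list(expected)
    if force:
        # Equivalent of `rm -rf <checkpoint_dir>`: start over and fetch everything.
        shutil.rmtree(checkpoint_dir, ignore_errors=True)
        return expected
    # Keep only the files that are not already on disk.
    return [name for name in expected if not (checkpoint_dir / name).is_file()]
```

An empty result would be the signal to print a "Skipping download" message instead of calling the downloader.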
What would you do if:
- The bins/safetensors exist but the lit_model.pth doesn't
- The bins/safetensors do not exist but the lit_model.pth does
The bins/safetensors exist but the lit_model.pth doesn't
I'd say
- Don't download but just print a message: "Skipping download. Files already exist. Override with --force_download true"
- Then run the conversion
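The first case could look roughly like this. It is a sketch only: `maybe_skip_download` is a hypothetical name, and `convert` is an injected callback standing in for the real conversion step. It also assumes that when both the weights and lit_model.pth are present, neither step runs.

```python
from pathlib import Path
from typing import Callable, List


def maybe_skip_download(
    checkpoint_dir: Path,
    weight_files: List[str],
    convert: Callable[[Path], None],
    force_download: bool = False,
) -> bool:
    """Return True if the download was skipped because the weights already exist.

    `convert` is a hypothetical callback for the conversion step.
    """
    weights_present = all((checkpoint_dir / name).is_file() for name in weight_files)
    already_converted = (checkpoint_dir / "lit_model.pth").is_file()
    if weights_present and not force_download:
        print("Skipping download. Files already exist. Override with --force_download true")
        # Weights are there but not converted yet: run only the conversion.
        if not already_converted:
            convert(checkpoint_dir)
        return True
    # Weights missing, or the user asked for a fresh download.
    return False
```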
The bins/safetensors do not exist but the lit_model.pth does
Here, I'd say
- Download the files
- Don't do the conversion perhaps. Maybe print a message like "Downloaded files but skipping conversion since lit_model.pth already exists. To force the conversion, run litgpt convert --checkpoint_dir <this dir>."
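For the second case, a rough sketch, assuming `download` and `convert` are hypothetical callbacks wrapping the existing download and conversion steps. The existence of lit_model.pth is checked before downloading, so a fresh download never triggers a conversion that would overwrite it.

```python
from pathlib import Path
from typing import Callable


def download_then_maybe_convert(
    checkpoint_dir: Path,
    download: Callable[[Path], None],
    convert: Callable[[Path], None],
) -> bool:
    """Download the weights, converting only if lit_model.pth is absent.

    Returns True if the conversion ran. Both callbacks are hypothetical.
    """
    already_converted = (checkpoint_dir / "lit_model.pth").is_file()
    download(checkpoint_dir)
    if already_converted:
        # The real message would also point the user to the convert command.
        print("Downloaded files but skipping conversion since lit_model.pth already exists.")
        return False
    convert(checkpoint_dir)
    return True
```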
Sounds good to me!