File issues with 70B Lora setup
The 70B LoRA setup specifies checkpoint files in the default config, but when I download the model I only get the original/consolidated.0*.pth files. When I try to run with the default config I get a file-not-found error for the checkpoints.
https://github.com/pytorch/torchtune/blob/8f59c2fecd722691271eecca630a526719a32f76/recipes/configs/llama3/70B_lora.yaml#L28-L59
Am I missing something?
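Roughly what I see on disk vs. what the config asks for (paths and shard names abbreviated; <output-dir> is just wherever I downloaded to):

ls <output-dir>/original/
# consolidated.00.pth  consolidated.01.pth  ...
# but the checkpointer in 70B_lora.yaml lists HF safetensors shards
# (model-00001-of-000NN.safetensors, ...), hence the file-not-found error.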
Hey @BedirT, are you installing torchtune via git clone, and have you recently pulled from main? I was able to run the 70B_lora config with the following commands:
tune download meta-llama/Meta-Llama-3-70B-Instruct \
--hf-token <TOKEN> \
--output-dir /tmp/Meta-Llama-3-70B-Instruct \
--ignore-patterns "original/consolidated*"
tune run --nproc_per_node 8 lora_finetune_distributed --config recipes/configs/llama3/70B_lora.yaml
Let me know if a fresh pull and running these commands still don't work for you.
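If you want to sanity-check the download before launching the run, the output dir should now contain the HF-format shards that the config's checkpointer points at, rather than only the original/ Meta checkpoints (shard count below is illustrative):

ls /tmp/Meta-Llama-3-70B-Instruct
# config.json  model-00001-of-000NN.safetensors ... model-000NN-of-000NN.safetensors
# model.safetensors.index.json  tokenizer files  original/  ...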
I was actually doing a pip install; I'll try building from source. Why do we need the --ignore-patterns flag here?
@BedirT Thanks for filing this issue! So yeah, as @RdoubleA mentioned, please run the tune download command with the --ignore-patterns flag added (this is mentioned in the config as well as the README, but it can probably be documented better).
The reason we need to specify --ignore-patterns "original/consolidated*" is that tune download currently ignores safetensors-format files by default. For the 70B model, though, we want the safetensors format (it's the only HF format we support for checkpoint loading) and don't need the Meta checkpoints, which are stored in the original/ directory with file names beginning with consolidated.
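To make that concrete, here is a rough (abbreviated, not exhaustive) sketch of the repo layout and which files each side of that pattern refers to:

Meta-Llama-3-70B-Instruct/
├── model-00001-of-000NN.safetensors   # HF-format shards: what the config's checkpointer loads
├── ...
├── model.safetensors.index.json
└── original/
    ├── consolidated.00.pth            # Meta-format shards: skipped via --ignore-patterns "original/consolidated*"
    ├── ...
    └── tokenizer.model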
Let me know if this makes sense or if you have any more questions!
Okay yeah I don't know how I missed that line 😄 I was looking at the 8B download commands only. Thanks for the explanation and responses!