
File issues with 70B Lora setup

Open · BedirT opened this issue 9 months ago · 3 comments

The 70B LoRA setup specifies checkpoint files in the default config, but when I download the model, it only contains the original/consolidated.0*.pth files. When I try to run the model with the default config, I get a file-not-found error for the checkpoints.

https://github.com/pytorch/torchtune/blob/8f59c2fecd722691271eecca630a526719a32f76/recipes/configs/llama3/70B_lora.yaml#L28-L59

Am I missing something?

BedirT · Apr 25 '24 00:04

Hey @BedirT, are you installing torchtune via git clone and have recently pulled from main? I was able to run the 70B_lora config with the following commands:

tune download meta-llama/Meta-Llama-3-70B-Instruct \
--hf-token <TOKEN> \
--output-dir /tmp/Meta-Llama-3-70B-Instruct \
--ignore-patterns "original/consolidated*"

tune run --nproc_per_node 8 lora_finetune_distributed --config recipes/configs/llama3/70B_lora.yaml

Let me know if, after a fresh pull, these commands still don't work for you.

RdoubleA · Apr 25 '24 01:04

I was actually using a pip install; I'll try building from source. Why do we need --ignore-patterns here?

BedirT · Apr 25 '24 07:04

@BedirT Thanks for filing this issue! As @RdoubleA mentioned, please run the tune download command with the --ignore-patterns flag added (this is mentioned in the config as well as the README, but could probably be documented better).

The reason we need to specify --ignore-patterns "original/consolidated*" is that tune download currently ignores safetensors-format files by default. In the case of the 70B model, though, we want the safetensors files (the HF format is the only one we support for checkpoint loading) and don't need the Meta checkpoint files, which are stored in the original/ directory with filenames beginning with consolidated.
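To illustrate, here is a minimal Python sketch (not torchtune's actual implementation) of how a glob-style ignore pattern like "original/consolidated*" filters a repo file listing; the filenames below are hypothetical examples of what a Hugging Face repo might contain:

```python
# Sketch of glob-style ignore-pattern filtering, as used conceptually by
# `tune download --ignore-patterns`. Filenames are illustrative only.
from fnmatch import fnmatch

repo_files = [
    "model-00001-of-00030.safetensors",   # HF-format shard (kept)
    "model-00002-of-00030.safetensors",   # HF-format shard (kept)
    "original/consolidated.00.pth",       # Meta-format checkpoint (skipped)
    "original/consolidated.01.pth",       # Meta-format checkpoint (skipped)
    "tokenizer.json",                     # kept
]

ignore_pattern = "original/consolidated*"

# Keep every file that does NOT match the ignore pattern.
kept = [f for f in repo_files if not fnmatch(f, ignore_pattern)]
print(kept)
```

With this pattern, the consolidated.*.pth files under original/ are skipped while the safetensors shards and tokenizer are downloaded, which matches the behavior described above.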

Let me know if this makes sense or if you have any more questions!

rohan-varma · Apr 25 '24 08:04

Okay, yeah, I don't know how I missed that line 😄 I was only looking at the 8B download commands. Thanks for the explanation and responses!

BedirT · Apr 26 '24 01:04