torchtune: Can't load Llama 3 8B into memory

Running with --config llama3/8B_qlora_single_device fails with:

ValueError: Unable to load checkpoint from Meta-Llama-3-8B/original/consolidated.00.pth.
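The shape of this message (a generic `ValueError` ending in ". ") suggests a wrapper: the loader catches whatever the underlying load call raised and re-raises with `from e`. If that is what happens here (an assumption, not confirmed in this thread), the useful detail is in the chained exception printed above "The above exception was the direct cause of the following exception:" in the full traceback. A minimal stdlib sketch of the pattern, where `load_checkpoint` is a hypothetical stand-in and not torchtune's API:

```python
from pathlib import Path


def load_checkpoint(path: Path) -> bytes:
    # Hypothetical stand-in: a real loader would call torch.load or
    # safetensors here; reading raw bytes is enough to show the pattern.
    try:
        return path.read_bytes()
    except Exception as e:
        # Same message shape as in the report; `from e` chains the real error
        raise ValueError(f"Unable to load checkpoint from {path}. ") from e


try:
    load_checkpoint(Path("Meta-Llama-3-8B/original/consolidated.00.pth"))
except ValueError as err:
    print("wrapper:", err)
    # The chained exception is where the actual reason lives
    print("underlying cause:", repr(err.__cause__))
```

Printing `err.__cause__` (or reading the whole traceback) is what distinguishes, say, a missing file from a corrupted or incomplete one.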

Apr 24 '24 00:04 abpani

Mind sharing more information on the steps you took to hit this error? For reference, I tried a fresh install and fresh download and unfortunately am unable to repro this.

Apr 24 '24 03:04 kartikayk

@kartikayk Are you unable to clone the repo, or to download the model weights? To download the model weights, use the command below:

wget --header="Authorization: Bearer HF_TOKEN" https://huggingface.co/datasets/GeneralAwareness/Various/resolve/main/file.zip
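For anyone without wget, the same authenticated download can be sketched with Python's standard library. This assumes the token is passed in by the caller (for example from an `HF_TOKEN` environment variable, which is an assumption); `authed_request` and `download` are hypothetical helper names:

```python
import shutil
import urllib.request


def authed_request(url: str, token: str) -> urllib.request.Request:
    # Same header the wget command sends: "Authorization: Bearer <token>"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def download(url: str, token: str, dest: str) -> None:
    # Stream the response body to disk instead of reading it all into memory
    with urllib.request.urlopen(authed_request(url, token)) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Usage would be something like `download(url, os.environ["HF_TOKEN"], "file.zip")` with the URL from the wget command.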

I was able to clone the repo, but without the model weights. To clone it, please use:

tune download meta-llama/Meta-Llama-3-8B --output-dir Meta-Llama-3-8B --hf-token <HF_TOKEN>
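One possible cause worth ruling out (an assumption, not something confirmed in this thread): if a repo is cloned with plain git and Git LFS does not fetch the large files, each weight file is a small text pointer instead of real weights, and any loader will fail on it. A quick stdlib check, with `looks_like_lfs_pointer`, `is_lfs_pointer` and `report` as hypothetical helpers:

```python
from pathlib import Path

# First line of every Git LFS pointer file
LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(head: bytes) -> bool:
    """True if the leading bytes match a Git LFS pointer stub."""
    return head.startswith(LFS_MAGIC)


def is_lfs_pointer(path: Path) -> bool:
    # Only the first few bytes are needed to recognise a pointer
    with path.open("rb") as f:
        return looks_like_lfs_pointer(f.read(len(LFS_MAGIC)))


def report(model_dir: Path) -> None:
    # Check the consolidated .pth and any top-level safetensors shards
    candidates = sorted(model_dir.glob("original/*.pth")) + sorted(model_dir.glob("*.safetensors"))
    for p in candidates:
        status = "LFS pointer (not downloaded)" if is_lfs_pointer(p) else "looks like real data"
        print(f"{p}: {p.stat().st_size} bytes, {status}")


report(Path("Meta-Llama-3-8B"))
```

A real 8B checkpoint is several GB; a pointer stub is only a few hundred bytes, so the size column alone is usually a giveaway.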

Then I ran the QLoRA single-device YAML config, and it gave me this error.

Apr 24 '24 15:04 abpani

Okay, never mind. I made it work after checking the source code. It did not work with the consolidated.00.pth checkpoint, but it works with the HF checkpoint files (.safetensors).
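Switching to the HF checkpoint means the config has to list the `.safetensors` shards instead of `consolidated.00.pth` (in the torchtune configs I have seen, this is the checkpointer's `checkpoint_files` field; treat the exact field as an assumption for your version). A small sketch, assuming the usual `model-XXXXX-of-YYYYY.safetensors` shard naming, that collects the shards in order and fails loudly if one is missing; `safetensors_shards` is a hypothetical helper:

```python
import re
from pathlib import Path

# Assumed HF shard naming, e.g. model-00001-of-00004.safetensors
SHARD_RE = re.compile(r"^model-(\d+)-of-(\d+)\.safetensors$")


def safetensors_shards(names: list[str]) -> list[str]:
    """Return shard file names in order, raising if any shard is missing."""
    shards = {}
    total = None
    for name in names:
        m = SHARD_RE.match(name)
        if not m:
            continue  # skip config.json, tokenizer files, etc.
        idx, count = int(m.group(1)), int(m.group(2))
        if total is not None and count != total:
            raise ValueError(f"inconsistent shard count in {name}")
        total = count
        shards[idx] = name
    if total is None:
        raise ValueError("no safetensors shards found")
    missing = [i for i in range(1, total + 1) if i not in shards]
    if missing:
        raise ValueError(f"missing shards: {missing}")
    return [shards[i] for i in range(1, total + 1)]


model_dir = Path("Meta-Llama-3-8B")
if model_dir.is_dir():
    files = safetensors_shards([p.name for p in model_dir.iterdir()])
    print("checkpoint_files: [" + ", ".join(files) + "]")
```

Checking completeness up front turns a partial download into a clear "missing shards" message instead of a generic load error later.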

Apr 24 '24 15:04 abpani

Fine-tuning works now, but only with the safetensors checkpoint. 'peak_memory_reserved': 12.637437952 (about 12.6 GB)
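To compare that figure with tools that report binary units (nvidia-smi shows MiB, for example): assuming torchtune reports `peak_memory_reserved` in decimal GB (reserved bytes divided by 1e9), the value corresponds to 12,637,437,952 bytes. A tiny conversion sketch, with `gb_to_gib` as a hypothetical helper:

```python
def gb_to_gib(gb: float) -> float:
    """Convert decimal gigabytes (bytes / 1e9) to binary gibibytes (bytes / 2**30)."""
    n_bytes = round(gb * 1e9)  # undo the assumed /1e9 scaling
    return n_bytes / 2**30


print(gb_to_gib(12.637437952))  # → 11.76953125
```

So the reported peak is roughly 11.8 GiB of reserved GPU memory.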

Apr 24 '24 15:04 abpani