Cal Mitchell
Great! I will play around with the functions you mentioned to see if it is possible. If not, I will keep an eye on Huggingface's PR to see if that...
I do think an extra processing step is required. Here are exactly the steps I took to download and fine-tune Llama 2 with torchtune, then process it to load successfully...
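For illustration, here is a minimal sketch of the kind of extra processing step I mean; the file names and paths are placeholders rather than the exact script I used:

```python
import shutil
import torch
from pathlib import Path

# Placeholder paths: the directory torchtune wrote the fine-tuned weights to,
# and the directory holding the original HF config/tokenizer files.
ft_dir = Path("output/llama2-7b-finetuned")
base_dir = Path("models/Llama-2-7b-hf")

# Load the fine-tuned state dict and re-save it under the file name
# transformers' from_pretrained() looks for.
state_dict = torch.load(ft_dir / "finetuned_checkpoint.pt", map_location="cpu")
torch.save(state_dict, ft_dir / "pytorch_model.bin")

# Copy over the metadata files the checkpoint directory is missing.
for name in ("config.json", "tokenizer_config.json", "tokenizer.model"):
    shutil.copy(base_dir / name, ft_dir / name)
```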
So, the problem is already solved for Llama 2, but of course, everyone wants Llama 3 :-) There may be some pre-release code in the Huggingface repo that adds Llama...
Great. This issue is important to me so I will work on it tomorrow while keeping the goal of having it work OOTB with the checkpointer in mind. Happy to...
> The model checkpoint itself doesn't need any changes.

Let me know if you disagree? In addition to JSON wrangling, I believe HF's model conversion code is:
- [Permuting `q_proj`...
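For reference, the permutation I'm referring to looks roughly like this (reconstructed from memory of HF's `convert_llama_weights_to_hf.py`, so treat it as a sketch rather than the exact code):

```python
import torch

def permute(w: torch.Tensor, n_heads: int, dim1: int, dim2: int) -> torch.Tensor:
    # Reorders the rotary-embedding dimensions of a q_proj/k_proj weight from
    # Meta's interleaved layout to the half-split layout that transformers'
    # rotary embedding implementation expects.
    return w.view(n_heads, dim1 // n_heads // 2, 2, dim2).transpose(1, 2).reshape(dim1, dim2)

# e.g. q_proj is permuted with the number of attention heads, and k_proj with
# the number of KV heads (which differs from n_heads when GQA is used).
```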
@apthagowda97, you will most likely need to convert to HF first, as that is probably the format the tool you're using expects.
@kartikayk, following our discussion, here is a fully reproducible series of steps I took to download and convert Llama 3 to an HF format that can be read by...
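As a quick sanity check that the converted directory really is readable, I load it back with transformers (paths here are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Directory produced by the conversion steps above (placeholder path).
converted_dir = "output/llama3-8b-hf"

tokenizer = AutoTokenizer.from_pretrained(converted_dir)
model = AutoModelForCausalLM.from_pretrained(converted_dir, torch_dtype="auto")

# Generate a few tokens to confirm the weights were mapped correctly.
inputs = tokenizer("The capital of France is", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```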
@hugebeanie, running `convert_model.py` with a 70B-parameter model (saved in bfloat16) will take ~140 GB of CPU RAM (not GPU RAM). I observed the process using a peak of 143.2 GB...
@monk1337, what part are you having an issue with? I just ran the steps above without any issue for the 70B model. If you're having trouble running torchtune on the...
Hello, like pratyushpal, I also found the `quantize_causal_lm_model.py` example and then attempted `save_pretrained()`. I see the "help wanted" tag; what kind of help is needed? Maybe I can chip in.
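For concreteness, here is roughly what I attempted, following that example; the imports, arguments, and model id are from memory and may not match the current quanto API exactly:

```python
from transformers import AutoModelForCausalLM
from optimum.quanto import quantize, freeze, qint8  # older releases import from `quanto`

# Placeholder model id; any causal LM from the example works the same way.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

# Quantize the weights in place as in quantize_causal_lm_model.py, then freeze.
quantize(model, weights=qint8)
freeze(model)

# This is the serialization step in question.
model.save_pretrained("opt-350m-quantized")
```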