llama-cookbook
add support for llama vision model conversion
What does this PR do?
Updated the checkpoint-converter script to support converting a fine-tuned Llama 3.2 Vision model to Hugging Face format, so the converted checkpoint works with multimodal inference.
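For reviewers, here is a minimal sketch of the kind of change involved, assuming the converter previously instantiated a text-only `LlamaForCausalLM` unconditionally; the function name `load_base_model` and the exact integration point inside `checkpoint_converter_fsdp_hf.py` are illustrative, not the actual diff:

```python
# Hypothetical sketch: pick the HF model class based on a multimodal flag so a
# Llama 3.2 Vision (Mllama) checkpoint is not forced into a text-only LlamaConfig.
from transformers import AutoProcessor, AutoTokenizer
from transformers import LlamaForCausalLM, MllamaForConditionalGeneration


def load_base_model(hf_model_path_or_name: str, multimodal: bool):
    """Return the base HF model plus its tokenizer/processor to load FSDP weights into."""
    if multimodal:
        # Llama 3.2 Vision ships an MllamaConfig; loading it with LlamaForCausalLM
        # is what triggers the llama-vs-mllama config conversion error.
        model = MllamaForConditionalGeneration.from_pretrained(hf_model_path_or_name)
        processor = AutoProcessor.from_pretrained(hf_model_path_or_name)
        return model, processor
    model = LlamaForCausalLM.from_pretrained(hf_model_path_or_name)
    tokenizer = AutoTokenizer.from_pretrained(hf_model_path_or_name)
    return model, tokenizer
```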
Feature/Issue validation/testing
Tested with the following commands and both conversion and inference work; without the fix, conversion fails with a config mismatch between the Llama and Mllama configs.
```bash
python src/llama_recipes/inference/checkpoint_converter_fsdp_hf.py --fsdp_checkpoint_path /path/to/finetuned/model --consolidated_model_path /path/to/save/converted/model --HF_model_path_or_name /home/ubuntu/llama/Llama-3.2-11B-Vision-Instruct/ --multimodal True

python recipes/quickstart/inference/local_inference/multi_modal_infer.py --image_path /home/ubuntu/chair.jpg --prompt_text "Describe this image" --temperature 0.5 --top_p 0.8 --model_name finetuned_model_mind2web/fine-tuned-meta-llama/hf_model/ --hf_token HF_TOKEN
```
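If you want to sanity-check the converted folder without going through `multi_modal_infer.py`, the sketch below loads it with the Mllama classes and runs one image+text generation; the paths are placeholders taken from the commands above, not part of this PR:

```python
# Hypothetical smoke test for the converted checkpoint.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_path = "/path/to/save/converted/model"  # output of the converter above
model = MllamaForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_path)

image = Image.open("/home/ubuntu/chair.jpg")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image"},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.5, top_p=0.8)
print(processor.decode(output[0], skip_special_tokens=True))
```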
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
- [x] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?
Thanks for contributing 🎉!
Hi @init27 @tryrobbo, could you help take a look and merge this? I'd like to demo the official repo in a workshop instead of my fork. Thanks!
@roywei Thanks for your PR on this important feature. I will run some tests on your branch now to double-check.