llama-cookbook

add support for llama vision model conversion


What does this PR do?

Updated the checkpoint conversion script to support converting a fine-tuned Llama 3.2 Vision model to Hugging Face format, so that it works with multimodal inference.
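For context, here is a minimal sketch of what the multimodal branch of such a converter could look like. The flag name, the helper function, and the use of MllamaForConditionalGeneration are assumptions inferred from the commands below, not the exact patch in this PR:

```python
# Sketch: saving a consolidated fine-tuned Llama 3.2 Vision checkpoint in HF format.
# Assumes the FSDP shards have already been merged into a single state dict.
import torch
from transformers import (
    AutoProcessor,
    AutoTokenizer,
    LlamaForCausalLM,
    MllamaForConditionalGeneration,
)


def save_hf_model(hf_model_path_or_name, consolidated_state_dict, output_path, multimodal=False):
    if multimodal:
        # Llama 3.2 Vision uses the Mllama architecture; loading it with the
        # text-only Llama config is what triggers the conversion error.
        model = MllamaForConditionalGeneration.from_pretrained(
            hf_model_path_or_name, torch_dtype=torch.bfloat16
        )
        processor = AutoProcessor.from_pretrained(hf_model_path_or_name)
        processor.save_pretrained(output_path)
    else:
        model = LlamaForCausalLM.from_pretrained(
            hf_model_path_or_name, torch_dtype=torch.bfloat16
        )
        tokenizer = AutoTokenizer.from_pretrained(hf_model_path_or_name)
        tokenizer.save_pretrained(output_path)

    # Load the fine-tuned weights into the correct architecture, then write HF format.
    model.load_state_dict(consolidated_state_dict)
    model.save_pretrained(output_path)
```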

Feature/Issue validation/testing

Tested with the following scripts and the conversion works; without the fix, conversion fails with a config mismatch between the Llama and Mllama configs.

python src/llama_recipes/inference/checkpoint_converter_fsdp_hf.py --fsdp_checkpoint_path  /path/to/finetuned/model --consolidated_model_path  /path/to/save/converted/model  --HF_model_path_or_name /home/ubuntu/llama/Llama-3.2-11B-Vision-Instruct/ --multimodal True

python recipes/quickstart/inference/local_inference/multi_modal_infer.py --image_path  /home/ubuntu/chair.jpg --prompt_text "Describe this image" --temperature 0.5 --top_p 0.8 --model_name finetuned_model_mind2web/fine-tuned-meta-llama/hf_model/ --hf_token HF_TOKEN
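For reference, a minimal sketch of what the multimodal inference step does with the converted checkpoint (the paths and chat-template usage are illustrative and may differ from the actual script):

```python
# Sketch: image + text inference against the converted HF checkpoint.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_path = "finetuned_model_mind2web/fine-tuned-meta-llama/hf_model/"  # converted output dir
model = MllamaForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_path)

# Build a prompt that interleaves the image placeholder with the user text.
image = Image.open("/home/ubuntu/chair.jpg")
messages = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe this image"}]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.5, top_p=0.8)
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:]))
```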

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [ ] Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • [x] Did you make sure to update the documentation with your changes?
  • [ ] Did you write any new necessary tests?

Thanks for contributing 🎉!

roywei · Oct 06 '24 05:10

Hi @init27 @tryrobbo, could you help take a look and merge this? I'd like to demo the official repo in a workshop instead of my fork. Thanks!

roywei · Oct 14 '24 16:10

@roywei Thanks for your PR on this important feature. I will run some tests on your branch now to double-check.

wukaixingxp · Oct 14 '24 16:10