mistral.rs
Add LLaVA Support
Introduction
This implementation is based on my earlier work for candle, but with some notable differences:
- I have completely removed support for the model format used in the original liuhaotian/llava repo. Instead, I adopted the model format from the llava-hf repo, which provides a more standardized way to use the model, particularly its tokenizer.
- Similar to the Python transformers library, I split the original project into LLaVA (1.5) and LLaVANext (1.6). This removes many if-else branches and simplifies the code.
- I have reworked how text and image features are concatenated. The updated method closely follows phi3v's implementation in this repo, keeping the coding style uniform.
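To make the concatenation step concrete, here is a minimal sketch in plain Rust over `Vec<f32>` rows rather than tensors. This is not the actual mistral.rs code; `IMAGE_TOKEN_ID` and the function name are hypothetical. The idea is that each image placeholder token in the prompt is replaced in place by the full run of image patch embeddings, with the surrounding text embeddings kept in order:

```rust
// Hypothetical placeholder id for the <image> token (illustrative only).
const IMAGE_TOKEN_ID: u32 = 32000;

// Splice image patch embeddings into the text embedding sequence wherever
// the placeholder token appears. Each embedding is one row of f32 values.
fn merge_text_and_image_features(
    token_ids: &[u32],
    text_embeds: &[Vec<f32>],  // one embedding per prompt token
    image_embeds: &[Vec<f32>], // one embedding per image patch
) -> Vec<Vec<f32>> {
    let mut out = Vec::with_capacity(token_ids.len() + image_embeds.len());
    for (i, &id) in token_ids.iter().enumerate() {
        if id == IMAGE_TOKEN_ID {
            // Replace the single placeholder with all image patch embeddings.
            out.extend(image_embeds.iter().cloned());
        } else {
            out.push(text_embeds[i].clone());
        }
    }
    out
}
```

In the real implementation this runs on device tensors, but the ordering logic is the same.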
Still a work in progress!
Some Notes
- The fused RoPE implementation may cause issues, possibly due to precision; see https://github.com/EricLBuehler/mistral.rs/issues/465. I have therefore implemented an alternative version in the llava-llm folder.
- Certain modifications for cross-GPU mapping support [link] may cause significant memory usage problems.
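For reference, here is a minimal sketch of what a non-fused RoPE path computes for a single head vector, written in plain Rust with the half-split pairing convention and a base `theta` (assumptions on my part; this is not the llava-llm code itself). All trigonometry is done in f32, which is the kind of precision behavior a fused half-precision kernel may not preserve:

```rust
// Rotate each pair (x[i], x[i + d/2]) of a head vector by a
// position-dependent angle, computing everything in f32.
fn rope(x: &[f32], pos: usize, theta: f32) -> Vec<f32> {
    let d = x.len();
    let half = d / 2;
    let mut out = vec![0.0f32; d];
    for i in 0..half {
        // Standard RoPE frequency schedule: theta^(-2i/d).
        let freq = 1.0 / theta.powf(2.0 * i as f32 / d as f32);
        let angle = pos as f32 * freq;
        let (sin, cos) = angle.sin_cos();
        // 2D rotation of the pair by `angle`.
        out[i] = x[i] * cos - x[i + half] * sin;
        out[i + half] = x[i] * sin + x[i + half] * cos;
    }
    out
}
```

At position 0 the rotation is the identity, and every pair keeps its norm, which makes this easy to sanity-check against a fused kernel.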
Status
- [x] Implement model
  - [x] LLaVANext (1.6)
  - [x] LLaVA 1.5
- [x] Input processor
- [x] Model structure
- [x] Support for different LLM backends
  - [x] LLaMA-like models (Vicuna, Nous-Hermes-2-Yi-34B)
  - [x] Mistral
- [x] Cleanup of some debug code
- [x] Documentation
- [x] Examples