
Add LLaVA Support

chenwanqq opened this issue 7 months ago · 6 comments

Introduction

This implementation is based on my work for candle. However, it incorporates some notable differences:

  • I have completely removed support for the model format used in the original liuhaotian/llava repo. Instead, I adopted the model format from the llava-hf repo, which provides a more standardized way to use the model, particularly its tokenizer.
  • Similar to the Python transformers library, I have split the original project into LLaVA (1.5) and LLaVANext (1.6). Separating the two versions reduces if-else branching and simplifies the code.
  • I have revamped the process of concatenating text and image features. The updated method closely follows phi3v's implementation in this repo, giving a more uniform coding style.
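To make the feature-concatenation point above concrete, here is a minimal sketch of the general idea: each image-placeholder token in the prompt is expanded into the full run of image patch embeddings, while text embeddings keep their order. The token id, function name, and plain `Vec<f32>` embeddings are illustrative assumptions, not the actual mistral.rs API.

```rust
// Hypothetical placeholder id for an image token; real models define their own.
const IMAGE_TOKEN_ID: u32 = 32000;

/// Replace each image-placeholder token's slot with all image patch
/// embeddings, keeping text embeddings in their original order.
fn merge_text_and_image_features(
    token_ids: &[u32],
    text_embeds: &[Vec<f32>],  // one embedding per token
    image_embeds: &[Vec<f32>], // embeddings for all image patches
) -> Vec<Vec<f32>> {
    let mut merged = Vec::with_capacity(text_embeds.len() + image_embeds.len());
    for (i, &id) in token_ids.iter().enumerate() {
        if id == IMAGE_TOKEN_ID {
            // Expand the single placeholder into every patch embedding.
            merged.extend(image_embeds.iter().cloned());
        } else {
            merged.push(text_embeds[i].clone());
        }
    }
    merged
}

fn main() {
    let token_ids = [1, IMAGE_TOKEN_ID, 2];
    let text_embeds = vec![vec![0.1], vec![0.0], vec![0.2]];
    let image_embeds = vec![vec![9.0], vec![9.1]];
    let merged = merge_text_and_image_features(&token_ids, &text_embeds, &image_embeds);
    // 2 text tokens + 2 image patches = 4 sequence positions.
    assert_eq!(merged.len(), 4);
    println!("{}", merged.len());
}
```

In the real implementation the merge operates on tensors on-device rather than `Vec`s, but the positional bookkeeping is the same.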

Still a work in progress!

Some Notes

  • The fused RoPE may cause issues, possibly due to precision. See https://github.com/EricLBuehler/mistral.rs/issues/465 for details. As a workaround, I have implemented a non-fused version in the llava-llm folder.
  • Certain modifications for cross-GPU device mapping support [link] may cause significant memory usage problems.
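For reference, a non-fused rotary position embedding of the kind a fallback implementation might use can be sketched in plain Rust as below. This is a hedged illustration of the standard RoPE math (pairing dimension i with i + dim/2), not the repo's actual llava-llm code; `apply_rope` and its signature are assumptions.

```rust
/// Apply rotary position embedding to one head's vector `x`
/// (length must be even) at sequence position `pos`.
/// Dimension i is rotated together with dimension i + dim/2.
fn apply_rope(x: &[f32], pos: usize, theta: f32) -> Vec<f32> {
    let dim = x.len();
    let half = dim / 2;
    let mut out = vec![0.0f32; dim];
    for i in 0..half {
        // Rotation frequency for this dimension pair.
        let freq = 1.0 / theta.powf(2.0 * i as f32 / dim as f32);
        let angle = pos as f32 * freq;
        let (sin, cos) = angle.sin_cos();
        out[i] = x[i] * cos - x[i + half] * sin;
        out[i + half] = x[i + half] * cos + x[i] * sin;
    }
    out
}

fn main() {
    // Sanity check: at position 0 the rotation angle is 0, so the
    // transform is the identity.
    let x = vec![1.0f32, 2.0, 3.0, 4.0];
    let y = apply_rope(&x, 0, 10000.0);
    assert!(x.iter().zip(&y).all(|(a, b)| (a - b).abs() < 1e-6));
    println!("ok");
}
```

Doing the rotation in unfused f32 arithmetic like this avoids whatever precision behavior a fused kernel exhibits, at some cost in speed.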

Status

  • [x] Implement model
    • [x] LLaVANext (1.6)
    • [x] LLaVA 1.5
      • [x] Input processor
      • [x] Model structure
  • [x] Support for different LLM Backends
  • [x] LLaMA-like models (Vicuna, Nous-Hermes-2-Yi-34B)
    • [x] Mistral
  • [x] Cleanup of some debug code
  • [x] Documentation
  • [x] Examples

chenwanqq · Jun 27 '24 09:06