
Add LLaVA Support

chenwanqq opened this issue 7 months ago · 6 comments

Introduction

This implementation is based on my work for candle. However, it incorporates some notable differences:

  • I have completely removed support for the model format used in the original liuhaotian/llava repo. Instead, I adopted the model format from the llava-hf repo, which provides a more standardized way to use the model, particularly its tokenizer.
  • Similar to the Python transformers library, I have split the original project into LLaVA (1.5) and LLaVANext (1.6). Separating the two versions reduces if-else branching and simplifies the code.
  • I have revamped the process of concatenating text and image features. The updated method closely follows phi3v's implementation in this repo, giving a more uniform coding style.
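To make the feature-concatenation point above concrete, here is a minimal sketch of the general idea: each image-placeholder token in the prompt is expanded into the full run of image patch embeddings, while text embeddings keep their order. The token id, function name, and plain `Vec<f32>` embeddings are illustrative assumptions, not the actual mistral.rs API.

```rust
// Hypothetical placeholder id for an image token; real models define their own.
const IMAGE_TOKEN_ID: u32 = 32000;

/// Replace each image-placeholder token's slot with all image patch
/// embeddings, keeping text embeddings in their original order.
fn merge_text_and_image_features(
    token_ids: &[u32],
    text_embeds: &[Vec<f32>],  // one embedding per token
    image_embeds: &[Vec<f32>], // embeddings for all image patches
) -> Vec<Vec<f32>> {
    let mut merged = Vec::with_capacity(text_embeds.len() + image_embeds.len());
    for (i, &id) in token_ids.iter().enumerate() {
        if id == IMAGE_TOKEN_ID {
            // Expand the single placeholder into every patch embedding.
            merged.extend(image_embeds.iter().cloned());
        } else {
            merged.push(text_embeds[i].clone());
        }
    }
    merged
}

fn main() {
    let token_ids = [1, IMAGE_TOKEN_ID, 2];
    let text_embeds = vec![vec![0.1], vec![0.0], vec![0.2]];
    let image_embeds = vec![vec![9.0], vec![9.1]];
    let merged = merge_text_and_image_features(&token_ids, &text_embeds, &image_embeds);
    // 2 text tokens + 2 image patches = 4 sequence positions.
    assert_eq!(merged.len(), 4);
    println!("{}", merged.len());
}
```

In the real implementation the merge operates on tensors on-device rather than `Vec`s, but the positional bookkeeping is the same.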

Still a work in progress!

Some Notes

  • The fused RoPE may cause issues, possibly due to precision. See https://github.com/EricLBuehler/mistral.rs/issues/465 for details. As a workaround, I have implemented a non-fused version in the llava-llm folder.
  • Certain modifications for cross-GPU device mapping support [link] may cause significant memory usage problems.
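For reference, a non-fused rotary position embedding of the kind a fallback implementation might use can be sketched in plain Rust as below. This is a hedged illustration of the standard RoPE math (pairing dimension i with i + dim/2), not the repo's actual llava-llm code; `apply_rope` and its signature are assumptions.

```rust
/// Apply rotary position embedding to one head's vector `x`
/// (length must be even) at sequence position `pos`.
/// Dimension i is rotated together with dimension i + dim/2.
fn apply_rope(x: &[f32], pos: usize, theta: f32) -> Vec<f32> {
    let dim = x.len();
    let half = dim / 2;
    let mut out = vec![0.0f32; dim];
    for i in 0..half {
        // Rotation frequency for this dimension pair.
        let freq = 1.0 / theta.powf(2.0 * i as f32 / dim as f32);
        let angle = pos as f32 * freq;
        let (sin, cos) = angle.sin_cos();
        out[i] = x[i] * cos - x[i + half] * sin;
        out[i + half] = x[i + half] * cos + x[i] * sin;
    }
    out
}

fn main() {
    // Sanity check: at position 0 the rotation angle is 0, so the
    // transform is the identity.
    let x = vec![1.0f32, 2.0, 3.0, 4.0];
    let y = apply_rope(&x, 0, 10000.0);
    assert!(x.iter().zip(&y).all(|(a, b)| (a - b).abs() < 1e-6));
    println!("ok");
}
```

Doing the rotation in unfused f32 arithmetic like this avoids whatever precision behavior a fused kernel exhibits, at some cost in speed.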

Status

  • [x] Implement model
    • [x] LLaVANext (1.6)
    • [x] LLaVA 1.5
      • [x] Input processor
      • [x] Model structure
  • [x] Support for different LLM Backends
  • [x] LLaMA-like models (Vicuna, Nous-Hermes-2-Yi-34B)
    • [x] Mistral
  • [x] Cleanup of some debug code
  • [x] Documentation
  • [x] Examples

chenwanqq · Jun 27 '24 09:06