
Model Wishlist

Open EricLBuehler opened this issue 2 years ago • 151 comments

Please let us know what model architectures you would like to be added!

Up-to-date todo list below. Please feel free to contribute any model; a PR without device mapping, ISQ, etc. will still be merged!

Language models

  • [ ] snowflake-arctic-instruct: Snowflake/snowflake-arctic-instruct
  • [ ] WizardLM-2: alpindale/WizardLM-2-8x22B
  • [ ] Command R: CohereForAI/c4ai-command-r-v01
  • [ ] Command R+: CohereForAI/c4ai-command-r-plus

Multimodal models

  • [x] Llava: liuhaotian/llava-v1.5-7b
  • [x] Idefics2: HuggingFaceM4/idefics2-8b
  • [ ] deepseek-vl: deepseek-ai/deepseek-vl-7b-chat
  • [ ] MiniCPM: openbmb/MiniCPM-V-2
  • [x] Phi 3 Vision: https://huggingface.co/microsoft/Phi-3-vision-128k-instruct

Embedding models

  • [ ] T5: google-t5/t5-base
  • [ ] nomic-text-embed: nomic-ai/nomic-embed-text-v1

EricLBuehler avatar Apr 16 '24 13:04 EricLBuehler

qwen1.5-72B-Chat

NiuBlibing avatar Apr 23 '24 03:04 NiuBlibing

llama3

NiuBlibing avatar Apr 23 '24 03:04 NiuBlibing

@NiuBlibing, we have llama3 support ready: the README has a few examples. I will add Qwen support shortly.
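For reference, a server launch along the lines of the README examples looks roughly like this (the subcommand and flag names here are assumptions; verify them against the README for your build):

```shell
# Hypothetical invocation modeled on the README examples.
# `plain` loads unquantized Hugging Face weights; -a selects the architecture.
./mistralrs-server --port 1234 plain -m meta-llama/Meta-Llama-3-8B-Instruct -a llama
```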

EricLBuehler avatar Apr 23 '24 21:04 EricLBuehler

@NiuBlibing, I just added Qwen2 support. Quantized Qwen2 support will be added in the next few days.

EricLBuehler avatar Apr 25 '24 23:04 EricLBuehler

Can you add https://huggingface.co/Snowflake/snowflake-arctic-instruct?

cargecla1 avatar Apr 26 '24 11:04 cargecla1

Hello! Any plans for adding multimodal (e.g. llava) and embedding models?

francis2tm avatar Apr 28 '24 21:04 francis2tm

Can you add https://huggingface.co/Snowflake/snowflake-arctic-instruct?

@cargecla1, yes! It will be a great use case for ISQ.

EricLBuehler avatar Apr 28 '24 21:04 EricLBuehler

Hello! Any plans for adding multimodal (e.g. llava) and embedding models?

@francis2tm, yes. I plan on supporting Llava and embedding models this week.

EricLBuehler avatar Apr 28 '24 21:04 EricLBuehler

@NiuBlibing, you can run Qwen now with ISQ, which will quantize it.
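A sketch of what that might look like (the `--isq` level and architecture name are assumptions; check the ISQ documentation for the supported quantization values):

```shell
# Hypothetical: load the unquantized weights, then quantize in-place with ISQ.
./mistralrs-server --isq Q4K plain -m Qwen/Qwen1.5-72B-Chat -a qwen2
```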

EricLBuehler avatar Apr 28 '24 21:04 EricLBuehler

Would be nice to support at least one strong vision-language model: https://huggingface.co/openbmb/MiniCPM-V-2 https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5 with an option to compute the visual frontend model on the CPU. You might find it easier to ship the visual transformer part via ONNX.

kir-gadjello avatar Apr 29 '24 01:04 kir-gadjello

Would love to see some DeepSeek-VL; this model is better than Llava and supports multiple images per prompt: https://huggingface.co/collections/deepseek-ai/deepseek-vl-65f295948133d9cf92b706d3

chelbos avatar Apr 29 '24 02:04 chelbos

Also, outside the LLM world, would love to see support for https://github.com/cvg/LightGlue :) but not sure if that's possible ...

chelbos avatar Apr 29 '24 03:04 chelbos

Could you add support for GGUF-quantized Phi-3-Mini to the wishlist? Currently, this fails (built from master):

Running `./mistralrs-server gguf -m PrunaAI/Phi-3-mini-128k-instruct-GGUF-Imatrix-smashed -t microsoft/Phi-3-mini-128k-instruct -f /home/jett/Downloads/llms/Phi-3-mini-128k-instruct-q3_K_S.gguf`
2024-04-29T03:08:35.180939Z  INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: false
2024-04-29T03:08:35.180975Z  INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> multinomial
2024-04-29T03:08:35.180982Z  INFO mistralrs_server: Loading model `microsoft/Phi-3-mini-128k-instruct` on Cpu...
2024-04-29T03:08:35.180989Z  INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
2024-04-29T03:08:35.181017Z  INFO hf_hub: Token file not found "/home/jett/.cache/huggingface/token"    
2024-04-29T03:08:35.181048Z  INFO mistralrs_core::utils::tokens: Could not load token at "/home/jett/.cache/huggingface/token", using no HF token.
2024-04-29T03:08:35.181122Z  INFO hf_hub: Token file not found "/home/jett/.cache/huggingface/token"    
2024-04-29T03:08:35.181133Z  INFO mistralrs_core::utils::tokens: Could not load token at "/home/jett/.cache/huggingface/token", using no HF token.
Error: Unknown GGUF architecture `phi3`

jett06 avatar Apr 29 '24 03:04 jett06

It'll be great to see WizardLM-2 and suzume. And thanks for a great tool!

rodion-m avatar Apr 29 '24 06:04 rodion-m

Command-R and Command-R+ from Cohere would be amazing 🙏

W4G1 avatar Apr 29 '24 11:04 W4G1

T5 and LLaVA

yongkangzhao avatar Apr 29 '24 17:04 yongkangzhao

@kir-gadjello

Would be nice to support at least one strong vision-language model: https://huggingface.co/openbmb/MiniCPM-V-2 https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5 with an option to compute the visual frontend model on the CPU. You might find it easier to ship the visual transformer part via ONNX.

Supporting a vision+language or multimodal model is very high priority right now.


@chelbos

Would love to see some DeepSeek-VL; this model is better than Llava and supports multiple images per prompt: https://huggingface.co/collections/deepseek-ai/deepseek-vl-65f295948133d9cf92b706d3

I'll add this one too.

Also, outside the LLM world, would love to see support for https://github.com/cvg/LightGlue :) but not sure if that's possible ...

I will look into it!


@jett06

Could you add support for GGUF-quantized Phi-3-Mini to the wishlist?

Yes, absolutely, I think it should be easy. In the meantime, you can use ISQ to get the same speed.


@rodion-m

It'll be great to see WizardLM-2 and suzume. And thanks for a great tool!

Thanks! I think suzume is just finetuned Llama so that can be used already. I'll add WizardLM.


@W4G1

Command-R and Command-R+ from Cohere would be amazing 🙏

Yes, I'll add those.


@yongkangzhao

T5 and LLaVA

Yes, I'll add those. T5 will be a nice smaller model.

EricLBuehler avatar Apr 29 '24 17:04 EricLBuehler

@EricLBuehler Thanks for your reply, for adding my suggestion to the model wishlist, and for developing such an awesome project! It's very appreciated :)

jett06 avatar Apr 29 '24 19:04 jett06

Congrats on your great work! +1 for vision models; Idefics2-8b or better would be awesome.

ldt avatar Apr 30 '24 13:04 ldt

It would be nice to add some embedding models, like nomic-text-embed.

maximus2600 avatar May 01 '24 03:05 maximus2600

Hello, first of all, I want to express my appreciation for the excellent work your team has accomplished on the mistral.rs engine. It's a great project.

I am currently developing a personal AI assistant using Rust, and I believe integrating additional features into your engine could significantly enhance its utility and appeal. Specifically, adding support for Whisper and incorporating Text-to-Speech (TTS) functionality, such as StyleTTS or similar technologies, would be incredibly beneficial. This would enable the engine to handle LLM inference, speech-to-text, and text-to-speech in a unified system at near-real-time speed.

Implementing these features could transform the engine into a more versatile tool for developers like myself, who are keen on building more integrated and efficient AI applications.

progressionnetwork avatar May 04 '24 07:05 progressionnetwork

@jett06, I just added quantized GGUF Phi-3 support in #276! That is without LongRope support currently, but you can use a plain model with ISQ.
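Based on the command shape from the earlier report, a quantized Phi-3 run would presumably look something like this (the GGUF repo id and filename below are placeholders, not tested values):

```shell
# Hypothetical: -m names the GGUF repo, -t the tokenizer source, -f the quantized file.
./mistralrs-server gguf -m microsoft/Phi-3-mini-128k-instruct -t microsoft/Phi-3-mini-128k-instruct -f Phi-3-mini-128k-instruct-q4_K_M.gguf
```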

EricLBuehler avatar May 09 '24 15:05 EricLBuehler

@EricLBuehler Woah, thank you so much! This will be lovely for us folks with less powerful computers or size constraints, you're awesome :)

jett06 avatar May 09 '24 19:05 jett06

@jett06, my pleasure! I just fixed a small bug (in case you saw the strange behavior), so it should be all ready to go now!

EricLBuehler avatar May 09 '24 21:05 EricLBuehler

IBM's Granite series Code Models.

Granite Code Models

NeroHin avatar May 10 '24 01:05 NeroHin

@NeroHin

IBM's Granite series Code Models.

Granite Code Models

The 3B and 8B variants should already be supported, as they are just based on the Llama architecture.

The 20B and 34B variants are based on the GPTBigCode architecture, which currently isn't implemented in mistral.rs.

LLukas22 avatar May 11 '24 16:05 LLukas22

Hello! Any plans for adding multimodal (e.g. llava) and embedding models?

I'm working on it now: chenwanqq/candle-llava. It's not easy, dude: tons of image preprocessing and tensor concatenation.
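To illustrate the kind of tensor bookkeeping involved, here is a minimal sketch (using plain `Vec<f32>` rows instead of candle tensors, with a hypothetical helper name) of splicing projected image-patch embeddings into a text embedding sequence at an `<image>` placeholder position:

```rust
// Hypothetical sketch: splice projected image-patch embeddings into the text
// embedding sequence where the <image> placeholder token sits. In the real
// LLaVA pipeline this is done with tensor ops (e.g. concatenation along the
// sequence dimension), not Vec copies.
fn splice_image_embeds(
    text: &[Vec<f32>],  // [seq_len][hidden] text embeddings
    image: &[Vec<f32>], // [num_patches][hidden] projected image embeddings
    image_pos: usize,   // index of the <image> placeholder in `text`
) -> Vec<Vec<f32>> {
    let mut out = Vec::with_capacity(text.len() - 1 + image.len());
    out.extend_from_slice(&text[..image_pos]);   // tokens before the placeholder
    out.extend_from_slice(image);                // image patches replace it
    out.extend_from_slice(&text[image_pos + 1..]); // tokens after the placeholder
    out
}

fn main() {
    let text = vec![vec![1.0], vec![0.0], vec![2.0]]; // token 1 is the <image> slot
    let image = vec![vec![9.0], vec![8.0]];
    let spliced = splice_image_embeds(&text, &image, 1);
    assert_eq!(spliced.len(), 4); // 2 text tokens + 2 image patches
    println!("{spliced:?}");
}
```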

chenwanqq avatar May 23 '24 08:05 chenwanqq

I'm working on it now: chenwanqq/candle-llava. It's not easy, dude: tons of image preprocessing and tensor concatenation.

Yes! Would you be willing to contribute your implementation here once it's done?

EricLBuehler avatar May 23 '24 09:05 EricLBuehler

I'm working on it now: chenwanqq/candle-llava. It's not easy, dude: tons of image preprocessing and tensor concatenation.

Yes! Would you be willing to contribute your implementation here once it's done?

Yes of course!

chenwanqq avatar May 23 '24 09:05 chenwanqq

Yes of course!

Great, looking forward to that! I'm working on Idefics 2 here: #309.

EricLBuehler avatar May 23 '24 09:05 EricLBuehler