
Add support for Mixtral 8x7B

flowstate247 opened this issue 1 year ago • 17 comments

Feature request

Add support for Mixtral 8x7B: https://mistral.ai/news/mixtral-of-experts/

Motivation

Mixtral 8x7B is a high-quality sparse mixture of experts model (SMoE) with open weights. Licensed under Apache 2.0. Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference. It is the strongest open-weight model with a permissive license and the best model overall regarding cost/performance trade-offs. In particular, it matches or outperforms GPT3.5 on most standard benchmarks.

Your contribution

.

flowstate247 avatar Dec 12 '23 00:12 flowstate247

AFAIK, the mixtral branch of llama.cpp was just updated to support the new 8x7B model.

So we just have to hope for a quick llama.cpp update here, I guess.

RandomLegend avatar Dec 12 '23 11:12 RandomLegend

Just an opinion, but people will then ask to support SOLAR, then X, then Y, etc. I think it's time to extend the architecture to support any future model with an expected architecture/format, starting with what's available today (GPTQ, GGUF, etc.); you'd then just need to provide the Hugging Face model ID or something.
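
As a rough sketch of the "just provide the Hugging Face model ID" idea (the repo and file names below are only illustrative, and huggingface_hub is not something GPT4All uses today):

```python
# Hypothetical sketch: fetch a single GGUF quant by Hugging Face repo ID
# with huggingface_hub (pip install huggingface_hub). The repo and file
# names are illustrative, not something GPT4All resolves on its own.
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF",
    filename="mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",
    local_dir="models",  # drop the file where the runtime expects its models
)
print(gguf_path)  # point any GGUF-capable loader at this file
```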

bitsnaps avatar Dec 31 '23 13:12 bitsnaps

Just an opinion, but people will then ask to support SOLAR, then X, then Y, etc. I think it's time to extend the architecture to support any future model with an expected architecture/format, starting with what's available today (GPTQ, GGUF, etc.); you'd then just need to provide the Hugging Face model ID or something.

Actually, SOLAR already works in GPT4All 2.5.4. Some other models don't, that's true (e.g. phi-2).

GPT4All is built on top of llama.cpp, so it is limited to what llama.cpp can work with. The list grows with time, and apparently 2.6.0 should be able to work with more architectures.

brankoradovanovic-mcom avatar Jan 08 '24 16:01 brankoradovanovic-mcom

Mixtral 8x7B still does not work with version v2.6.1. Am I doing something wrong, or is it not supported yet? As I understand it, the v2.6.1 update is based on the 23 November version of llama.cpp, and Mixtral will be supported with the December version?

maninthemiddle01 avatar Jan 13 '24 16:01 maninthemiddle01

Mixtral 8x7B still does not work with version v2.6.1. Am I doing something wrong, or is it not supported yet? As I understand it, the v2.6.1 update is based on the 23 November version of llama.cpp, and Mixtral will be supported with the December version?

Also no success with TheBloke's GGUF versions so far; trying out different versions now.

I just get a generic error message in the client. Can anyone tell me how to get more detailed error messages, or are there some log files I missed?

J35ter avatar Jan 14 '24 11:01 J35ter

Just an opinion, but people will then ask to support SOLAR, then X, then Y, etc. I think it's time to extend the architecture to support any future model with an expected architecture/format, starting with what's available today (GPTQ, GGUF, etc.); you'd then just need to provide the Hugging Face model ID or something.

All true, but timing a move to a more flexible architecture with the introduction of a milestone feature is suboptimal. The only thing you get that way is damage to the user base. Shoehorn in the feature, then refactor with the experience.

FlorianHeigl avatar Jan 14 '24 14:01 FlorianHeigl

Also no success with TheBloke's GGUF versions so far; trying out different versions now.

I just get a generic error message in the client. Can anyone tell me how to get more detailed error messages, or are there some log files I missed?

If you can use the Python API, you'll get more detailed error messages. I had a situation where the chat crashed immediately upon loading the model (without even displaying the generic message), but when I tried to load the model using the Python API, I got a proper error message. It didn't help me much, though - for us end users, any sort of error while loading the model means "this model doesn't work in GPT4All, move on". :-)
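
For reference, a minimal sketch of that Python-API check (the file and directory names are placeholders matching the logs below; the point is that a failed load surfaces the underlying llama.cpp error instead of the chat UI's generic message):

```python
# Minimal sketch using the gpt4all Python package. Paths are placeholders.
# allow_download=False forces it to load the local file and fail loudly
# with the underlying llama.cpp error rather than trying to fetch anything.
from gpt4all import GPT4All

try:
    model = GPT4All(
        "mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf",
        model_path="/home/user/gpt4all/models",
        allow_download=False,
    )
    print(model.generate("Hello", max_tokens=8))
except Exception as err:
    print(f"Model failed to load: {err}")
```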

brankoradovanovic-mcom avatar Jan 15 '24 08:01 brankoradovanovic-mcom

Unfortunately, I don't know how to use the Python API. If I start the program from the shell, I get these error messages:

error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found
llama_load_model_from_file_gpt4all: failed to load model
LLAMA ERROR: failed to load model from /home/user/gpt4all/models/mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf
[Warning] (Mon Jan 15 xx:xx:xx 2024): ERROR: Could not load model due to invalid model file for mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf id "3f67fb46-0c3b-4f89-8e6b-8d9747a4aaca"

Does this help?

maninthemiddle01 avatar Jan 15 '24 09:01 maninthemiddle01

Mixtral 8x7B still does not work with version v2.6.1. Am I doing something wrong, or is it not supported yet? As I understand it, the v2.6.1 update is based on the 23 November version of llama.cpp, and Mixtral will be supported with the December version?

Support will be added in #1819. (edit: actually, probably not completely - I think we have to implement more ops in the Vulkan backend. CPU inference should work though. Somebody should actually test it, maybe myself if I get a chance.)

cebtenzzre avatar Jan 16 '24 16:01 cebtenzzre

So, #1819 has been merged and it landed in 2.6.2, but when I tried phi-2 (Q5_K_M, specifically) yesterday, it still didn't work. I suppose it doesn't have upstream support yet.

It would be nice if the 2.6.2 release notes explicitly listed at least some notable models that didn't work in 2.6.1 but work now (instead of leaving most users guessing). In particular, I was surprised the release notes did not mention Mixtral 8x7B, which I interpreted as "doesn't work just yet". :-)

brankoradovanovic-mcom avatar Feb 02 '24 08:02 brankoradovanovic-mcom

For me it works. I just don't have enough RAM. Will change that next week :-)

woheller69 avatar Feb 02 '24 11:02 woheller69

So, #1819 has been merged and it landed in 2.6.2, but when I tried phi-2 (Q5_K_M, specifically) yesterday, it still didn't work. I suppose it doesn't have upstream support yet.

It would be nice if the 2.6.2 release notes explicitly listed at least some notable models that didn't work in 2.6.1 but work now (instead of leaving most users guessing). In particular, I was surprised the release notes did not mention Mixtral 8x7B, which I interpreted as "doesn't work just yet". :-)

The complete answer is that we neglected to add the new models to the whitelist - fix incoming. Mixtral works (on CPU) because it claims to simply be "llama", which we support.

edit: See #1914
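
For the curious, here is a rough sketch of how to see which architecture a GGUF file declares, assuming the gguf-py package that ships with the llama.cpp repo (pip install gguf); the file name is a placeholder:

```python
# Rough sketch with the gguf-py package from the llama.cpp repo (pip install gguf).
# It prints the architecture string stored in the file's metadata, which is what
# a loader whitelist would match against; Mixtral GGUFs report plain "llama".
from gguf import GGUFReader

reader = GGUFReader("mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf")  # placeholder path
field = reader.fields["general.architecture"]
arch = bytes(field.parts[field.data[0]]).decode("utf-8")
print(arch)  # expected: "llama"
```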

cebtenzzre avatar Feb 02 '24 20:02 cebtenzzre

Mixtral 8x7B indeed works in the chat, but it doesn't work with Python bindings - I guess that's one last bit missing for full support.

brankoradovanovic-mcom avatar Feb 05 '24 08:02 brankoradovanovic-mcom

Mixtral 8x7B indeed works in the chat, but it doesn't work with Python bindings - I guess that's one last bit missing for full support.

Working on it: #1931

cebtenzzre avatar Feb 05 '24 22:02 cebtenzzre

I just released version 2.2.0 of the Python bindings, which has support for all of the latest models mentioned in #1914, including Mixtral and Phi-2 (CPU only).
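
For anyone who wants to try it, a quick usage sketch (the file name is illustrative, and the GGUF is assumed to already be in the models directory the bindings use):

```python
# Quick sketch against the updated Python bindings (pip install --upgrade gpt4all).
# The Mixtral GGUF is assumed to already sit in the default models directory;
# device="cpu" matches the CPU-only support mentioned above.
from gpt4all import GPT4All

model = GPT4All(
    "mixtral-8x7b-instruct-v0.1.Q4_0.gguf",
    device="cpu",
    allow_download=False,  # use the local file rather than trying to fetch it
)
with model.chat_session():
    print(model.generate("Summarize what a mixture-of-experts model is.", max_tokens=128))
```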

cebtenzzre avatar Feb 06 '24 16:02 cebtenzzre

Hope this model can work with GPT4All.

alexandre-leng avatar Mar 13 '24 13:03 alexandre-leng

Hope this model can work with GPT4All.

It does. Just no Windows/Linux GPU support yet, which is the only reason this issue is still open.

cebtenzzre avatar Mar 14 '24 01:03 cebtenzzre