llama.cpp
Request Support for Mixtral-8x22B
Feature Description
Support for Mixtral-8x22B
Mistral AI has just open-sourced a large model, Mixtral 8x22B, once again released via magnet link, with a total model file size of 281.24 GB.
As the name suggests, Mixtral 8x22B is a scaled-up version of last year's "mixtral-8x7b", with more than three times the parameter count: it is made up of eight expert networks of 22 billion parameters each (8 x 22B).
magnet:?xt=urn:btih:9238b09245d0d8cd915be09927769d5f7584c1c9&dn=mixtral-8x22b&tr=udp%3A%2F%2Fopen.demonii.com%3A1337%2Fannounce&tr=http%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
Motivation
It should be a good model.
+1
It is not Mistral Medium, it's a new model. Mistral Medium has a different context length, etc., and Mistral Medium was leaked earlier. They said it's a brand new model.
Did someone download the torrent? Is it an HF model with modeling code, or only weights without the architecture?
Okay, I'll change the title.
@phymbert
Don't know if it's useful, but it's already up on Hugging Face: https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1
(You'll find many uploads).
It is useful, thanks. I did not notice they changed the org. Let's go then.
It just works. =D
https://huggingface.co/MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF/tree/main
Confirmed the IQ3_XS runs without changes.
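For anyone who wants to try it right away, here is a minimal sketch using the llama-cpp-python bindings; the filename is an assumption, so point it at whichever quant you actually downloaded from the repo above (split quants can usually be loaded by pointing at the first shard).

```python
# Minimal sketch: load a Mixtral-8x22B GGUF quant with llama-cpp-python.
# The model_path below is a placeholder, not a real filename from the repo.
from llama_cpp import Llama

llm = Llama(
    model_path="Mixtral-8x22B-v0.1.IQ3_XS.gguf",  # hypothetical local filename
    n_ctx=4096,       # context window to allocate
    n_gpu_layers=-1,  # offload all layers if built with GPU support
)

out = llm("Mixtral 8x22B is", max_tokens=64)
print(out["choices"][0]["text"])
```

Even at IQ3_XS this model needs a lot of RAM/VRAM, so adjust n_ctx and n_gpu_layers to what your machine can hold.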
Is it really the exact same architecture though? Perhaps there are some subtle optimizations.
It looks like it, just bigger:
https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1/blob/main/config.json
https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/blob/main/config.json
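If you want to check the differences yourself, here is a small sketch (assuming the huggingface_hub package; the gated mistralai repo may require accepting the model terms and logging in) that fetches both config.json files and prints the fields that differ:

```python
# Sketch: download both config.json files and print the keys whose values differ.
import json
from huggingface_hub import hf_hub_download

repos = [
    "mistral-community/Mixtral-8x22B-v0.1",
    "mistralai/Mixtral-8x7B-Instruct-v0.1",  # may need `huggingface-cli login`
]

configs = []
for repo in repos:
    path = hf_hub_download(repo_id=repo, filename="config.json")
    with open(path) as f:
        configs.append(json.load(f))

for key in sorted(set(configs[0]) | set(configs[1])):
    a, b = configs[0].get(key), configs[1].get(key)
    if a != b:
        print(f"{key}: {repos[0]} -> {a} | {repos[1]} -> {b}")
```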
Unfortunately, convert fails with Mixtral 8x22B Instruct:
ValueError: Vocab size mismatch (model has 32768, but Mixtral-8x22B-Instruct-v0.1/tokenizer.json has 32769).
This off-by-a-little mismatch (sometimes 1, sometimes a few more) is actually a very common problem with older models that I quantize, but because they are older, I haven't bothered reporting it yet.
#6740
That is because of a bug in the original Mistral AI upload. Open the file tokenizer.json and change "TOOL_RESULT" to "TOOL_RESULTS", and the conversion should work.
https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1/discussions/6
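If you'd rather not edit the file by hand, a one-off patch like this sketch (the local path is an assumption) applies the same fix before running convert.py:

```python
# Sketch: apply the TOOL_RESULT -> TOOL_RESULTS fix described above before
# running convert.py. The path is an assumption; point it at your local copy.
from pathlib import Path

tok = Path("Mixtral-8x22B-Instruct-v0.1/tokenizer.json")
text = tok.read_text(encoding="utf-8")

# Including the quotes keeps already-correct "TOOL_RESULTS" entries untouched.
patched = text.replace('"TOOL_RESULT"', '"TOOL_RESULTS"')

if patched != text:
    tok.write_text(patched, encoding="utf-8")
    print("patched tokenizer.json")
else:
    print("nothing to change")
```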
@tholin: indeed, thanks a lot!
@tholin: while convert.py succeeds, it results in an 11 GB output file, so something still doesn't work. (b2699)
Update: no longer happens with b2715
This issue was closed because it has been inactive for 14 days since being marked as stale.