bumblebee
Support Mixtral
It seems that bumblebee is not capable of loading the Mixtral-8x7B models (base or instruct). I've checked the files, and in theory it should be able to load the model, since it can load the Mistral-7B files, but I keep getting:
** (ArgumentError) could not match the class name "MixtralForCausalLM" to any of the supported models, please specify the :module and :architecture options
(bumblebee 0.5.3) lib/bumblebee.ex:409: Bumblebee.do_load_spec/4
(bumblebee 0.5.3) lib/bumblebee.ex:578: Bumblebee.maybe_load_model_spec/3
(bumblebee 0.5.3) lib/bumblebee.ex:566: Bumblebee.load_model/2
#cell:4lsfbdujnkoinq5p:3: (file)
Both models' configuration files contain the same MixtralForCausalLM class name, and Mixtral-8x7B has the safetensors files.
Mix.install(
  [
    {:bumblebee, "~> 0.5.3"},
    {:exla, ">= 0.0.0"}
  ],
  config: [nx: [default_backend: EXLA.Backend]]
)
repo = {:hf, "mistralai/Mixtral-8x7B-Instruct-v0.1"}
{:ok, model_info} = Bumblebee.load_model(repo, type: :bf16, backend: EXLA.Backend) # errors
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
{:ok, generation_config} = Bumblebee.load_generation_config(repo)
@WebCloud Bumblebee needs a dedicated implementation for each model type in order to load it. Mixtral is not implemented currently, while Mistral is.
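For context, the `:module`/`:architecture` options mentioned in the error are only an escape hatch for when the class name doesn't match an implemented model; they don't add support for a new architecture. A sketch of what that would look like (pointing Mixtral at the existing Mistral module; expect this to fail or mismatch, since Mixtral's mixture-of-experts layers have no counterpart in Mistral):

```elixir
repo = {:hf, "mistralai/Mixtral-8x7B-Instruct-v0.1"}

# Force the implemented Mistral module instead of the unrecognized
# MixtralForCausalLM class. This only bypasses the class-name lookup;
# the Mixtral expert weights won't map onto Mistral's layers, so this
# is NOT a real workaround, just an illustration of the options.
{:ok, model_info} =
  Bumblebee.load_model(repo,
    module: Bumblebee.Text.Mistral,
    architecture: :for_causal_lm
  )
```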
I see! Well, I'm quite new to Elixir, but I'd be happy to help however I can.
For an example of adding a model, see how Mistral was added in #264. The corresponding hf/transformers code is modeling_mixtral.py.
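Very roughly, adding Mixtral would mean a new module alongside the existing Mistral one. The skeleton below is illustrative only (field names follow the HF `config.json` keys; mirror the actual Mistral module from #264 for the exact callbacks and protocol implementations Bumblebee requires):

```elixir
# Hypothetical skeleton, NOT working code -- see Bumblebee.Text.Mistral
# for the real structure a model module needs.
defmodule Bumblebee.Text.Mixtral do
  # Config struct parsed from config.json. The Mixtral-specific fields
  # are the mixture-of-experts ones (names follow the HF config keys).
  defstruct architecture: :for_causal_lm,
            vocab_size: 32_000,
            hidden_size: 4096,
            num_local_experts: 8,
            num_experts_per_tok: 2

  # The real module must also:
  #   * declare the supported architectures (:base, :for_causal_lm, ...)
  #   * build the Axon graph -- largely Mistral's attention stack, plus
  #     the sparse MoE feed-forward block (top-k routing over expert MLPs)
  #   * map the "model.layers.{n}. ..." checkpoint parameter names onto
  #     the Axon layer names so the safetensors weights load correctly
end
```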
However, note that Nx/EXLA doesn't support quantization yet, and the bf16 model is around 100GB, so it is not very practical for running on the GPU at this point.
Cool, thanks for the resources!