Jonatan Kłosko
For reference, [here](https://huggingface.co/docs/transformers/main/en/quantization/overview) is a whole table of the different quantization libraries/techniques/formats that hf/transformers supports. Axon implements a specific quantization method. I believe the idea is that we could use...
This probably belongs more to Axon than Bumblebee, since we need a way to store `%Axon.ModelState{}`. For the model itself, maybe there should be a way to quantize the model...
Oh, I missed `quantize_model`! For the model state you can actually do `Nx.serialize(model_state)`. So it would be this:

```elixir
# Serialize
File.write!("state.nx", Nx.serialize(model_info.params))

# Load
{:ok, spec} = Bumblebee.load_spec({:hf, "..."})
...
```
@tubedude unfortunately it doesn't fit into the usual logits processing approach. We generate the transcription token-by-token, and logits processing applies a transformation to the logits at each iteration. My understanding is...
@kevinschweikert thanks for the PR! Dropping the ffmpeg dependency would be great, but yeah, I agree that we need xav to be precompiled for this to be beneficial.
Sounds good to me!
I've just realised that it's not just about precompilation; the main blocker is that `xav` still requires ffmpeg to be installed, so at the moment there is no benefit really...
@josevalim we already have an API for changing the editor intellisense node as of #390! Extending the field to accept a variable sounds good to me. We probably should make it...
Just a quick note that one way we could track all EXLA buffers would be to have a static global list of pointers. Whenever an EXLA buffer is created we...
> the list may grow long and deleting becomes expensive

Actually, if we store the iterator of the inserted list element inside the EXLA buffer, we should be able to...
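To illustrate the idea, here is a minimal C++ sketch of the iterator-in-the-buffer trick. All names (`Buffer`, `g_live_buffers`) are illustrative, not EXLA's actual internals: each buffer remembers its own node in a global `std::list` registry, so removing it on destruction is O(1) rather than a linear scan. This relies on `std::list` iterators staying valid until their element is erased.

```cpp
#include <list>

// Hypothetical buffer type; in EXLA this would be the native buffer resource.
struct Buffer;

// Global registry of all live buffers.
static std::list<Buffer*> g_live_buffers;

struct Buffer {
  // Iterator pointing at this buffer's node in the registry.
  std::list<Buffer*>::iterator registry_it;

  Buffer() {
    // Register on creation and remember our position in the list.
    registry_it = g_live_buffers.insert(g_live_buffers.end(), this);
  }

  ~Buffer() {
    // std::list iterators remain valid until their element is erased,
    // so deregistration is constant time regardless of list length.
    g_live_buffers.erase(registry_it);
  }
};
```

In the real implementation the registry would also need a mutex, since buffers can be created and destroyed from multiple scheduler threads.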