Jonatan Kłosko

256 comments by Jonatan Kłosko

For reference, [here](https://huggingface.co/docs/transformers/main/en/quantization/overview) is a whole table of the different quantization libraries/techniques/formats that hf/transformers supports. Axon implements a specific quantization method. I believe the idea is that we could use...

This probably belongs more to Axon than Bumblebee, since we need a way to store `%Axon.ModelState{}`. For the model itself, maybe there should be a way to quantize the model...

Oh, I missed `quantize_model`! For the model state you can actually do `Nx.serialize(model_state)`. So it would be this:

```elixir
# Serialize
File.write!("state.nx", Nx.serialize(model_info.params))

# Load
{:ok, spec} = Bumblebee.load_spec({:hf, "..."})
# ...
```

@tubedude unfortunately it doesn't fit into the usual logits processing approach. We generate the transcription token-by-token, and logits processing applies some transformation to logits at each iteration. My understanding is...

@kevinschweikert thanks for the PR! Dropping the ffmpeg dependency would be great, but yeah, I agree that we need xav to be precompiled for this to be beneficial.

I've just realised that it's not just about precompilation. The main blocker is that `xav` still requires ffmpeg to be installed, so at the moment there is no benefit really...

@josevalim we already have an API for changing the editor intellisense node as of #390! Extending the field to accept a variable sounds good to me. We probably should make it...

Just a quick note that one way we could track all EXLA buffers would be to have a static global list of pointers. Whenever an EXLA buffer is created we...

> the list may grow long and deleting becomes expensive Actually, if we store the iterator of the inserted list element inside the EXLA buffer, we should be able to...
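To illustrate the idea from the two comments above, here is a minimal C++ sketch, not EXLA's actual code. The names `BufferRegistry` and `TrackedBuffer` are hypothetical, and the registry is passed by reference rather than being a static global so the example stays self-contained. Each buffer stores the `std::list` iterator returned at insertion, so removing it on destruction is constant time regardless of how long the list grows. A real implementation would presumably also need a mutex around the list if buffers are created and destroyed from multiple threads.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

class TrackedBuffer;

// Hypothetical registry of all live buffers (stand-in for the "global list of pointers").
class BufferRegistry {
 public:
  // Iterators into std::list stay valid until their own element is erased.
  using Handle = std::list<TrackedBuffer*>::iterator;

  // Appends the pointer and returns the iterator to the new element.
  Handle add(TrackedBuffer* buffer) {
    return live_.insert(live_.end(), buffer);
  }

  // O(1) removal: no search through the list is needed.
  void remove(Handle handle) { live_.erase(handle); }

  std::size_t size() const { return live_.size(); }

 private:
  std::list<TrackedBuffer*> live_;
};

// Hypothetical buffer that registers itself on creation and
// unregisters itself on destruction using the stored iterator.
class TrackedBuffer {
 public:
  explicit TrackedBuffer(BufferRegistry& registry)
      : registry_(registry), handle_(registry.add(this)) {}

  ~TrackedBuffer() { registry_.remove(handle_); }

  // Copying would duplicate the handle and lead to a double erase.
  TrackedBuffer(const TrackedBuffer&) = delete;
  TrackedBuffer& operator=(const TrackedBuffer&) = delete;

 private:
  BufferRegistry& registry_;
  BufferRegistry::Handle handle_;
};
```

The choice of `std::list` matters here: unlike `std::vector`, inserting or erasing other elements does not invalidate a stored iterator, which is what makes keeping it inside the buffer safe.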