Eric Buehler

Results: 136 issues by Eric Buehler

Currently, `QTensor::quantize`:
- Takes a tensor (assume it is on the GPU for this example)
- Copies the data to the CPU
- Quantizes on the CPU
- Copies the...

@p-e-w, could you please give the implementation a quick check? I'm not sure if you are familiar with Rust, but I ported the algorithm from the oobabooga implementation you linked....

new feature

Currently, our messages API is clunky as we need to support the older OpenAI format as well as the new, multimodal format (for Idefics and Llava). This is exposed in...

good first issue

With the recent advent of large models (take Llama 3.1 405b, for example!), distributed inference support is a must! We currently support naive device mapping, which works by allowing a...

new feature
backend

Currently, we apply all sampling:
- Sequentially
- On the CPU

This is super slow. This PR is going to refactor the sampling system to do as much sampling work...

optimization

Refs #555. @KaQuMiQ I added some debug statements to get a better picture of what's going on. Can you please install from source: (assuming you have Rust installed, which I...