Eric Buehler
Currently, `QTensor::quantize`:
- Takes a tensor (assume it is on the GPU for this example)
- Copies the data to the CPU
- Quantizes on the CPU
- Copies the...
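The round-trip described above can be sketched as follows. This is a minimal illustration, not the real mistral.rs implementation: the function names, the Q8-style absmax quantization, and the use of `Vec` clones to stand in for device/host copies are all assumptions.

```rust
// Hypothetical sketch of the current (slow) quantize path: the tensor's
// data round-trips through host memory before and after quantization.
fn quantize_q8(host_data: &[f32]) -> (Vec<i8>, f32) {
    // Per-tensor absmax scale, in the style of simple Q8_0 quantization.
    let absmax = host_data.iter().fold(0f32, |m, v| m.max(v.abs()));
    let scale = if absmax == 0.0 { 1.0 } else { absmax / 127.0 };
    let q = host_data.iter().map(|v| (v / scale).round() as i8).collect();
    (q, scale)
}

fn quantize_on_cpu(gpu_data: &[f32]) -> Vec<i8> {
    // 1) Copy device -> host (simulated here by a plain copy).
    let host: Vec<f32> = gpu_data.to_vec();
    // 2) Quantize on the CPU.
    let (q, _scale) = quantize_q8(&host);
    // 3) Copy host -> device (again simulated).
    q
}
```

Every step serializes through host memory, which is why doing the quantization on-device would avoid two transfers per tensor.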
@p-e-w, could you please give the implementation a quick check? I'm not sure if you are familiar with Rust, but I ported the algorithm from the oobabooga implementation you linked....
Currently, our messages API is clunky, as we need to support both the older OpenAI format and the new multimodal format (for Idefics and Llava). This is exposed in...
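One way to picture the two shapes is a single internal type that both forms normalize into: the older OpenAI string `content` and the newer list-of-parts `content`. All names below are illustrative, not the real mistral.rs types.

```rust
// Hypothetical unified message representation.
#[derive(Debug, PartialEq)]
enum ContentPart {
    Text(String),
    ImageUrl(String),
}

#[derive(Debug, PartialEq)]
struct Message {
    role: String,
    parts: Vec<ContentPart>,
}

// Normalize the old string form into the multimodal form so the rest of
// the pipeline only ever handles one representation.
fn from_text(role: &str, text: &str) -> Message {
    Message {
        role: role.to_string(),
        parts: vec![ContentPart::Text(text.to_string())],
    }
}
```

With a scheme like this, the legacy format becomes a one-element `parts` list and the multimodal path needs no special casing.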
With the recent advent of large models (take Llama 3.1 405b, for example!), distributed inference support is a must! We currently support naive device mapping, which works by allowing a...
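The naive device mapping mentioned above amounts to partitioning the model's layers across devices. A minimal sketch, assuming an even split with earlier devices absorbing the remainder (the function name and policy are illustrative, not the mistral.rs implementation):

```rust
// Split `n_layers` decoder layers across `n_devices` as evenly as
// possible; earlier devices take one extra layer when it doesn't divide.
fn map_layers(n_layers: usize, n_devices: usize) -> Vec<usize> {
    let base = n_layers / n_devices;
    let rem = n_layers % n_devices;
    (0..n_devices)
        .map(|d| base + if d < rem { 1 } else { 0 })
        .collect()
}
```

For a 405b-class model this kind of static split is only a starting point; true distributed inference also needs to move activations between devices at each boundary.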
Currently, we apply all sampling:
- Sequentially
- On the CPU

This is super slow. This PR is going to refactor the sampling system to do as much sampling work...
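The contrast with the sequential, per-sequence approach can be sketched as a single batched pass over all sequences' logits at once. This is an illustrative greedy (argmax) sampler over a flattened `[batch, vocab]` buffer, not the actual PR's code:

```rust
// One sweep over the whole batch instead of sampling each sequence in turn.
fn sample_greedy_batched(logits: &[f32], vocab: usize) -> Vec<usize> {
    logits
        .chunks(vocab)
        .map(|row| {
            row.iter()
                .enumerate()
                .max_by(|a, b| a.1.total_cmp(b.1))
                .map(|(i, _)| i)
                .unwrap()
        })
        .collect()
}
```

A batched formulation like this is also what makes it natural to keep the work on the GPU, since the whole batch reduces in one kernel launch rather than one host-side loop iteration per sequence.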
Refs #555. @KaQuMiQ I added some debug statements to get a better picture of what's going on. Can you please install from source: (assuming you have Rust installed, which I...