Eric Buehler

136 issues by Eric Buehler

The [Marlin INT4xFP16](https://github.com/IST-DASLab/marlin) CUDA matmul kernel can achieve ~4x speed improvement over CUTLASS matmul. See also: [hqq](https://github.com/mobiusml/hqq/) as a quantization method which supports Marlin and other optimized kernels, without calibration...

new feature
optimization

This PR implements our first Seq2Seq model, T5. Refs #384.

Currently, if a sentencepiece `.model` file is provided, the user must run a provided script to convert into the equivalent `tokenizer.json`. By supporting `sentencepiece` models directly, we can avoid this...

new feature

This will implement memory usage tracking. This will be used for #377.
- [x] CPU
- [ ] CUDA: https://docs.rs/cudarc/latest/cudarc/driver/result/fn.mem_get_info.html
- [ ] Metal

This PR enables storing and then restoring the model-specific prefix cache on disk. The intended use case, paired with #350, is to accelerate few-shot learning use cases by allowing a...

optimization

Refs #347.

new feature
optimization

Hello all, thanks for your great work here! When I run using `cudarc`, I get the error:
```
called `Result::unwrap()` on an `Err` value: Cuda(Cuda(DriverError(CUDA_ERROR_NO_DEVICE, "no CUDA-capable device is detected")))
```
...

Hello and thank you for the great work here! We are trying to save a Phi 3 vision model, but are running into some issues saving it as safetensors. Due...

Hi all, Since the merge of #1491, documentation has yet to be added to X-LoRA. I was wondering how I should approach adding this? Additionally, I also wanted to discuss...