lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
### Feature request Hello, our models are deployed with TGI (v1.4.3), and we also want to use lorax. But I find that the TGI version lorax is based on is very different...
Currently, we treat each of the Q, K, V LoRAs as distinct tensors, meaning we do 3 SGMV calls per layer instead of 1. We should fuse them to improve...
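A rough sketch of the idea in plain PyTorch (a stand-in for the actual SGMV kernel; shapes and names are illustrative assumptions): concatenating the Q, K, V LoRA A matrices along the rank dimension and block-diagonalizing the B matrices lets one fused matmul pair replace the three separate calls.

```python
import torch

# Illustrative sketch only: plain PyTorch stand-in for the SGMV call, with
# made-up shapes. Shows that the Q, K, V LoRA deltas can be produced by one
# fused matmul pair instead of three.
hidden, rank = 128, 8
x = torch.randn(4, hidden)  # hypothetical token hidden states

lora_a = {p: torch.randn(hidden, rank) for p in ("q", "k", "v")}
lora_b = {p: torch.randn(rank, hidden) for p in ("q", "k", "v")}

# Unfused: three separate matmul pairs per layer.
unfused = {p: (x @ lora_a[p]) @ lora_b[p] for p in ("q", "k", "v")}

# Fused: concatenate the A matrices along the rank dim and block-diagonalize
# the B matrices, so one matmul pair yields the concatenated [Q | K | V] delta.
a_fused = torch.cat([lora_a[p] for p in ("q", "k", "v")], dim=1)   # (hidden, 3*rank)
b_fused = torch.block_diag(*(lora_b[p] for p in ("q", "k", "v")))  # (3*rank, 3*hidden)
q, k, v = ((x @ a_fused) @ b_fused).chunk(3, dim=-1)

assert torch.allclose(q, unfused["q"], atol=1e-4)
assert torch.allclose(v, unfused["v"], atol=1e-4)
```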
[Repo](https://github.com/stanfordnlp/pyreft) [Paper](https://arxiv.org/abs/2404.03592)
### Feature request An EETQ-quantized model performs with very good quality in my case, but loading is pretty slow. So if the base model is quantized with EETQ...
Any failure in SGMV comes back as `Request failed during generation: Server error: No suitable kernel. dtype=Half`. From Discord: > I have tried the fine-tuned adapter for llama2-7b. I trained...
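One quick thing worth checking (an assumption on my part, not a confirmed root cause) is whether the GPU is new enough for the SGMV kernels at all; the kernels are only built for certain architectures, so on older cards the dispatch can fail even though the dtype looks fine.

```python
import torch

# Hedged diagnostic sketch: SGMV kernels only ship for newer GPU
# architectures (the exact cutoff is an assumption here; check the lorax
# docs), so a quick capability check helps separate "wrong dtype" from
# "unsupported GPU" when the "No suitable kernel" error appears.
major, minor = torch.cuda.get_device_capability()
print(f"GPU compute capability: {major}.{minor}")
if major < 8:
    print("Pre-Ampere GPU: SGMV kernels are likely unavailable; "
          "expect lorax to fall back or raise 'No suitable kernel'.")
```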
`max_batch_prefill_tokens` is now optional and defaults to the value of `max_input_length`. Users can now set a custom `max_input_length` without also having to specify `max_batch_prefill_tokens`.
Due to the dynamic expert-routing logic of Mixtral (and any other MoE model), the CUDA graph compilation logic (which assumes deterministic execution) produces garbage outputs when enabled for this model...
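A minimal, generic illustration of the underlying constraint (plain PyTorch, not the lorax implementation): a CUDA graph replays exactly the kernel sequence recorded at capture time, so any routing decision fixed during capture stays fixed on every replay.

```python
import torch

# Generic illustration (not lorax code): CUDA graphs replay the exact kernel
# sequence recorded at capture, so a host-side routing choice made during
# capture is frozen into the graph, no matter what later inputs would imply.
device = "cuda"
experts = [torch.nn.Linear(16, 16, device=device) for _ in range(2)]
static_x = torch.randn(1, 16, device=device)
chosen_expert = 0  # "routing" decision made once, before capture

# Warm up on a side stream, as recommended before graph capture.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    _ = experts[chosen_expert](static_x)
torch.cuda.current_stream().wait_stream(s)

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_out = experts[chosen_expert](static_x)  # expert 0 is baked in here

# Even if a router would now pick expert 1 for the new input, replay still
# runs expert 0: the graph cannot re-route, which is why data-dependent MoE
# execution and naive graph capture do not mix.
static_x.copy_(torch.randn(1, 16, device=device))
g.replay()
print(static_out)
```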