
Want LoRAX with a newer version of TGI

Open yangelaboy opened this issue 2 years ago • 5 comments

Feature request

Hello, our models are deployed with TGI (v1.4.3), and we also want to use LoRAX. But I find that the TGI version LoRAX is based on differs significantly from TGI v1.4.3. We are trying to integrate LoRAX (v0.8) into TGI (v1.4.3). Would it be possible to upgrade the TGI base of LoRAX, or to contribute LoRAX back to TGI?

Motivation

Use new TGI features together with LoRAX.

Your contribution

We are trying to integrate LoRAX (v0.8) into TGI (v1.4.3), but both LoRAX and TGI are changing rapidly!

yangelaboy avatar Mar 14 '24 07:03 yangelaboy

Hi @yangelaboy, thanks for trying out LoRAX. I'd love to incorporate more upstream work from TGI, but since they changed their license last year, we can no longer pull their code into our repo.

That said, we have implemented many of the same features recently (though in slightly different ways). Are there specific features you're using in TGI you want to see in LoRAX? If so, we can definitely prioritize getting those added.

One thing in TGI we're working to add very soon is speculative decoding. We think our implementation will be particularly interesting, as we'll be able to handle multiple speculation models at once. Let me know if there are other features you're interested in.
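To make the per-request adapter idea above concrete, here is a minimal sketch of how a client could route two requests against one deployment to different adapters. The `inputs`/`parameters`/`adapter_id` fields follow LoRAX's `/generate` REST API; the adapter names are illustrative placeholders, not real repositories.

```python
import json

def build_generate_payload(prompt, adapter_id=None):
    """Build a JSON body for LoRAX's /generate endpoint.

    When adapter_id is given, LoRAX applies that adapter for this
    request only; omitting it falls back to the base model.
    """
    parameters = {"max_new_tokens": 64}
    if adapter_id is not None:
        parameters["adapter_id"] = adapter_id
    return json.dumps({"inputs": prompt, "parameters": parameters})

# Two requests against the same base model, each using a different
# (hypothetical) fine-tuned adapter:
body_a = build_generate_payload("Summarize:", adapter_id="org/summarizer-lora")
body_b = build_generate_payload("Translate:", adapter_id="org/translator-lora")
```

The same mechanism is what would let a Medusa (speculation) adapter vary per request rather than being fixed at server startup.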

tgaddair avatar Mar 14 '24 15:03 tgaddair

@tgaddair Thanks for the detailed reply. We are using features such as speculative decoding (ngram & Medusa) and quantization, and we're also interested in many of TGI's other optimizations. We have also added features to TGI ourselves, such as a shared-prefix prompt cache. Ultimately, we want a framework that can serve different adapter models and Medusa models on the same self-trained base model with a shared-prefix prompt cache. I will keep an eye on LoRAX.

yangelaboy avatar Mar 15 '24 09:03 yangelaboy

Hey @yangelaboy, thanks for this context! The good news is all of the things you listed are on our near-term roadmap.

  • Speculative decoding adapters per request - this is what I'm currently working on, and I hope to have it out next week
  • Prefix caching - this is the next major item on the roadmap after speculative decoding, so hopefully a few weeks away at most
  • Quantization - we support a number of quantization options already, but let me know if there are specific ones we don't support that you'd be interested in.

I'll definitely let you know when the speculative decoding is ready to test out!

tgaddair avatar Mar 15 '24 16:03 tgaddair

Thanks @tgaddair, we are also looking forward to speculative decoding 👍

abhibst avatar Mar 18 '24 14:03 abhibst

The license has changed back to Apache-2.0: https://github.com/huggingface/text-generation-inference/commit/ff42d33e9944832a19171967d2edd6c292bdb2d6 @tgaddair

giyaseddin avatar Apr 09 '24 14:04 giyaseddin