
[Feature] Support for Hugging Face Inference Endpoints custom handlers?

Open anttttti opened this issue 1 year ago • 8 comments

Feature request

text-generation-inference currently doesn't support custom handler.py files, as used by Hugging Face Inference Endpoints. So selecting "Text Generation Inference" as the container type means that your requirements.txt and handler.py files are ignored.

Could handler.py be supported, even to a limited degree? The handler.py file is where you can conveniently place prompt engineering and post-processing, without needing a separate service for them.
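For reference, this is the kind of handler.py Inference Endpoints accepts for custom containers: a class named `EndpointHandler` with `__init__` and `__call__`. The sketch below stubs out the model call (the real handler would load a `transformers` pipeline from `path`) so only the pre/post-processing structure is shown; the prompt template is a made-up example.

```python
# handler.py - the interface Hugging Face Inference Endpoints expects.
# Sketch only: the model is replaced by a stub so the pre/post-processing
# shape is visible without loading real weights.
from typing import Any, Dict, List


class EndpointHandler:
    def __init__(self, path: str = ""):
        # Normally something like:
        #   self.pipe = pipeline("text-generation", model=path)
        # Stubbed here so the example is self-contained.
        self.generate = lambda prompt: prompt + "42"

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Pre-processing: prompt engineering lives here.
        inputs = data.get("inputs", "")
        prompt = f"### Instruction:\n{inputs}\n### Response:\n"

        raw = self.generate(prompt)

        # Post-processing: strip the prompt scaffold before returning.
        answer = raw[len(prompt):] if raw.startswith(prompt) else raw
        return [{"generated_text": answer.strip()}]
```

With a plain custom container this file is picked up automatically; with the TGI container it is currently ignored, which is the gap this issue is about.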

Motivation

Text-generation-inference isn't fully compatible with HFIE: important pre- and post-processing functionality from HFIE is not supported.

Your contribution

anttttti avatar Jul 09 '23 22:07 anttttti

@philschmid, would you like to take a look?

OlivierDehaene avatar Jul 10 '23 09:07 OlivierDehaene

@anttttti could you share some context on why you would like to use a custom handler.py with TGI?

philschmid avatar Jul 13 '23 12:07 philschmid

> @anttttti could you share some context on why you would like to use a custom handler.py with TGI?

If TGI doesn't provide custom pre- and post-processing via handler.py, then for many applications you'll need a second service for these, with the same autoscaling and throughput requirements. With HFIE, you'd need to build this as an external service sitting between the clients and HFIE.

anttttti avatar Jul 14 '23 19:07 anttttti

@anttttti just to understand: that's not about customizing the ML inference process or enabling new pipelines; it would be more about adding "auth" checks or some other business logic. That sounds like it should not be part of the solution. What you could do, if you want to have it in one container, is create a custom container which runs two processes. Or deploy two endpoints where one acts as a proxy.
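The proxy approach could look roughly like this: a thin layer that does prompt engineering and post-processing around TGI's `/generate` endpoint (which takes `{"inputs": ..., "parameters": {...}}` and returns `{"generated_text": ...}`). A sketch; the deployment URL and prompt template are placeholders, and only `generate()` touches the network.

```python
# Sketch of a pre/post-processing proxy in front of a TGI deployment.
import json
import urllib.request
from typing import Any, Dict

TGI_URL = "http://localhost:8080/generate"  # placeholder deployment URL


def build_payload(user_input: str, max_new_tokens: int = 128) -> Dict[str, Any]:
    # Pre-processing: wrap the raw input in a prompt template.
    prompt = f"### Instruction:\n{user_input}\n### Response:\n"
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def postprocess(response: Dict[str, Any]) -> str:
    # Post-processing: TGI's /generate returns {"generated_text": ...}.
    return response.get("generated_text", "").strip()


def generate(user_input: str) -> str:
    # The only network call; build_payload/postprocess stay unit-testable.
    req = urllib.request.Request(
        TGI_URL,
        data=json.dumps(build_payload(user_input)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return postprocess(json.loads(resp.read()))
```

The downside raised in this thread still applies: this proxy is a second service to deploy, scale, and keep at the same throughput as TGI itself.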

philschmid avatar Jul 15 '23 07:07 philschmid

> @anttttti just to understand: that's not about customizing the ML inference process

The use case I have in mind is customizing the inference process, but there must be many other ways handler.py files are used. If handler.py could be used with the text-generation-inference container in HFIE, that would avoid maintaining a second service, or a custom container in place of text-generation-inference.

anttttti avatar Jul 16 '23 09:07 anttttti

Hi @anttttti

It's really hard to support a handler.py within TGI itself. The reason is that the current code is a tight loop, highly tuned for performance. On top of that, the router, a pretty sizeable component, is written in Rust, not in Python.

handler.py is aimed at being super general, so you can do whatever you want, and it is very Pythonic. So having a small handler.py on top, or as a proxy, makes sense; having a handler deep inside the code really doesn't (especially since the layout of all tensors might change depending on the model and the configuration).

Narsil avatar Jul 16 '23 14:07 Narsil

> Hi @anttttti
>
> It's really hard to support a handler.py within TGI itself. The reason is that the current code is a tight loop, highly tuned for performance. On top of that, the router, a pretty sizeable component, is written in Rust, not in Python.
>
> handler.py is aimed at being super general, so you can do whatever you want, and it is very Pythonic. So having a small handler.py on top, or as a proxy, makes sense; having a handler deep inside the code really doesn't (especially since the layout of all tensors might change depending on the model and the configuration).

Thanks for the explanation. This does limit what you can do with HFIE using text-generation-inference. There are also more elaborate frameworks for controlling generation emerging, for which a more complete solution would be required: https://github.com/microsoft/guidance/issues/33 .

anttttti avatar Jul 16 '23 19:07 anttttti

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar May 03 '24 01:05 github-actions[bot]