
Disable cache on Inference APIs

fakerybakery opened this issue 2 years ago · 4 comments

Hi, how can I get a non-cached generation from the Inference API through the Python client? I can disable caching through cURL, but I don't see an option in the Python client. Thanks!
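For reference, the cURL-style request can be reproduced with the Python standard library. A minimal sketch, assuming the serverless endpoint URL, the gpt2 model, and an HF_TOKEN environment variable (all illustrative):

```python
import json
import os
import urllib.request

# Illustrative model and serverless Inference API endpoint.
MODEL = "gpt2"
URL = f"https://api-inference.huggingface.co/models/{MODEL}"

def build_request(prompt: str, token: str) -> urllib.request.Request:
    """Build a POST request that asks the API to skip its shared cache."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "X-Use-Cache": "false",  # opt out of the server-side cache
    }
    data = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(URL, data=data, headers=headers, method="POST")

# Actually sending the request requires a valid token and network access:
# with urllib.request.urlopen(build_request("Hello", os.environ["HF_TOKEN"])) as resp:
#     print(json.load(resp))
```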

fakerybakery avatar Mar 02 '24 02:03 fakerybakery

Hi @fakerybakery, unfortunately this is not supported yet in InferenceClient. What would be your use case for it? Asking in case we can mitigate your problem right now without waiting for a huggingface_hub update.

Otherwise, we could add support for a new parameter when instantiating the client, like this: InferenceClient(..., use_cache: bool = True)
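A hypothetical sketch of how such a use_cache flag could be wired up internally. make_headers is an illustrative helper, not part of huggingface_hub's API; it simply maps the flag onto the Inference API's X-Use-Cache header:

```python
def make_headers(token: str, use_cache: bool = True) -> dict:
    """Illustrative helper: translate a use_cache flag into HTTP headers."""
    headers = {"Authorization": f"Bearer {token}"}
    if not use_cache:
        # The Inference API skips its shared cache when this header is set.
        headers["X-Use-Cache"] = "false"
    return headers
```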

Wauplin avatar Mar 04 '24 15:03 Wauplin

Sure! I'm working on a tool that uses the Inference API, and I want to allow users to regenerate the output in case they're not satisfied with the response.

fakerybakery avatar Mar 04 '24 16:03 fakerybakery

I've added the https://github.com/huggingface/huggingface_hub/labels/enhancement label then. If you are only using the text-generation task, there is a seed parameter that you can manually set to a new random value on each call. This way you ensure the cache is always bypassed. The parameter might be available for some other tasks as well (text-to-image, image-to-image?) but not all of them.
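The seed workaround can be sketched like this; fresh_seed and regenerate are illustrative names, and the model ID is an assumption (the huggingface_hub import is deferred since actually generating requires network access and a token):

```python
import random

def fresh_seed() -> int:
    """Return a random 32-bit seed; a new value per call bypasses the cache."""
    return random.randrange(2**32)

def regenerate(prompt: str, model: str = "HuggingFaceH4/zephyr-7b-beta") -> str:
    """Re-run text generation with a fresh seed so the result is never cached."""
    from huggingface_hub import InferenceClient  # deferred: only needed to generate
    client = InferenceClient(model)
    return client.text_generation(prompt, seed=fresh_seed())
```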

Wauplin avatar Mar 04 '24 17:03 Wauplin

Perfect, thanks!

fakerybakery avatar Mar 04 '24 19:03 fakerybakery

@fakerybakery the cache can be disabled by passing "X-Use-Cache: false" as a header. This can be done like this:

from huggingface_hub import InferenceClient

client = InferenceClient(headers={"X-Use-Cache": "false"})

Hope this proves useful to you :) I'm closing this issue, but let me know if you have any other questions.

Wauplin avatar Jun 11 '24 14:06 Wauplin