Disable cache on Inference APIs
Hi! How can I get a non-cached generation from the Inference API through the Python client? I can disable caching through cURL, but I don't see an option in the Python client. Thanks!
Hi @fakerybakery, unfortunately this is not supported yet in the InferenceClient. What would be your use case for it? Asking in case we can mitigate your problem right now without waiting for an update to huggingface_hub.
Otherwise, we could add support for a new parameter when instantiating the client, like this: InferenceClient(..., use_cache: bool = True)
Sure! I'm working on a tool that uses the Inference API, and I want to let users regenerate the output in case they're not satisfied with the response.
I've added the https://github.com/huggingface/huggingface_hub/labels/enhancement label then. If you are only using the text-generation task, there is a seed parameter that you can set to a new random value on each call. This way you ensure the cache is always bypassed. The parameter might be available for some other tasks as well (text-to-image, image-to-image?) but not all of them.
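As a minimal sketch of the workaround above (the model call is commented out and the prompt is just an illustration):

```python
import random

def fresh_seed() -> int:
    """Return a new random seed so the Inference API never replays a cached result."""
    return random.randint(0, 2**32 - 1)

# Example usage (assumes huggingface_hub is installed):
# from huggingface_hub import InferenceClient
# client = InferenceClient()
# output = client.text_generation("Tell me a story", seed=fresh_seed())
```

Each regeneration request then carries a different seed, so the server treats it as a new generation rather than serving the cached one.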
Perfect, thanks!
@fakerybakery the cache can be disabled by passing "X-Use-Cache: false" as a header. This can be done like this:
from huggingface_hub import InferenceClient
client = InferenceClient(headers={"X-Use-Cache": "false"})
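For completeness, the same header works with a raw HTTP request, mirroring the cURL approach mentioned above. The endpoint URL and token below are placeholders, not real values:

```python
# import requests  # uncomment if the `requests` package is installed

# Placeholders: substitute your model's endpoint and your own token
API_URL = "https://api-inference.huggingface.co/models/<model_id>"
headers = {
    "Authorization": "Bearer <your_token>",
    "X-Use-Cache": "false",  # ask the server not to return a cached response
}

# response = requests.post(API_URL, headers=headers, json={"inputs": "Hello"})
```

HTTP header names are case-insensitive, so "X-use-cache" and "X-Use-Cache" behave the same.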
Hope this proves useful to you :) I'm closing this issue, but let me know if you have any other questions.