Disable cache on Inference APIs
Hi! How can I get a non-cached generation from the Inference API through the Python client? I can disable caching through cURL, but I don't see an option in the Python client. Thanks!
Hi @fakerybakery, unfortunately this is not supported yet in the InferenceClient. What would be your use case for it? Asking in case we can mitigate your problem right now without waiting for an update to huggingface_hub.
Otherwise, we could add support for a new parameter when instantiating the client, like this: InferenceClient(..., use_cache: bool = True)
Sure! I'm working on a tool that uses the Inference API, and I want to let users regenerate the output in case they're not satisfied with the response.
I've added the https://github.com/huggingface/huggingface_hub/labels/enhancement label then. If you are only using the text-generation task, there is a seed parameter that you can set to a new random value on each call. This way you ensure the cache is always bypassed. The parameter might be available for some other tasks as well (text-to-image, image-to-image?) but not all of them.
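As a minimal sketch of the workaround above (the model call is commented out and the prompt is just an illustration):

```python
import random

def fresh_seed() -> int:
    """Return a new random seed so the Inference API never replays a cached result."""
    return random.randint(0, 2**32 - 1)

# Example usage (assumes huggingface_hub is installed):
# from huggingface_hub import InferenceClient
# client = InferenceClient()
# output = client.text_generation("Tell me a story", seed=fresh_seed())
```

Each regeneration request then carries a different seed, so the server treats it as a new generation rather than serving the cached one.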
Perfect, thanks!
@fakerybakery the cache can be disabled by passing "X-Use-Cache: false" as a header. This can be done like this:
from huggingface_hub import InferenceClient
client = InferenceClient(headers={"X-Use-Cache": "false"})
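For completeness, the same header works with a raw HTTP request, mirroring the cURL approach mentioned above. The endpoint URL and token below are placeholders, not real values:

```python
# import requests  # uncomment if the `requests` package is installed

# Placeholders: substitute your model's endpoint and your own token
API_URL = "https://api-inference.huggingface.co/models/<model_id>"
headers = {
    "Authorization": "Bearer <your_token>",
    "X-Use-Cache": "false",  # ask the server not to return a cached response
}

# response = requests.post(API_URL, headers=headers, json={"inputs": "Hello"})
```

HTTP header names are case-insensitive, so "X-use-cache" and "X-Use-Cache" behave the same.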
Hope this proves useful to you :) I'm closing this issue, but let me know if you have any other questions.