distilabel
[FEATURE] Add `options` within `generation_kwargs` for `InferenceEndpointsLLM`
Is your feature request related to a problem? Please describe.
Inference Endpoints will use the cache by default unless explicitly told otherwise, so we should add a flag to control and disable it. Using the cache is discouraged and makes no sense when `num_generations` is used, since all the generations would be identical.
Describe the solution you'd like
Align the existing generation kwargs in `distilabel` for `InferenceEndpointsLLM` with the ones offered by `huggingface_hub.InferenceClient` for Inference Endpoints.
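A minimal sketch of what the aligned `generation_kwargs` could look like. The sampling parameter names mirror the arguments of `huggingface_hub.InferenceClient.text_generation`; the `options` entry is the proposed addition and is hypothetical until implemented:

```python
# Hypothetical generation_kwargs for InferenceEndpointsLLM after the
# proposed alignment. "max_new_tokens", "temperature" and "do_sample"
# mirror huggingface_hub.InferenceClient.text_generation parameters;
# "options" is the new entry this issue proposes, used to disable the
# Inference Endpoints cache when num_generations > 1.
generation_kwargs = {
    "max_new_tokens": 256,
    "temperature": 0.7,
    "do_sample": True,
    "options": {"use_cache": False},  # proposed: force fresh generations
}
```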
Additional context
See their docs on `{"options": {"use_cache": false}}` at https://huggingface.co/docs/api-inference/detailed_parameters#text-generation-task
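For reference, a sketch of the raw text-generation payload described in the linked docs, where `options` sits alongside `inputs` and `parameters` (the prompt text and the endpoint URL in the comment are placeholders):

```python
# Payload for the Inference API text-generation task, as documented at
# the link above: setting "options.use_cache" to False forces the
# endpoint to compute a fresh generation instead of returning a cached
# response for an identical request.
payload = {
    "inputs": "Write a haiku about synthetic data.",
    "parameters": {"max_new_tokens": 64, "do_sample": True},
    "options": {"use_cache": False},
}
# Sent as e.g. (not executed here):
#   requests.post(API_URL, headers={"Authorization": f"Bearer {token}"}, json=payload)
```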