[FEATURE] Add support for `HF Nvidia NIM API` on `InferenceEndpointsLLM`
**Is your feature request related to a problem? Please describe.**
Add support for the serverless Nvidia NIM API available through Hugging Face.
**Describe the solution you'd like**
As suggested, it will require the following:

The new NIM API requires a specific `base_url` to be passed:
```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    base_url="https://huggingface.co/api/integrations/dgx/v1",
    api_key="MY_FINEGRAINED_TOKEN",
)
```
And then it requires a model id to be passed via the `model` argument of `chat.completions.create`:
```python
output = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=True,
    max_tokens=1024,
)

# With `stream=True`, the result is an iterator of chunks.
for chunk in output:
    print(chunk.choices[0].delta.content or "", end="")
```
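
On the distilabel side, here is a minimal sketch of what the requested integration could look like. It assumes `InferenceEndpointsLLM` is extended so that `model_id` is forwarded to the underlying `InferenceClient` chat completion call whenever a custom `base_url` is set; the parameter names below mirror the existing class, but the combined `base_url` + `model_id` behavior is the proposal, not something distilabel does today:

```python
from distilabel.llms import InferenceEndpointsLLM

# Hypothetical sketch: `base_url`, `api_key`, and `model_id` already exist
# on InferenceEndpointsLLM, but routing `model_id` through a custom
# `base_url` is the proposed behavior described in this issue.
llm = InferenceEndpointsLLM(
    base_url="https://huggingface.co/api/integrations/dgx/v1",
    api_key="MY_FINEGRAINED_TOKEN",  # fine-grained HF token
    model_id="meta-llama/Meta-Llama-3-8B-Instruct",
)
llm.load()

result = llm.generate(
    inputs=[
        [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Count to 10"},
        ]
    ],
    max_new_tokens=1024,
)
```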
**Additional context**
PR for reference: https://github.com/huggingface/huggingface_hub/pull/2482