[FEATURE] Add support for `HF Nvidia NIM API` on `InferenceEndpointsLLM`
**Is your feature request related to a problem? Please describe.**
Add support for the serverless Nvidia NIM API available through Hugging Face.
**Describe the solution you'd like**
As suggested, it will require the following:

The new NIM API requires a specific `base_url` to be passed:
```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    base_url="https://huggingface.co/api/integrations/dgx/v1",
    api_key="MY_FINEGRAINED_TOKEN",
)
```
And then it requires a model id to be passed via the `model` argument of `chat.completions.create`:
```python
output = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=True,
    max_tokens=1024,
)

# With `stream=True`, the result is an iterator of chunks.
for chunk in output:
    print(chunk.choices[0].delta.content or "", end="")
```
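
On the distilabel side, here is a minimal sketch of what the requested integration could look like. It assumes `InferenceEndpointsLLM` is extended so that `model_id` is forwarded to the underlying `InferenceClient` chat completion call whenever a custom `base_url` is set; the parameter names below mirror the existing class, but the combined `base_url` + `model_id` behavior is the proposal, not something distilabel does today:

```python
from distilabel.llms import InferenceEndpointsLLM

# Hypothetical sketch: `base_url`, `api_key`, and `model_id` already exist
# on InferenceEndpointsLLM, but routing `model_id` through a custom
# `base_url` is the proposed behavior described in this issue.
llm = InferenceEndpointsLLM(
    base_url="https://huggingface.co/api/integrations/dgx/v1",
    api_key="MY_FINEGRAINED_TOKEN",  # fine-grained HF token
    model_id="meta-llama/Meta-Llama-3-8B-Instruct",
)
llm.load()

result = llm.generate(
    inputs=[
        [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Count to 10"},
        ]
    ],
    max_new_tokens=1024,
)
```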
**Additional context**
PR for reference: https://github.com/huggingface/huggingface_hub/pull/2482