
Support for `extra_body` argument for OpenAI `chat/completion` API

Open massi-ang opened this issue 9 months ago • 0 comments

Description

The OpenAI `chat/completion` API supports the `extra_body` argument for passing model-specific parameters to the model. Models hosted via vLLM support this feature, but when using DJL/vLLM, `extra_body` seems to be ignored.

Will this change the current API? How? No, the API remains the same. This is how `extra_body` is added to the payload:

import json

import boto3

# SageMaker runtime client used to call the DJL/vLLM endpoint
sm = boto3.client("sagemaker-runtime")

prompt = "..."  # user prompt

sm.invoke_endpoint(
    EndpointName="djl-vllm-endpoint",
    Body=json.dumps({
        "messages": [
            {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant"},
            {"role": "user", "content": prompt}
        ],
        "max_tokens": 2048,
        "temperature": 0.6,
        "stop": ["<|im_end|>"],
        # model-specific parameters passed via extra_body (currently ignored by DJL)
        "extra_body": {"skip_special_tokens": False}
    }),
    ContentType="application/json"
)
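
For comparison, this is roughly how the same request looks with the OpenAI-compatible Python client, where `extra_body` is an explicit keyword argument that is merged into the request body. The `base_url`, `api_key`, and model id below are placeholders; this is a sketch of the expected behaviour against a vLLM OpenAI-compatible server, not a confirmed DJL configuration:

from openai import OpenAI

# Point the OpenAI client at a vLLM OpenAI-compatible server (URL and key are placeholders)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder model id
    messages=[
        {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant"},
        {"role": "user", "content": "Hello"},
    ],
    max_tokens=2048,
    temperature=0.6,
    stop=["<|im_end|>"],
    # extra_body fields are forwarded verbatim in the request body,
    # so the server receives skip_special_tokens as a model-specific parameter
    extra_body={"skip_special_tokens": False},
)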

Who will benefit from this enhancement? Everyone who hosts model inference via DJL, and particularly users deploying on SageMaker.

References

  • vLLM SamplingParams
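
For reference, the fields sent through `extra_body` are expected to map onto vLLM's `SamplingParams` on the server side. A minimal sketch, assuming vLLM is installed, of the parameters used in the example above:

from vllm import SamplingParams

# skip_special_tokens from extra_body corresponds to this SamplingParams option,
# alongside the standard max_tokens/temperature/stop settings from the request
params = SamplingParams(
    max_tokens=2048,
    temperature=0.6,
    stop=["<|im_end|>"],
    skip_special_tokens=False,  # keep special tokens such as <|im_end|> in the output
)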

massi-ang · Feb 18 '25