Support for `extra_body` argument for OpenAI `chat/completion` API
Description
The OpenAI `chat/completion` API supports the `extra_body` argument for passing model-specific parameters to the model. Models hosted directly via vLLM support this feature, but when serving through DJL/vLLM, `extra_body` appears to be ignored.
Will this change the current API? How?
No. The API remains the same. This is how `extra_body` is added to the payload:
```python
import json

import boto3

sm = boto3.client("sagemaker-runtime")

response = sm.invoke_endpoint(
    EndpointName="djl-vllm-endpoint",
    Body=json.dumps({
        "messages": [
            {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant"},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 2048,
        "temperature": 0.6,
        "stop": ["<|im_end|>"],
        "extra_body": {"skip_special_tokens": False},
    }),
    ContentType="application/json",
)
```
Who will benefit from this enhancement? Everyone who hosts model inference via DJL, and particularly those deploying on SageMaker.
References
- vLLM SamplingParams documentation