frequency_penalty at 0 causes no response content with Phi-3-mini-4k-cpu-int4-rtn-block-32-acc-level-4-onnx
Setting frequency_penalty to 0 produces a response with no content. Values greater than 0 but less than 1 cause other weird responses. 1 seems to be the only reliable value, and it's unclear whether the cause is the model or something else.
POST http://127.0.0.1:5272/v1/chat/completions
content-type: application/json

{
  "messages": [
    {
      "role": "user",
      "content": "Whats the golden ratio"
    }
  ],
  "frequency_penalty": 0,
  "model": "Phi-3-mini-4k-cpu-int4-rtn-block-32-acc-level-4-onnx"
}
You will get a response like:
{
  "model": null,
  "choices": [
    {
      "delta": {
        "role": "assistant",
        "content": "",
        "name": null,
        "tool_call_id": null,
        "function_call": null,
        "tool_calls": null
      },
      "message": {
        "role": "assistant",
        "content": "",
        "name": null,
        "tool_call_id": null,
        "function_call": null,
        "tool_calls": null
      },
      "index": 0,
      "finish_reason": "stop",
      "finish_details": null,
      "logprobs": null
    }
  ],
  "usage": null,
  "created": 1724095112,
  "id": "chat.id.2641",
  "system_fingerprint": null,
  "object": "chat.completion",
  "Successful": true,
  "error": null,
  "HttpStatusCode": 0,
  "HeaderValues": null
}
@a1exwang - is this caused by an invalid parameter? We may want to consider adding a value check for all input parameters.
- AITK uses ONNX Runtime GenAI for inference, and `frequency_penalty` is converted to `repetition_penalty` behind the scenes.
- According to the ONNX documentation, `repetition_penalty` cannot be 0.
- As the tooltip mentions, this parameter controls the likelihood of repetition. If you set a lower value, the model is more likely to repeat itself. That's why you see weird responses when it is set between 0 and 1.
- The value `1` is not the only reliable value. You can also set it to a value greater than 1, which decreases the likelihood of repetition further.
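To illustrate the points above, here is a minimal sketch of the repetition-penalty rule as it is commonly formulated (positive scores divided by the penalty, negative scores multiplied by it). Whether ONNX Runtime GenAI applies exactly this rule is an assumption on my part, and `apply_repetition_penalty` is a hypothetical helper, not part of any library.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Return a copy of `logits` with already-seen tokens rescaled.

    Assumed rule (common formulation, not confirmed for ONNX Runtime GenAI):
    positive scores are divided by the penalty and negative scores are
    multiplied by it, so a penalty above 1 always lowers a seen token's score.
    """
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


logits = [2.0, 1.0, -1.0]  # toy scores for a 3-token vocabulary
seen = [0, 2]              # tokens 0 and 2 were already generated

print(apply_repetition_penalty(logits, seen, 1.0))  # [2.0, 1.0, -1.0], unchanged
print(apply_repetition_penalty(logits, seen, 2.0))  # [1.0, 1.0, -2.0], seen tokens less likely
print(apply_repetition_penalty(logits, seen, 0.5))  # [4.0, 1.0, -0.5], seen tokens more likely
```

Under this rule a penalty of 0 would divide by zero, which is consistent with 0 not being a usable value, and a penalty between 0 and 1 boosts tokens that were already generated.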
I think we can add range validation for input parameters, as @swatDong suggested.
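A rough sketch of what such a check could look like, assuming it runs on the value after it has been converted to `repetition_penalty`. The function name is hypothetical, the only bound taken from this thread is that the value cannot be 0 (negative and non-finite values are rejected here as an extra assumption), and the conversion from `frequency_penalty` is not shown because it isn't described above.

```python
import math


def validate_repetition_penalty(value):
    """Hypothetical range check for repetition_penalty.

    Rejects non-numeric input and any value that is not a finite number
    greater than 0. No upper bound is enforced because none is stated.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("repetition_penalty must be a number")
    if not math.isfinite(value) or value <= 0:
        raise ValueError(
            f"repetition_penalty must be a finite number greater than 0, got {value!r}"
        )
    return float(value)


print(validate_repetition_penalty(1))    # 1.0
print(validate_repetition_penalty(1.5))  # 1.5

try:
    validate_repetition_penalty(0)
except ValueError as err:
    print(f"rejected: {err}")
```

Returning a clear error for an out-of-range value would make the failure visible to the caller instead of producing an empty `content` field.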