ml-commons
ml-commons copied to clipboard
[BUG] Your request contained invalid JSON: 'utf-8' codec can't decode byte 0xeb in position xx: invalid continuation byte
What is the bug? It is the same bug as described in the https://github.com/opensearch-project/ml-commons/issues/1666, but with the connector to Azure OpenAI embedding model. I was able to add connector for the Azure OpenAI Ada embedding using this issue: https://github.com/opensearch-project/ml-commons/issues/1367
When attempting to return an embedding for a string containing some German characters like ë, ä, I get the error
[2024-01-17T13:39:57,049][ERROR][o.o.m.e.a.r.RemoteModel ] [05f4d0a3acfe] Failed to call remote model org.opensearch.OpenSearchStatusException: Error from remote service: { "error": { "message": "Your request contained invalid JSON: 'utf-8' codec can't decode byte 0xeb in position 43: invalid continuation byte", "type": "invalid_request_error", "param": null, "code": null } }
When I remove the German character it works. I am retrieving the embedding from the _predict endpoint:
POST http://localhost:9200/_plugins/_ml/models/{model_id}/_predict
{
"parameters": {
"input": ["This is a string containing Moët Hennessy"]
}
}
Removing the special character and replacing it with e works. If I request the embedding directly from Azure Open AI (with special character) it works fine.
How can one reproduce the bug? Steps to reproduce the behavior:
- Setup Azure Open AI connector described here
- Retrieve embedding for string described above.
What is the expected behavior? Embedding should be returned for strings containing special characters.
What is your host/environment?
- OS: Ubuntu
- OpenSearch 2.11 (latest version)
Do you have any screenshots?
Do you have any additional context? Add any other context about the problem.
hmm, do I understand correctly that the fix for the previous bug is not yet included in the 2.11.1.0 release?
It seems I have to wait until "OpenSearch 2.12.0 release is currently scheduled to be released on Jan 23 2024"
It seems I have to wait until "OpenSearch 2.12.0 release is currently scheduled to be released on Jan 23 2024"
Yes, this bug fix https://github.com/opensearch-project/ml-commons/pull/1691 will be released in 2.12.0
@ylwu-amzn Thanks for fixing the issue. Unfortunately, the problem still persists if the OpenSearch cluster is deployed on Cloud (managed by AWS). OpenSearch version: 2.13.