ml-commons

[BUG] Your request contained invalid JSON: 'utf-8' codec can't decode byte 0xeb in position xx: invalid continuation byte

Open ulan-yisaev opened this issue 1 year ago • 3 comments

What is the bug? This is the same bug as described in https://github.com/opensearch-project/ml-commons/issues/1666, but with a connector to an Azure OpenAI embedding model. I set up the connector for the Azure OpenAI Ada embedding model by following https://github.com/opensearch-project/ml-commons/issues/1367.

When I request an embedding for a string containing special characters such as ë or ä, I get the following error:

[2024-01-17T13:39:57,049][ERROR][o.o.m.e.a.r.RemoteModel ] [05f4d0a3acfe] Failed to call remote model org.opensearch.OpenSearchStatusException: Error from remote service: { "error": { "message": "Your request contained invalid JSON: 'utf-8' codec can't decode byte 0xeb in position 43: invalid continuation byte", "type": "invalid_request_error", "param": null, "code": null } }
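The error text matches the wording of Python's UnicodeDecodeError, i.e. the remote service received bytes that are not valid UTF-8. The sketch below is an assumption about the cause, not a confirmed diagnosis: if the request body had been encoded with a single-byte charset such as ISO-8859-1 (Latin-1), ë would become the single byte 0xEB, and decoding that as UTF-8 fails with the same message. The helper name utf8_decode_error is hypothetical.

```python
from typing import Optional


def utf8_decode_error(data: bytes) -> Optional[str]:
    """Return the UnicodeDecodeError message if data is not valid UTF-8, else None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return str(err)
    return None


text = "Moët Hennessy"
utf8_bytes = text.encode("utf-8")      # ë -> two bytes: 0xC3 0xAB
latin1_bytes = text.encode("latin-1")  # ë -> one byte: 0xEB (assumed wrong charset)

print(utf8_decode_error(utf8_bytes))    # None: valid UTF-8
# 0xEB starts a 3-byte UTF-8 sequence, but the next byte ('t', 0x74) is not
# a continuation byte (0b10xxxxxx), so decoding stops at that position.
print(utf8_decode_error(latin1_bytes))
```

The reported position (43 in the log above) is a byte offset into the whole JSON body sent to Azure, so it differs from the offset in this short string.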

If I remove the special character, the request succeeds. I am retrieving the embedding from the _predict endpoint:

POST http://localhost:9200/_plugins/_ml/models/{model_id}/_predict
{
    "parameters": {
        "input": ["This is a string containing Moët Hennessy"]
    }
}
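To rule out the client as the source of the bad bytes, the same _predict call can be built with the body explicitly encoded as UTF-8. This is a minimal sketch using Python's standard urllib; the helper build_predict_request and the MODEL_ID value are hypothetical placeholders, and the request is only constructed, not sent.

```python
import json
import urllib.request
from typing import List


def build_predict_request(host: str, model_id: str, texts: List[str]) -> urllib.request.Request:
    # Same body as the _predict call above.
    payload = {"parameters": {"input": texts}}
    # ensure_ascii=False keeps "ë" as a literal character; encoding the
    # resulting str as UTF-8 turns it into the two bytes 0xC3 0xAB.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/_plugins/_ml/models/{model_id}/_predict",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


req = build_predict_request(
    "http://localhost:9200",
    "MODEL_ID",  # hypothetical placeholder for the real model id
    ["This is a string containing Moët Hennessy"],
)
print(req.data)  # the ë shows up as the UTF-8 bytes \xc3\xab
# To actually send it: urllib.request.urlopen(req)
```

If a request built this way still fails while a direct call to Azure OpenAI succeeds, the invalid bytes are most likely introduced after OpenSearch receives the request.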

Replacing ë with a plain e also works. Requesting the embedding directly from Azure OpenAI, with the special character included, works fine.
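Until a release containing the fix is available, the "replace ë with e" workaround can be applied on the client before calling _predict. A minimal sketch, assuming Python's unicodedata module; fold_diacritics is a hypothetical helper. It is lossy (the embedded text changes) and only handles characters that decompose into a base letter plus combining marks, such as ë or ä; a character like ß is left unchanged.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    # NFKD splits "ë" into "e" + U+0308 (combining diaeresis).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_diacritics("This is a string containing Moët Hennessy"))
# This is a string containing Moet Hennessy
```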

How can one reproduce the bug? Steps to reproduce the behavior:

  1. Set up the Azure OpenAI connector described here.
  2. Retrieve the embedding for the string shown above.

What is the expected behavior? An embedding should be returned for strings containing special characters.

What is your host/environment?

  • OS: Ubuntu
  • OpenSearch 2.11 (latest version)

Do you have any screenshots? [image not preserved]

ulan-yisaev, Jan 17 '24 13:01

Hmm, do I understand correctly that the fix for the previous bug is not yet included in the 2.11.1.0 release?

ulan-yisaev, Jan 17 '24 13:01

It seems I have to wait until "OpenSearch 2.12.0 release is currently scheduled to be released on Jan 23 2024"

ulan-yisaev, Jan 18 '24 07:01

> It seems I have to wait until "OpenSearch 2.12.0 release is currently scheduled to be released on Jan 23 2024"

Yes, the fix in https://github.com/opensearch-project/ml-commons/pull/1691 will be released in 2.12.0.

ylwu-amzn, Feb 02 '24 10:02

@ylwu-amzn Thanks for fixing the issue. Unfortunately, the problem still persists when the OpenSearch cluster is deployed in the cloud (managed by AWS). OpenSearch version: 2.13. [Screenshots from 2024-06-07 not preserved]

Tiberiu07, Jun 07 '24 08:06