azure-cosmos-db-emulator-docker

Unable to create item caused by unsupported Unicode escape sequence #cosmosEmulatorVnextPreview

Open HulinCedric opened this issue 1 year ago • 16 comments

Describe the bug After setting up the emulator using the instructions outlined here and creating a database and a container, trying to create an item that contains a string with a diacritic makes the client throw a CosmosDbException containing the reason: {"code":"InternalServerError","message":"unsupported Unicode escape sequence"}

To Reproduce Steps to reproduce the behavior:

  1. Docker compose that replicates the setup instructions
  2. Use the explorer to create a database and a container.
  3. Create a .NET 9 application and add the Cosmos connection string, database and container to the development settings.
  4. Execute a method that tries to create an item containing a string with a diacritic in the container.

Expected behavior The item is created and the client doesn't throw a CosmosDbException.

Screenshots Image

Desktop (please complete the following information):

  • OS: macOS Sonoma 14.6.1
  • dotnet: 9.0.100

Docker Images Used:

  • Linux

HulinCedric avatar Nov 22 '24 17:11 HulinCedric

Rohan spent some time today trying to repro this locally using the steps mentioned in the bug, but did not have much luck. This specific test scenario seems to be working, both via the .NET SDK and the data explorer. We will keep investigating, but we are wondering if you can give us more information. It would also help if you could attach the logs from your container, which you can collect by running docker cp <container-id>:/logs . && zip -r "logs.zip" "logs"

xgerman avatar Nov 27 '24 18:11 xgerman

Thanks for your help and your investigation.

I have identified what caused the issue.

Here are the CosmosClientOptions I used while trying out the emulator.

private static CosmosClientOptions CosmosClientOptions()
      => new CosmosClientOptions
      {
          LimitToEndpoint = true, // only for emulator (High availability)
          ConnectionMode = ConnectionMode.Gateway, // only for emulator (http vs tcp)
          // AllowBulkExecution = true, // does not work with the emulator
          UseSystemTextJsonSerializerWithOptions = new JsonSerializerOptions
          {
              PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
              PropertyNameCaseInsensitive = true
          }
      };

When I use UseSystemTextJsonSerializerWithOptions, I get the exception.

HulinCedric avatar Nov 27 '24 20:11 HulinCedric

@xgerman any estimate of when this will be fixed? I just tried the latest version of vnext-preview and the bug still happens with System.Text.Json.

gpetrou avatar Jan 10 '25 08:01 gpetrou

I have the same problem. I'm running vnext-preview on macOS and get the same issue when using UseSystemTextJsonSerializerWithOptions or when setting JsonSerializerOptions on the Serializer property.

Testing further, I noticed that very simple objects work without any problem. I can try to figure out whether specific fields in my models trigger this issue.

einord avatar Jan 17 '25 15:01 einord

It happens with Python too.

Package: azure-cosmos==4.9.0

my_container.create_item({"id": "1", "name": "héllo"})

AndreuCodina avatar Feb 14 '25 13:02 AndreuCodina

Same issue here, also using the Python SDK, when trying to write data containing the letters 'å', 'ä', 'ö'. Importing through the explorer UI works fine with JSON files containing these characters.

MemmoB avatar Mar 12 '25 06:03 MemmoB

When using the CosmosDB client from Python, the error in the PostgreSQL log is:

2025-03-12 09:44:43.745 UTC [1347] ERROR:  unsupported Unicode escape sequence
2025-03-12 09:44:43.745 UTC [1347] DETAIL:  Unicode escape value could not be translated to the server's encoding SQL_ASCII.
2025-03-12 09:44:43.745 UTC [1347] CONTEXT:  JSON data, line 1: ...m ipsum"},{"role":"assistant","content":"Sj\u00e4...

When using the explorer or curl against http://localhost:1234/proxy..., records are stored without any errors.
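
As a rough illustration of the DETAIL line in that log, here is a minimal sketch using only Python's standard library. It is an analogy, not the emulator's actual code path: the fragment is copied from the CONTEXT line, the variable names are hypothetical, and Python's "ascii" codec merely stands in for a target that cannot represent non-ASCII characters.

```python
import json

# JSON string fragment copied from the CONTEXT line of the log above.
escaped_fragment = '"Sj\\u00e4"'

# A JSON parser resolves the \u00e4 escape into the real character U+00E4.
decoded = json.loads(escaped_fragment)
print(decoded)  # Sjä

# Analogy only: an ASCII-only target has no representation for U+00E4,
# so the translation step fails.
try:
    decoded.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

print(fits_in_ascii)  # False
```

This matches the symptom described in the thread: the escape itself is plain ASCII on the wire, and the failure only shows up once something tries to translate it into the server's encoding.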

MemmoB avatar Mar 12 '25 10:03 MemmoB

The issue might be in azure/cosmos/_synchronized_request.py, line 61:

    if isinstance(data, (dict, list, tuple)):
        json_dumped = json.dumps(data, separators=(",", ":"))

Adding ensure_ascii=False passes a string containing, for example, 'ö' through as-is, which should make the request succeed. I sent a given JSON payload to the proxy with Python requests, both with and without ensure_ascii=False, and saw the same pattern: the request works when the payload is dumped with ensure_ascii=False, and the database throws the same error when the option is left out.

After starting to write this comment, I patched my local copy of the Python Cosmos SDK this way and tested it: it now works.
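
To make the difference concrete, here is a minimal standard-library sketch of what json.dumps produces with and without ensure_ascii=False. It does not touch the SDK or the emulator; the item is borrowed from the earlier Python example in this thread, and the variable names are only for illustration.

```python
import json

item = {"id": "1", "name": "héllo"}

# Default behaviour (ensure_ascii=True): non-ASCII characters are written
# as \uXXXX escape sequences, so the payload is pure ASCII.
escaped = json.dumps(item, separators=(",", ":"))
print(escaped)  # {"id":"1","name":"h\u00e9llo"}

# With ensure_ascii=False the characters are kept as-is and no escape
# sequences are emitted for them.
raw = json.dumps(item, separators=(",", ":"), ensure_ascii=False)
print(raw)  # {"id":"1","name":"héllo"}

# Both strings are valid JSON and decode to the same document.
assert json.loads(escaped) == json.loads(raw) == item
```

Both forms are equivalent JSON, so a server that handles one but not the other is reacting to the escape sequences rather than to the data itself.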

MemmoB avatar Mar 12 '25 11:03 MemmoB

Still facing this issue with Python when sending Japanese text as a value in the JSON. Any updates?

snigdho611 avatar Apr 13 '25 00:04 snigdho611

Does not work with the Docker image.

var configuration = ServiceProvider.GetRequiredService<IConfiguration>();
var endpoint = configuration.GetCosmosEndpoint();
var secretKey = configuration.GetCosmosSecretKey();

var options = new CosmosClientOptions
{
	HttpClientFactory = CreateHttpClientFactory(),
	ConnectionMode = ConnectionMode.Gateway,
	UseSystemTextJsonSerializerWithOptions = JsonSerializerOptions.Default,
};

return new CosmosClient(endpoint, secretKey, options);

// HTTP client factory
private static Func<HttpClient> CreateHttpClientFactory()
{
	var httpClientHandler = new HttpClientHandler
	{
		ServerCertificateCustomValidationCallback = HttpClientHandler.DangerousAcceptAnyServerCertificateValidator
	};

	var httpClient = new HttpClient(httpClientHandler);
	return () => httpClient;
}

Create a new item:

var database = CosmosClient.GetDatabase("my-database");
var container = database.GetContainer("my-container");

var partitionKey = new PartitionKey("1");
var model = new { id = "1", text = "Hello, 世界" };

var response = await container.CreateItemAsync(model, partitionKey, cancellationToken: ct)
	.ConfigureAwait(false);

// A successful create returns 201 Created, not 200 OK.
return response.StatusCode == HttpStatusCode.Created;

Response:

{"code":"InternalServerError","message":"unsupported Unicode escape sequence"}
RequestUri: https://127.0.0.1:8081/dbs/my-database/colls/data/docs;
RequestMethod: POST;
Header: Authorization Length: 84;
Header: x-ms-date Length: 29;
Header: x-ms-documentdb-partitionkey Length: 5;
Header: x-ms-cosmos-sdk-supportedcapabilities Length: 1;
Header: x-ms-activity-id Length: 36;
Header: Cache-Control Length: 8;
Header: User-Agent Length: 90;
Header: x-ms-version Length: 10;
Header: Accept Length: 16;

ActivityId: 1546b289-812b-4e1b-980b-ff31476c4cca, Request URI: /dbs/my-database/colls/data/docs, RequestStats: Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, SDK: Windows/10.0.26100 cosmos-netstandard-sdk/3.38.0

MihailsKuzmins avatar May 16 '25 00:05 MihailsKuzmins

Same here in Python:

"""Simple Proof of Concept for Unicode Bug"""

import os

from azure.cosmos import CosmosClient, PartitionKey
from dotenv import load_dotenv

load_dotenv()

url = os.environ.get("COSMOS_DB_URI_NOT_HTTPS")
key = os.environ.get("COSMOS_DB_KEY")

if not url or not key:
    raise ValueError("COSMOS_DB_URI_NOT_HTTPS and COSMOS_DB_KEY must be set")

client = CosmosClient(url, credential=key)

database_name = "test"
data_client = client.create_database_if_not_exists(database_name)

collection_client = data_client.create_container_if_not_exists("test", PartitionKey(path="/id"))

# ’ is the offending character
collection_client.upsert_item(
    {"id": "1", "name": "Absolutely, I’d be happy to help you boost your productivity!"}
)

henrymcl avatar Jun 03 '25 09:06 henrymcl

After some basic investigation, I found that the request is sent as the following JSON string:

{"query":"SELECT * FROM c WHERE c.id IN ( @ids )","parameters":[{"name":"@ids","value":"\u652f"}]}
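
That wire format is what Python's json.dumps emits by default, so query parameters are affected in the same way as item bodies. A minimal sketch, assuming the parameter value was the single character U+652F (the request above only shows its escaped form) and using hypothetical variable names:

```python
import json

# Query body shaped like the request above. The parameter value is written
# with a Python escape because the original only shows it as \u652f.
body = {
    "query": "SELECT * FROM c WHERE c.id IN ( @ids )",
    "parameters": [{"name": "@ids", "value": "\u652f"}],
}

# Same separators as the SDK snippet quoted earlier in this thread,
# with the default ensure_ascii=True.
wire = json.dumps(body, separators=(",", ":"))
print(wire)

# The non-ASCII character has been replaced by an escape sequence.
print("\\u652f" in wire)  # True
```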

henrymcl avatar Jun 10 '25 13:06 henrymcl

Also seeing this problem, which prevents us from using the emulator entirely.

baumatron avatar Jul 11 '25 19:07 baumatron

Does anyone know of a workaround for the issue?

samoruk avatar Sep 15 '25 22:09 samoruk

Does anyone know of a workaround for the issue?

@samoruk My silly workaround is to just strip out non-ASCII characters before storing them:

def remove_unsafe_unicode_chars(obj):
    """Recursively remove Unicode characters above ASCII range (U+007F) from string values in a dictionary or list.
    This prevents CosmosDB Unicode escape sequence errors with emojis and other high Unicode characters."""
    if isinstance(obj, dict):
        return {key: remove_unsafe_unicode_chars(value) for key, value in obj.items()}
    elif isinstance(obj, list):
        return [remove_unsafe_unicode_chars(item) for item in obj]
    elif isinstance(obj, str):
        # Keep only characters in the safe ASCII range (U+0000 to U+007F)
        return ''.join(char for char in obj if ord(char) <= 0x007F)
    else:
        return obj
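
A slightly less lossy variant of the same idea, as a sketch only: decompose accented letters with NFKD first so that 'é' becomes 'e' instead of disappearing, then drop whatever is still non-ASCII. The function name fold_to_ascii is hypothetical, and this is only appropriate for test data where losing diacritics and non-Latin text is acceptable.

```python
import unicodedata


def fold_to_ascii(obj):
    """Recursively fold string values to ASCII.

    Accented letters are decomposed (NFKD) so their base letter survives;
    any character that is still non-ASCII afterwards is dropped.
    Dictionary keys are left untouched.
    """
    if isinstance(obj, dict):
        return {key: fold_to_ascii(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [fold_to_ascii(item) for item in obj]
    if isinstance(obj, str):
        decomposed = unicodedata.normalize("NFKD", obj)
        return decomposed.encode("ascii", "ignore").decode("ascii")
    return obj


print(fold_to_ascii({"id": "1", "name": "héllo"}))  # {'id': '1', 'name': 'hello'}
```

Characters without an ASCII base letter, such as Japanese text or the curly apostrophe, are still removed entirely, so this does not replace a real fix.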

chrmcg avatar Sep 15 '25 22:09 chrmcg

@samoruk My silly workaround is to just strip out non-ASCII characters before storing them:

@chrmcg Thanks, yeah, I've had similar ideas, but in the end it was just easier to use the Windows-based emulator or a real Azure Cosmos instance.

samoruk avatar Sep 16 '25 18:09 samoruk