
Support configuring `ensureAscii` to escape non-ASCII characters for Json

Open zhangz1han opened this issue 2 years ago • 1 comment

What is your use-case and why do you need this feature?

In some scenarios, I can only use ASCII characters to store or transmit data. Wrapping the output in Base64 costs performance and hurts readability. Escaping non-ASCII characters as \uXXXX in the Json output solves this problem.

This is the default output behavior of the Python standard json library, so adding this feature would also make it easier to migrate Python code to Kotlin.

# Python
import json

data = {  # named `data` to avoid shadowing the built-in `dict`
    "测": "A测试B",
    "𝄞": "C𝄞D"  # non-BMP characters are emitted as surrogate pairs
}
output = json.dumps(data)  # equivalent to `json.dumps(data, ensure_ascii=True)`
print(output)
# {"\u6d4b": "A\u6d4b\u8bd5B", "\ud834\udd1e": "C\ud834\udd1eD"}

Describe the solution you'd like

Add optional `ensureAscii` and `ensureAsciiFormat` settings that can be configured when instantiating `Json`.

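// Kotlin (proposed API: `ensureAscii` and `ensureAsciiFormat` do not exist yet)
import kotlinx.serialization.encodeToString
import kotlinx.serialization.json.Json

val dict = mapOf(
    "测" to "A测试B",
    "𝄞" to "C𝄞D" // non-BMP characters are emitted as surrogate pairs
)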

var json = Json { ensureAscii = true }
println(json.encodeToString(dict))
// {"\u6d4b":"A\u6d4b\u8bd5B","\ud834\udd1e":"C\ud834\udd1eD"}

json = Json {
    ensureAscii = true
    ensureAsciiFormat = HexFormat.UpperCase
}
println(json.encodeToString(dict))
// {"\u6D4B":"A\u6D4B\u8BD5B","\uD834\uDD1E":"C\uD834\uDD1ED"}

zhangz1han • Nov 29 '23 18:11

Since these characters can only appear in strings, it should be a relatively easy transformation to do this as a post-processing step regardless of whether this becomes a first-party feature or not.

JakeWharton • Nov 29 '23 21:11
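
The post-processing step suggested in the comment above could look roughly like the following. This is only a minimal sketch: `escapeNonAscii` and its `upperCase` parameter are hypothetical names, not part of kotlinx.serialization. It relies on the fact that in valid JSON non-ASCII characters can only occur inside string literals, so escaping every such character in the encoded text does not change its meaning.

```kotlin
// Hypothetical helper, not part of kotlinx.serialization.
// Rewrites every UTF-16 code unit above 0x7F as a \uXXXX escape.
// Kotlin strings are UTF-16, so a supplementary character such as U+1D11E
// is already stored as a surrogate pair and comes out as two escapes,
// which matches what Python's ensure_ascii=True produces.
fun escapeNonAscii(json: String, upperCase: Boolean = false): String {
    val sb = StringBuilder(json.length)
    for (ch in json) {
        if (ch.code <= 0x7F) {
            // ASCII (including JSON syntax and existing escapes) passes through unchanged.
            sb.append(ch)
        } else {
            // Code units above 0x7F need at most four hex digits; pad to exactly four.
            val hex = ch.code.toString(16).padStart(4, '0')
            sb.append("\\u").append(if (upperCase) hex.uppercase() else hex)
        }
    }
    return sb.toString()
}
```

Applied to the map from the issue, `escapeNonAscii(Json.encodeToString(dict))` would give `{"\u6d4b":"A\u6d4b\u8bd5B","\ud834\udd1e":"C\ud834\udd1eD"}`, and passing `upperCase = true` would give the upper-case hex variant. The trade-off is a second pass over the encoded string, which a built-in option could avoid.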