How to serialize BigInteger and BigDecimal?

Open MMairinger opened this issue 5 years ago • 12 comments

Currently I need to serialize numbers with many decimal digits (30+, to be more precise). To deserialize a JSON file with this many decimal places, I used Java's BigInteger and BigDecimal types, which works fine. The problem arises when I want to serialize that value: it is serialized as an Integer or Double respectively, which truncates and rounds the actual value.

One example value, received in a JSON file, is:

0.083532162010669708251953125000

After deserializing, I have exactly that value as a JsonLiteral. But when I serialize it, I get:

0.08353216201066971

which is not my desired result. A screenshot from my unit tests showed the same difference between expected and actual.

My questions now are: how should I serialize BigInteger and BigDecimal values, and will there be support for these data types in the future?
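To make the loss concrete, here is a minimal sketch that uses only java.math and no serialization library. The two helper names are made up for illustration. It assumes that a Double-based writer effectively does the equivalent of toDouble().toString() on the value.

```kotlin
import java.math.BigDecimal

// Hypothetical helper: parses the literal exactly, then forces it through a
// 64-bit Double, as a Double-based writer would.
fun roundTripThroughDouble(literal: String): String =
    BigDecimal(literal).toDouble().toString()

// Hypothetical helper: keeps the value as a BigDecimal from start to finish,
// so every digit (including trailing zeros) survives.
fun roundTripThroughBigDecimal(literal: String): String =
    BigDecimal(literal).toPlainString()
```

With the value from this post, roundTripThroughBigDecimal returns the literal unchanged, while roundTripThroughDouble returns a shorter string, because a Double cannot carry all of the digits as written.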

MMairinger avatar Sep 08 '20 09:09 MMairinger

Can you, please, provide the example code you use to serialize and deserialize your data between JSON and BigIntegers?

elizarov avatar Sep 08 '20 12:09 elizarov

var deserializedJson = json.decodeFromString(JsonClassSerializer, jsonToParse)
var convertedKotlin = convertJsonToKotlin(deserializedJson)
var convertedJson = convertKotlinToJson(convertedKotlin)
var serializedJson = generateJsonOutputFromKotlin(convertedJson, json)

The above sequence is what I went through. I took a JSON file as String input, configured my Json object, and then decoded that input into an instance of some JsonClass containing properties like val x: JsonArray and so on. Then I converted that object into a different type that contains usable data types, for example val x: List<Map<String, Any?>>. That instance is then converted back into a JsonClass instance, because we can't serialize Any types. Finally, the JSON output is generated from that JsonClass, and this is where the problem arises, because the standard serializer for JsonArray writes numbers as either Long or Double.

The initial deserialization is correct, the JSON -> Kotlin conversion is correct, and transforming it back from Kotlin -> JSON is still correct, but the standard serializer you get when you annotate your class with @Serializable writes BigDecimal and BigInteger values as Double and Long respectively.

To clarify, I did not use a custom serializer for anything here. I simply let kotlinx handle the serialization of the intermediate type and then converted that to the actual types.

This is how I converted a JsonLiteral to BigInteger and so on:

internal fun convertJsonPrimitive(jsonPrimitive: JsonPrimitive): Any? {
    return if (jsonPrimitive.isString)
        jsonPrimitive.contentOrNull
    else
        jsonPrimitive.booleanOrNull
            ?: if (jsonPrimitive.longOrNull != null)
                // if the casted value is the same as the raw json data, then the data is within double/long range
                if (jsonPrimitive.longOrNull.toString() != (jsonPrimitive.content))
                    BigInteger(jsonPrimitive.content)
                else
                    jsonPrimitive.long
            else if (jsonPrimitive.doubleOrNull != null)
                if (jsonPrimitive.doubleOrNull.toString() != (jsonPrimitive.content))
                    BigDecimal(jsonPrimitive.content)
                else
                    jsonPrimitive.double
            else
                JsonNull
}

Edit: The code block seems to be displayed weirdly.

MMairinger avatar Sep 09 '20 08:09 MMairinger

Do you have any special requirement to convert your input to JsonElement first and only then to a Kotlin class? If not, you can skip the JsonElement step and parse JSON to a Kotlin class directly via decodeFromString. You'll need to write a custom serializer for big numbers; however, it will be relatively simple: only decodeString/encodeString calls. See the sample here: https://github.com/Kotlin/kotlinx.serialization/blob/master/docs/serializers.md#primitive-serializer

sandwwraith avatar Sep 14 '20 15:09 sandwwraith

Yes, in fact, converting them directly was my initial plan until I saw that you cannot directly serialize Any types. Since you answered in some old issue that we should use JsonObject to serialize types like Map<String, Any>, I did the same thing for every other type that had Any in it, i.e. JsonArray for List<Any> and so on.

I also fiddled around with custom serializers, but eventually gave up since I couldn't get them to work. The easiest solution for me was to simply let kotlinx do the deserialization and then convert the result to standard Kotlin types with simple methods. The same goes for serialization.

Thanks for the answer, so I need a custom serializer for that.

Is it possible for the deserialization/serialization of BigInteger/BigDecimal to become a feature in the future, so that we no longer need a custom serializer for very long numbers?

MMairinger avatar Sep 21 '20 11:09 MMairinger

As it is, there is no way to encode a JSON number outside of Kotlin's primitive types, which means that it is impossible to encode a decimal value without inducing precision loss.

  • If you use encodeString, StreamingJsonEncoder will quote the value, regardless of the descriptor kind.
  • If you use encodeJsonElement with JsonPrimitive, the value's string representation will be converted using String.toDoubleOrNull(), not only inducing precision loss but also formatting the value using scientific notation (e.g. 1.11222333444E11).
  • There is no mechanism to encode an unquoted string value that I can find, using either Encoder or JsonEncoder.
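The first bullet can be reproduced with a small sketch. It assumes the kotlinx.serialization compiler plugin is applied and kotlinx-serialization-json is on the classpath; QuotedBigDecimalSerializer, Measurement, and encodeQuoted are hypothetical names used only for this illustration.

```kotlin
import java.math.BigDecimal
import kotlinx.serialization.KSerializer
import kotlinx.serialization.Serializable
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json

// Hypothetical serializer: goes through encodeString/decodeString only.
object QuotedBigDecimalSerializer : KSerializer<BigDecimal> {
    override val descriptor: SerialDescriptor =
        PrimitiveSerialDescriptor("QuotedBigDecimal", PrimitiveKind.STRING)

    override fun serialize(encoder: Encoder, value: BigDecimal) =
        encoder.encodeString(value.toPlainString())

    override fun deserialize(decoder: Decoder): BigDecimal =
        BigDecimal(decoder.decodeString())
}

// Hypothetical model class for the demonstration.
@Serializable
data class Measurement(
    @Serializable(with = QuotedBigDecimalSerializer::class)
    val value: BigDecimal,
)

// All digits survive, but the output is a JSON string, not a JSON number:
// encodeQuoted("0.083532162010669708251953125000")
//   returns {"value":"0.083532162010669708251953125000"}
fun encodeQuoted(literal: String): String =
    Json.encodeToString(Measurement.serializer(), Measurement(BigDecimal(literal)))
```

The digits are intact, but the consumer now receives a string where it may expect a number, which is exactly the limitation described in the bullets.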

Bear in mind that ECMA-404 does not specify that JSON numbers must represent IEEE-754 values; they are simply strings of digits with optional fraction and exponent parts. From json.org:

number
    integer fraction exponent

integer
    digit
    onenine digits
    '-' digit
    '-' onenine digits

digits
    digit
    digit digits

digit
    '0'
    onenine

onenine
    '1' . '9'

fraction
    ""
    '.' digits

exponent
    ""
    'E' sign digits
    'e' sign digits

sign
    ""
    '+'
    '-'

JsonEncoder should expose a mechanism to write an unquoted JSON number, which would solve this particular issue and allow for the use of non-Number types.

tadfisher avatar Apr 01 '21 19:04 tadfisher

I found out that using the isLenient configuration, combined with the built-in String constructor of BigDecimal, can solve the precision loss problem here.

With a serializer to deserialize String to BigDecimal:

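import java.math.BigDecimal
import kotlinx.serialization.KSerializer
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder

object BigDecimalSerializer: KSerializer<BigDecimal> {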
    override fun deserialize(decoder: Decoder): BigDecimal {
        return decoder.decodeString().toBigDecimal()
    }

    override fun serialize(encoder: Encoder, value: BigDecimal) {
        encoder.encodeString(value.toPlainString())
    }

    override val descriptor: SerialDescriptor
        get() = PrimitiveSerialDescriptor("BigDecimal", PrimitiveKind.STRING)
}

And a Json instance with isLenient setting active:

val json = Json { isLenient = true }

This has the advantage of allowing strings without quotation marks (i.e. a JSON number can be read as a String).

We can deserialize a data class like this:

@Serializable
data class TestDateAndValue(
    val date: String,
    @Serializable(with = BigDecimalSerializer::class)
    val value: BigDecimal,
    val anotherValue: Double,
)

WITHOUT precision loss:

@Test
fun `kotlin Json Parse`() {
    val jsonString = """
        {
            "date": "20220704",
            "value": 1234.56789123456789,
            "anotherValue": 123.456789
        }
    """.trimIndent()
    val parse = json.decodeFromString<TestDateAndValue>(jsonString)
    assertEquals("20220704", parse.date)
    assertEquals(BigDecimal("1234.56789123456789"), parse.value)
    assertEquals(123.456789, parse.anotherValue)
}

And it doesn't interfere with the deserialization of other numbers: as shown above, anotherValue is deserialized without problems.

It looks perfect to me. Just sharing it with you all :)

samuelchou avatar Jul 04 '22 09:07 samuelchou

@samuelchou The problem is in the encoding; even with the lenient flag, serializing will quote the value, which the server then needs to support. There's still no way to encode a BigDecimal as an arbitrary JSON number: StreamingJsonEncoder doesn't have an option to emit unquoted strings or otherwise arbitrary content; and TreeJsonEncoder uses JsonPrimitive for encoding, which calls toDouble under the hood.

tadfisher avatar Jul 04 '22 18:07 tadfisher

Oh, sorry for my misunderstanding, and thanks for the explanation. I now see the problem.

samuelchou avatar Jul 05 '22 06:07 samuelchou

There is a bit of a challenge here, as JSON does not explicitly restrict number size. It suggests that 64-bit numbers should be supported, but even that is implementation-specific. On the other hand, in cases where the JSON is generated externally, it would be good if the system were able to deserialize arbitrary-size numbers.
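As a small illustration of the size limit, here is a sketch that uses only the Kotlin and Java standard libraries. parseInteger is a hypothetical helper: it keeps an integer literal as a Long when it fits (up to 9223372036854775807) and falls back to BigInteger otherwise.

```kotlin
import java.math.BigInteger

// Hypothetical helper: returns the literal as a Long when it fits in 64 bits,
// otherwise as a BigInteger so that no digits are lost.
fun parseInteger(literal: String): Number =
    literal.toLongOrNull() ?: BigInteger(literal)
```

For example, parseInteger("9223372036854775807") yields a Long, while parseInteger("9223372036854775808") yields a BigInteger, because that literal is one past Long.MAX_VALUE.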

pdvrieze avatar Jul 05 '22 14:07 pdvrieze

I've written this to parse BigIntegers (and similarly BigDecimals) from either JSON numbers or strings, without losing precision.

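import java.math.BigInteger
import kotlinx.serialization.KSerializer
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.JsonElement
import kotlinx.serialization.json.JsonPrimitive
import kotlinx.serialization.json.JsonTransformingSerializer

object BigIntegerSerializer : KSerializer<BigInteger> {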
    override val descriptor: SerialDescriptor = PrimitiveSerialDescriptor("BigInteger", PrimitiveKind.INT)

    override fun serialize(encoder: Encoder, value: BigInteger) = encoder.encodeString(value.toString())

    override fun deserialize(decoder: Decoder): BigInteger = BigInteger(decoder.decodeString())
}

object LenientBigIntegerSerializer : JsonTransformingSerializer<BigInteger>(BigIntegerSerializer) {
    override fun transformDeserialize(element: JsonElement): JsonElement {
        if (element is JsonPrimitive && !element.isString) {
            return JsonPrimitive(element.content)
        }
        return super.transformDeserialize(element)
    }

    override fun transformSerialize(element: JsonElement): JsonElement {
        if (element is JsonPrimitive && element.isString) {
            return JsonPrimitive(BigInteger(element.content))
        }
        return super.transformSerialize(element)
    }
}

pschichtel avatar Jul 10 '22 17:07 pschichtel

@pschichtel Again, this issue is about serializing. Your code will still lose precision when encoding, because JsonPrimitive converts to Double under the hood.

tadfisher avatar Jul 10 '22 19:07 tadfisher

If the problem is that the JSON serializer / JsonPrimitive cannot turn a String value into a Number value (i.e. cannot remove the " marks), how about generating a JsonObject, turning it into a String, and then using a Regex to remove the " marks (and parsing the result back into a JsonObject)?

It might be tricky, but I assume that would work.
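A rough sketch of that idea, using only the Kotlin standard library. unquoteNumericFields is a hypothetical helper, not part of kotlinx.serialization. It works purely on text, so it assumes you know which field names hold big numbers and that no other string value happens to look like one of those fields.

```kotlin
// Hypothetical helper: removes the quotes around numeric string values of the
// named fields in an already-serialized JSON document.
fun unquoteNumericFields(json: String, fieldNames: Set<String>): String {
    if (fieldNames.isEmpty()) return json
    val names = fieldNames.joinToString("|") { Regex.escape(it) }
    // Matches "name" : "<JSON number>" and captures the name and the number.
    val pattern = Regex("\"($names)\"\\s*:\\s*\"(-?\\d+(?:\\.\\d+)?(?:[eE][+-]?\\d+)?)\"")
    return pattern.replace(json) { match ->
        "\"${match.groupValues[1]}\":${match.groupValues[2]}"
    }
}
```

Applied to {"date":"20220704","value":"1234.56789123456789"} with fieldNames = setOf("value"), this leaves date quoted and turns value into a bare number. Parsing the result back into a JsonObject would then run into the encoding limits discussed above, so it is probably only useful as the very last step before sending the text.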

samuelchou avatar Jul 12 '22 07:07 samuelchou

I've created a PR, #2041, that will allow for accurate encoding and decoding of BigDecimals.

See this test for an example: https://github.com/Kotlin/kotlinx.serialization/blob/46a5ff60b21b85f0a1d98c66f4d077e86e405ea6/formats/json-tests/jvmTest/src/kotlinx/serialization/BigDecimalTest.kt
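For readers who want to try this kind of encoding, here is a minimal sketch. It assumes a kotlinx-serialization-json release that ships the experimental JsonUnquotedLiteral function (1.5.0 or later, as far as I know); that function name is not mentioned in this thread, so please check it against your version. UnquotedBigDecimalSerializer, PreciseMeasurement, and the two helper functions are hypothetical names.

```kotlin
import java.math.BigDecimal
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.KSerializer
import kotlinx.serialization.Serializable
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonDecoder
import kotlinx.serialization.json.JsonEncoder
import kotlinx.serialization.json.JsonUnquotedLiteral
import kotlinx.serialization.json.jsonPrimitive

// Hypothetical serializer: writes the digits verbatim as a JSON number when
// the format is JSON, and falls back to a plain string for other formats.
@OptIn(ExperimentalSerializationApi::class)
object UnquotedBigDecimalSerializer : KSerializer<BigDecimal> {
    override val descriptor: SerialDescriptor =
        PrimitiveSerialDescriptor("UnquotedBigDecimal", PrimitiveKind.DOUBLE)

    override fun serialize(encoder: Encoder, value: BigDecimal) =
        when (encoder) {
            is JsonEncoder -> encoder.encodeJsonElement(JsonUnquotedLiteral(value.toPlainString()))
            else -> encoder.encodeString(value.toPlainString())
        }

    override fun deserialize(decoder: Decoder): BigDecimal =
        when (decoder) {
            // Read the raw literal text so it never passes through a Double.
            is JsonDecoder -> BigDecimal(decoder.decodeJsonElement().jsonPrimitive.content)
            else -> BigDecimal(decoder.decodeString())
        }
}

// Hypothetical model class for the demonstration.
@Serializable
data class PreciseMeasurement(
    @Serializable(with = UnquotedBigDecimalSerializer::class)
    val value: BigDecimal,
)

// encodeUnquoted("0.083532162010669708251953125000")
//   returns {"value":0.083532162010669708251953125000}
fun encodeUnquoted(literal: String): String =
    Json.encodeToString(PreciseMeasurement.serializer(), PreciseMeasurement(BigDecimal(literal)))

fun decodeUnquoted(json: String): BigDecimal =
    Json.decodeFromString(PreciseMeasurement.serializer(), json).value
```

Under those assumptions the value round-trips without quotes and without losing digits, and no isLenient setting is needed.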

aSemy avatar Sep 30 '22 11:09 aSemy