Clarify JSON marshalling of `Decimal` values
About
How to marshal values of Python's Decimal type.
Status quo
Currently, the library is conveying Decimal values as strings.
https://github.com/crate/crate-python/blob/7c7a88521d48a9d5225dbd3842f3c4c07fafd4d8/src/crate/client/http.py#L105-L106
See also
Others apparently need serialization to float.
if isinstance(obj, Decimal):
return float(obj)
-- https://github.com/crate-workbench/meltano-target-cratedb/blob/148a2d5ec7131658c1a74aed7d91f23befc81341/target_cratedb/sqlalchemy/patch.py#L65-L66
References
- https://github.com/crate-workbench/meltano-target-cratedb/issues/52
To me it seems that serializing it to float is the best case, casting to both gives correct precision
up until one point, then str just adds binary floating point noise:
>>>float(Decimal(0.023423432432))
0.023423432432
but only up to some precision:
float(Decimal(0.0234234324300000123456789))
0.023423432430000012
but in str
>>> str(Decimal(0.0234234324300000123456789))
'0.0234234324300000122665021962120590615086257457733154296875'
Thoughts?
Yes, I think that's the whole point here: Converging to float is bound to IEEE 754 floating-point arithmetic precision losses, while converging to str retains precision based on user's needs, which would be the right choice to transport numbers with fixed-point arithmetics, if the transport channel does not offer any other (dedicated) data type for that purpose. My university days are years in the past, so please correct me where you think I'm wrong.
Thoughts?
There is no floating point "noise", nor any sort of rounding errors, if you just stop using the float data type, also in literal form in code. str is certainly the best representation format in ASCII which will not incur losses, that's why we are using it firsthand when marshalling to JSON.
>>> str(Decimal("0.02342343243000001234567890123456789012345678901234567890123456789"))
'0.02342343243000001234567890123456789012345678901234567890123456789'
Recap: We are running this discussion because we discovered the Singer/Meltano database adapters apparently expect float values at some spot I touched the other day when conceiving meltano-target-cratedb, most probably due to needing to satisfy a test case of target-postgres. Contrary to that need of a downstream database connector component, the crate-python driver currently serializes Python's Decimal data type to str.
NB: I will need to revisit the code base to be able to point out the spot why exactly this patch was needed. Maybe I was chasing ghosts here? @edgarrmondragon: Does anything around float/decimal serialization in Meltano database adapters ring any bell with you?
Right, had not seen that I was using a float literal instead of a string, airport jet-lag I guess. Yeah, it seems that using a str is the best solution, is there anything that needs to be done in this repository seeing that we already serialize to string?
If some downstream applications or frameworks absolutely need floats at this point, and can't cope by their own, the crate-python library could possibly offer configurability.
Does anything around float/decimal serialization in Meltano database adapters ring any bell with you?
there was https://github.com/python-jsonschema/jsonschema/issues/701, but that was fixed a while back
Thank you very much. After reading the conversation you've shared I also think this isn't an issue here. I think I will need to review the original spot at meltano-target-cratedb to recap what has been going on there about the marshalling.
- https://github.com/crate-workbench/meltano-target-cratedb/issues/52
I will report here as soon as I have more information.