lua-cjson icon indicating copy to clipboard operation
lua-cjson copied to clipboard

Add new decode_big_numbers_as_strings option that defaults to false. …

Open brimworks opened this issue 8 years ago • 9 comments

…If set to true, then any numbers that may result in a precision loss will be preserved as a string.

Test code:


local Cjson = require('cjson')

local decoder = Cjson.new()
decoder.decode_big_numbers_as_strings(true)

print("should be string", type(decoder.decode("9007199254740992")))
print("should be number", type(Cjson.decode("9007199254740992")))

...which works as expected.

brimworks avatar Aug 16 '16 20:08 brimworks

I'd prefer to keep types consistent (numbers are numbers, strings are strings) since that is much less surprising.

It may be better to provide an option to replace the number type entirely with an arbitrary precision library for this use case. Eg: http://webserver2.tecgraf.puc-rio.br/~lhf/ftp/lua/#lmapm

mpx avatar Aug 24 '16 12:08 mpx

I'm not keen on adding a dependency on an arbitrary precision library.

What do you think of adding a new "number" type that is backed by a string? Specifically, the "number" type would be a table with a single element which is a string version of the number. This table would also be associated with a metatable that contains a "__name" field equal to "number". This would be inline with how null and empty arrays are already handled (at least on my fork).

This way the "number" type can also be used to write JSON numbers that do not fit in a double.

Thanks, -Brian

brimworks avatar Aug 24 '16 12:08 brimworks

I'm not sure of the utility for a "string" number. Programs would only be able to print it, and maybe re-encode it back to JSON. In these cases, perhaps the field would be better encoded as a string?

If there is a use case to support larger numbers, I think optional arbitrary precision support would have several advantages over a "string" number:

  • All numbers could be used for computation.
  • You could rely on numbers having a known type after decode.
  • Most users would not be impacted (performance loss / extra dependency) since they would use 64-bit floats (as commonly implemented).

Lua CJSON already supports 2 implementations for float encode/decode (Stdlib and dtoa), so another should fit in ok.

I'd prefer to avoid fields decoding to different types based on their content. I think this would be surprising, place a burden on the use of the library, and would be a likely source of bugs.

mpx avatar Aug 24 '16 13:08 mpx

All arbitrary precision libraries I have used will convert between string and arbitrary precision numbers, so it is very useful to have it encoded as a string.

Let me explain my use-case for this so you can understand why I need this :). I've got code that uses CJSON to parse the output of obtaining the GCE metadata of a host:

https://cloud.google.com/compute/docs/storing-retrieving-metadata

...unfortunately, the "id" field returned is encoded as a long number. When CJSON decodes this long into a double, precision is lost and I can not obtain the original "id" field. Since this is an "id" field, it is now useless.

I have no control of the JSON output encoding of GCE metadata.

Note that I do not do any computations with this number, I simply need to read and write that number, therefore I do not need an arbitrary precision library.

Regarding the comment "I'd prefer to avoid fields decoding to different types based on their content. I think this would be surprising, ...". Note that the default for this option is to have it disabled. So, consumers of this library would need to explicitly enable it to support the use case I've outlined above.

One other idea here:

What if there was a decode_big_numbers_with() option which took as input a function. This function would be called with the big number string as input and the output of that function would be the "thing" placed in the JSON tree. This would give consumers of CJSON the ability to customize how big numbers are decoded.

...although I don't need it for me use case, there should probably also be a way to encode big numbers for writing JSON. Perhaps it could use the same metatable inspection methodology in the ticket regarding encoding of the empty array (vs empty object).

Thanks, -Brian

brimworks avatar Aug 24 '16 13:08 brimworks

Having numeric strings being sent as numerics in JSON is a very common problem (unfortunately). Although this is a problem in the application that is sending the JSON, one might not have the power to change it. I had this problem before and had to tweak cjson to fix my issue. So, having a way to tell cjson how to handle that would be great. Regarding the implementation, I am not sure I like the approach. If you have two fields, one with long numeric that should be string and another with long numeric that should be numeric it is not going to work. Can we use the metatable approach or any other to say how I want a specific field to be interpreted? If not, I still prefer the PR approach than not having it.

pmusa avatar Aug 24 '16 17:08 pmusa

@pmusa what do you think of my suggestion of making it so decode_big_numbers_with() takes a single parameter that is a function. If a "big number" is found, then the user defined function is called with the big number as a string, the results of that user defined function is then placed in the JSON output tree.

So, to use it it would look something like this:

local Cjson = require('cjson')

local decoder = Cjson.new()
decoder.decode_big_numbers_with(function(str_num)
    return setmetatable({str_num}, {__name="number"})
end)

print("should be table", type(decoder.decode("9007199254740992")))
print("should be number", type(Cjson.decode("9007199254740992")))

So far I haven't had a need to encode big numbers in JSON output, but we should probably have a way to do that too. I'd propose that if a value has a metatable with the __name field equal to "number", then element [1] of the table should contain a string representation of the number to be placed in the output JSON. So if you use the decode_big_numbers_with function I've defined above, then the results of the decode can be passed to encode and you don't have to worry about data loss or type conversions.

Thoughts?

Thanks, -Brian

brimworks avatar Aug 25 '16 04:08 brimworks

I've added an "issue" to track the need for big number support:

https://github.com/mpx/lua-cjson/issues/37

brimworks avatar Aug 25 '16 04:08 brimworks

.. discussion continued on #37 ..

mpx avatar Aug 25 '16 09:08 mpx

I would really like to see this merged, because big numbers are pain sometimes.

smartynov avatar Dec 22 '17 02:12 smartynov