yyjson
Add YYJSON_TYPE_UNQUOTED
I'm planning on implementing this and would like your feedback.
Python supports arbitrarily large numbers (both integers and floats), and it would be nice to be able to support that with JSON input/output as well (as an optional flag, so by default it should raise an error).
I'd add support for this into yyjson as part of the work on py_yyjson by implementing a YYJSON_TYPE_UNQUOTED type. When reading, it parses until the end of a number and stores it in the string pool. Using this replaces all number parsing.
When writing, it's dumped as-is. Because it's not number specific, I've made it its own type instead of a numeric SUBTYPE. It's actually fairly useful for special cases, such as when I have a known-good JSON snippet I want to just sub in as an object value.
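To make the read side concrete, here is a minimal sketch of "parse until the end of a number" without converting it. The function name `scan_json_number` is hypothetical and is not part of yyjson; it just follows the number grammar from RFC 8259 and reports how many bytes belong to the number, so the caller can copy that span verbatim into a string pool.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not a yyjson function: returns the length of the
 * JSON number that starts at s, or 0 if s does not start a valid number.
 * No numeric conversion is performed, so digits are never lost. */
size_t scan_json_number(const char *s) {
    const char *p = s;
    if (*p == '-') p++;                        /* optional minus sign */
    if (*p == '0') {
        p++;                                   /* a leading zero stands alone */
    } else if (*p >= '1' && *p <= '9') {
        while (isdigit((unsigned char)*p)) p++;
    } else {
        return 0;                              /* no integer part */
    }
    if (*p == '.') {                           /* optional fraction */
        p++;
        if (!isdigit((unsigned char)*p)) return 0;
        while (isdigit((unsigned char)*p)) p++;
    }
    if (*p == 'e' || *p == 'E') {              /* optional exponent */
        p++;
        if (*p == '+' || *p == '-') p++;
        if (!isdigit((unsigned char)*p)) return 0;
        while (isdigit((unsigned char)*p)) p++;
    }
    return (size_t)(p - s);
}
```

The caller still has to check that the byte after the returned span is a valid delimiter (comma, closing bracket, whitespace, or end of input).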
This looks good.
I have seen this design in cJSON before. Maybe we can name the type YYJSON_TYPE_RAW, as cJSON does, and add a YYJSON_READ_NUMBER_AS_RAW flag for the JSON reader.
I just wanted to create a similar request. IMO, this should have been there from the start. A simple case is reading regular floats: even basic numbers such as 123.4 often have no exact binary floating-point representation. If one reads a JSON document and then needs to serialize or reformat it, the output can differ from the input. Most libs (like rapidjson, simdjson) support reading numbers as strings without parsing.
@ibireme any ETA on the enhancement?
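A small sketch of the round-trip concern, using only the C standard library. The helper `reformat_number` is illustrative and not a yyjson function, and the fixed `%.17g` format is an assumption chosen to make the effect visible; a writer that emits the shortest round-trip representation may well print 123.4 again, but the original spelling (trailing zeros, exponent form, digits beyond double precision) is still lost once the text has been converted.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative helper, not a yyjson function: parse `text` as a double and
 * format it again with 17 significant digits into `out`.
 * Returns the number of characters snprintf would have written. */
int reformat_number(const char *text, char *out, size_t out_size) {
    double value = strtod(text, NULL);   /* the text-to-double step loses the spelling */
    return snprintf(out, out_size, "%.17g", value);
}
```

For example, passing "1.50" yields "1.5", and a 30-digit integer comes back in exponent form with only about 17 significant digits.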
Thank you for your comments. I hope to implement this feature within a month.
Added: https://github.com/ibireme/yyjson/commit/804a8b2771e6bfb273a797f6c2413568a92e7874
Now you can use the YYJSON_READ_NUMBER_AS_RAW flag to parse all numbers as YYJSON_TYPE_RAW values:
bool yyjson_is_raw(yyjson_val *val);
const char *yyjson_get_raw(yyjson_val *val);
size_t yyjson_get_len(yyjson_val *val);
When JSON is serialized, a value of type YYJSON_TYPE_RAW is written back as-is.
Unfortunately, this forces all numbers to be parsed as strings. It might be more useful to parse them as usual, and provide a reader function that allows reading any number as a raw string.
Something like Direct Access to the Raw String in simdjson would be ideal.
This is to support numbers that cannot be parsed as any native type. Many JSON writers, such as Python's native one, support numbers of any size.
With floats, even if they can be parsed to a native type, they may not be identical when serialized again. That's why it would be useful to be able to access the raw string data for numbers. Does yyjson already have such an ability (besides turning all numbers into the raw type)? With simdjson's approach, code doesn't need to be modified much; with yyjson, all number handling has to be changed to be able to read raw floats.
simdjson is more focused on reading JSON, and each of its numbers can be associated with the original JSON string.
yyjson supports creating JSON documents from scratch, and numbers may not have an original string, so this ability was not considered at the beginning.
Technically I think it should be possible to store string and number in the same JSON value, but I don't want to cause performance degradation or confusion in the API. I'll keep trying to see if there's a better way to do this.
How about this: if YYJSON_READ_NUMBER_AS_RAW is enabled, a reference to the actual JSON string for the number is preserved and can be read as a raw number, without affecting the regular number code, which would still work the same way? The raw number could then be parsed on demand when requested. I'm not familiar with the internal implementation (I'm only amazed that it's one of the fastest libs with a clean, simple API), so I'm not sure whether my suggestion is possible.
Another alternative: yyjson could expose its code that handles text-to-number conversion.
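Until something like that is exposed, on-demand conversion can be approximated in user code. The sketch below is an assumption-laden stand-in: `raw_to_double` is a hypothetical helper, it uses the standard library's `strtod` rather than yyjson's internal parser, and it assumes the raw text was already validated as a JSON number by the reader and that the program runs in the "C" locale.

```c
#include <errno.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a yyjson function: converts `len` bytes of raw
 * number text to a double on demand. Returns 1 on success, 0 if the text is
 * empty, too long for the local buffer, not fully consumed, or overflows. */
int raw_to_double(const char *raw, size_t len, double *out) {
    char buf[128];
    char *end;
    double value;

    if (len == 0 || len >= sizeof buf) return 0;
    memcpy(buf, raw, len);          /* raw text need not be NUL-terminated */
    buf[len] = '\0';

    errno = 0;
    value = strtod(buf, &end);
    if (end != buf + len) return 0;                       /* trailing garbage */
    if (errno == ERANGE && (value == HUGE_VAL || value == -HUGE_VAL))
        return 0;                                         /* out of double range */
    *out = value;
    return 1;
}
```

The pointer and length would come from `yyjson_get_raw` and `yyjson_get_len` on a RAW value, so code that only occasionally needs a native number can convert lazily and keep the original text for output.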
I believe that the original use case I described is sufficiently solved, and I've implemented this in practice for py_yyjson which can be used as an example.