Robustness for locales support

Open zois-tasoulas opened this issue 10 months ago • 2 comments

The current version of parse_number_value can break JSON object parsing. Function parse_number_value calls strtod to convert strings to doubles.

Since the parson library parses JSON files, it is a common scenario to have numeric fields such as:

"key": 1,

Some numeric locales, e.g., those of many European languages, use a comma as the decimal separator. Under such a locale, parsing fails because strtod, called from parse_number_value, consumes the comma as part of the number and removes it from the remaining string contents.
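To see the effect in isolation, here is a minimal C sketch. The helper name chars_consumed_in_locale and the locale name de_DE.UTF-8 are assumptions for illustration and are not part of parson. The locale may not be installed on a given system, in which case the helper returns -1.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns how many characters strtod consumes from
 * `text` while `locale_name` is the active LC_NUMERIC locale, or -1 if
 * that locale cannot be selected. The previous locale is restored. */
static long chars_consumed_in_locale(const char *text, const char *locale_name)
{
    char saved[128];
    const char *current;
    char *end = NULL;
    long consumed;

    assert(text != NULL && locale_name != NULL);

    /* Copy the current locale name; later setlocale calls may overwrite it. */
    current = setlocale(LC_NUMERIC, NULL);
    if (current == NULL || strlen(current) >= sizeof saved) {
        return -1;
    }
    strcpy(saved, current);

    if (setlocale(LC_NUMERIC, locale_name) == NULL) {
        return -1; /* locale not available on this system */
    }

    (void)strtod(text, &end);
    consumed = (long)(end - text);

    setlocale(LC_NUMERIC, saved);
    return consumed;
}
```

In the "C" locale, chars_consumed_in_locale("1, \"b\": 2", "C") returns 1, so the comma is left for the caller. With a comma-decimal locale such as de_DE.UTF-8, if it is installed and the C library honors it in strtod, the result is expected to be 2, meaning the comma has been swallowed.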

From the strtod reference page:

Then it takes as many characters as possible to form a valid floating-point representation and converts them to a floating-point value. The valid floating-point value can be one of the following:

decimal floating-point expression. It consists of the following parts: ... nonempty sequence of decimal digits optionally containing decimal-point character (as determined by the current C locale) (defines significand)

Callers of parse_number_value might rely on finding the comma in string to verify that the remaining content of string is a valid JSON object.

Specifically, the remaining content of string is returned to parse_value and eventually to parse_object_value. The latter explicitly looks for a comma and will fail on valid JSON objects in locales that use a comma as the decimal separator.

// inside function `parse_object_value`

        SKIP_WHITESPACES(string);
        if (**string != ',') {
            break;
        }

zois-tasoulas avatar Feb 24 '25 23:02 zois-tasoulas

It's already been mentioned a few times, and I didn't find a good way of fixing this (https://github.com/kgabis/parson/issues/98#issuecomment-589346669). The only solution that comes to mind is having an option to use strtod_l hidden behind a PARSON_USE_STRTOD_L define.
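A rough sketch of what such an option could look like is below. This is not parson's code: the helper name parse_double_c_locale is hypothetical, and the sketch assumes a platform that provides newlocale and strtod_l (glibc with _GNU_SOURCE, or macOS/FreeBSD via xlocale.h). A real implementation would probably create the "C" locale object once and reuse it rather than building it on every call.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* assumption: glibc, where this exposes strtod_l */
#endif
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#if defined(__APPLE__) || defined(__FreeBSD__)
#include <xlocale.h> /* strtod_l is declared here on macOS and FreeBSD */
#endif

/* Would normally be set by the build system, not hard-coded. */
#define PARSON_USE_STRTOD_L 1

/* Hypothetical helper: parse a double using '.' as the decimal separator,
 * regardless of the process-wide locale. */
static double parse_double_c_locale(const char *str, char **end)
{
    assert(str != NULL && end != NULL);
#ifdef PARSON_USE_STRTOD_L
    {
        double value;
        /* Build a locale object for the "C" locale. */
        locale_t c_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0);
        if (c_locale == (locale_t)0) {
            /* Could not create it: fall back to the locale-dependent call. */
            return strtod(str, end);
        }
        value = strtod_l(str, end, c_locale);
        freelocale(c_locale);
        return value;
    }
#else
    return strtod(str, end);
#endif
}
```

With this helper, the input "1, \"b\": 2" yields 1.0 and leaves end pointing at the comma even after setlocale has switched the process to a comma-decimal locale.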

kgabis avatar Feb 25 '25 08:02 kgabis

I see, thanks for the pointer. I am not very familiar with parson, so I'm not sure about the whole design.

Would it make sense for function parse_number_value to sanity check the contents of string before and after the call to strtod?

I.e., check whether a comma exists before the first \n character. If it does, proceed with calling strtod. Then, if end doesn't have that comma as its first character, prepend it to end.
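One caveat with only restoring the comma afterwards: for input like 1,5 (two array elements), a comma-decimal strtod would already have read the value as 1.5. A variation on the same "check before the call" idea, sketched below, is to measure the number with the JSON grammar first, copy just that span into a buffer, swap '.' for the locale's decimal point, and only then call strtod. The function names, the 64-byte buffer, and the single-byte decimal point are all assumptions of this sketch, not parson's implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Length of the JSON number at the start of `s`, or 0 if there is none.
 * Grammar: -? (0 | [1-9][0-9]*) (. [0-9]+)? ([eE] [+-]? [0-9]+)? */
static size_t json_number_length(const char *s)
{
    const char *p = s;
    assert(s != NULL);
    if (*p == '-') p++;
    if (*p == '0') {
        p++;
    } else if (*p >= '1' && *p <= '9') {
        while (isdigit((unsigned char)*p)) p++;
    } else {
        return 0;
    }
    if (*p == '.' && isdigit((unsigned char)p[1])) {
        p++;
        while (isdigit((unsigned char)*p)) p++;
    }
    if (*p == 'e' || *p == 'E') {
        const char *q = p + 1;
        if (*q == '+' || *q == '-') q++;
        if (isdigit((unsigned char)*q)) {
            while (isdigit((unsigned char)*q)) q++;
            p = q; /* accept the exponent only if it has digits */
        }
    }
    return (size_t)(p - s);
}

/* Hypothetical wrapper: on success stores the value in *out, sets *end just
 * past the number in `s`, and returns 1. Returns 0 if `s` does not start
 * with a JSON number or the number does not fit the fixed buffer. */
static int parse_json_number(const char *s, double *out, const char **end)
{
    char buf[64];
    char *buf_end = NULL;
    char *dot;
    size_t len;
    const char *dp = localeconv()->decimal_point;

    assert(s != NULL && out != NULL && end != NULL);
    len = json_number_length(s);
    if (len == 0 || len >= sizeof buf) return 0;

    memcpy(buf, s, len);
    buf[len] = '\0';

    /* Assumes the locale's decimal point is a single byte. */
    dot = strchr(buf, '.');
    if (dot != NULL && dp != NULL && dp[0] != '\0') *dot = dp[0];

    *out = strtod(buf, &buf_end);
    if (buf_end != buf + len) return 0; /* strtod disagreed with the scan */
    *end = s + len;
    return 1;
}
```

Because strtod only ever sees the characters that the JSON grammar accepted, a comma following the number can never be consumed, whatever the active locale is.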

zois-tasoulas avatar Feb 25 '25 15:02 zois-tasoulas