JsonSchemaValidator: unintended primitive value conversion in `_recursive_json_to_object` method

Open vblagoje opened this issue 10 months ago • 0 comments

Describe the bug

The current implementation of the _recursive_json_to_object method within the JsonSchemaValidator class inadvertently converts primitive string values to their respective data types (e.g., converting numeric strings to integers) when processing JSON content. This behavior occurs during the json.loads(value) step, where the method does not distinguish between primitive values and JSON objects or arrays. As a result, string values that represent numeric or boolean data are automatically converted to their corresponding data types, leading to potential mismatches with the expected data types defined in the JSON schema.
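The conversion described above can be reproduced with the standard library alone. This is a minimal sketch and does not call any Haystack code; the `samples` list is made up for illustration.

```python
import json

# Hypothetical string values, as they might appear inside JSON content.
samples = ["42", "3.14", "true", "null", '{"a": 1}', "[1, 2]"]

# json.loads parses every valid JSON document, including bare primitives,
# so numeric and boolean strings lose their string type.
parsed = {s: json.loads(s) for s in samples}

for original, value in parsed.items():
    print(f"{original!r} -> {value!r} ({type(value).__name__})")
```

Only the last two samples are objects or arrays; the first four are strings that json.loads silently turns into int, float, bool, and None.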

Expected behavior

The _recursive_json_to_object method should preserve the original data types of primitive values in the input JSON content. String values that merely look like numbers or booleans (e.g., "123", "true") should remain strings; only strings that encode actual JSON objects or arrays should be parsed into their respective complex types. This keeps the JSON content intact and ensures that schema validation runs against the data types the input actually specified.

Additional context

The issue highlights the need for a more nuanced parsing mechanism within the _recursive_json_to_object method that can accurately differentiate between primitive and non-primitive values. This distinction is critical for applications that rely on strict data type validations against a JSON schema, where preserving the original data type of each value is essential for successful validation.

A possible solution involves enhancing the parsing logic to check the result of json.loads(value) and only proceed with converting the value if it is indeed a non-primitive data type (i.e., a dictionary or list). If the parsed value is a primitive data type (e.g., integer, float, boolean), the original string value should be retained. Implementing this solution will address the unintended data type conversion issue, thereby improving the functionality and reliability of the JsonSchemaValidator in handling JSON content with strict type requirements.
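The proposed check could look roughly like the sketch below. It is a standalone function written for illustration, not the actual Haystack method: the name `recursive_json_to_object`, its signature, and the sample `message` are assumptions, and the real method may handle its input differently.

```python
import json
from typing import Any


def recursive_json_to_object(data: Any) -> Any:
    """Parse embedded JSON strings, but only when they encode a dict or list.

    Hypothetical helper illustrating the proposed fix; not the Haystack API.
    """
    if isinstance(data, list):
        return [recursive_json_to_object(item) for item in data]
    if isinstance(data, dict):
        return {key: recursive_json_to_object(value) for key, value in data.items()}
    if isinstance(data, str):
        try:
            parsed = json.loads(data)
        except ValueError:
            # Not valid JSON (json.JSONDecodeError subclasses ValueError):
            # keep the string untouched.
            return data
        if isinstance(parsed, (dict, list)):
            # A real JSON object or array: convert it and recurse into it.
            return recursive_json_to_object(parsed)
        # Parsed to a primitive (int, float, bool, None, str):
        # retain the original string instead of the converted value.
        return data
    # Non-string primitives are already typed; return them as-is.
    return data


# Made-up input mixing primitive-looking strings with an embedded object.
message = {
    "id": "123",
    "flag": "true",
    "payload": '{"count": "7", "tags": ["a", "1"]}',
    "n": 5,
}
result = recursive_json_to_object(message)
print(result)
```

With this approach "123" and "true" stay strings, while the embedded payload string becomes a dictionary whose own string values are likewise preserved.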

vblagoje, Apr 03 '24 08:04