JsonSchemaValidator: unintended primitive value conversion in `_recursive_json_to_object` method
Describe the bug
The `_recursive_json_to_object` method of the `JsonSchemaValidator` class inadvertently converts string values that hold JSON primitives into the corresponding Python types when processing JSON content (e.g., the numeric string "123" becomes the integer 123). This happens at the `json.loads(value)` step, where the method does not distinguish between primitive values and JSON objects or arrays. As a result, strings that represent numeric or boolean data are silently converted, which can cause mismatches with the data types the JSON schema expects.
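The conversion can be seen with the standard library alone. The snippet below is a minimal illustration of how `json.loads` treats strings that contain JSON primitives versus strings that contain objects or arrays; the sample values are made up for demonstration and are not taken from Haystack.

```python
import json

# Illustrative inputs: four strings holding JSON primitives, two holding containers.
samples = ["123", "1.5", "true", "null", '{"a": 1}', "[1, 2]"]

# json.loads decodes every valid JSON document, primitives included.
decoded = {raw: json.loads(raw) for raw in samples}

for raw, value in decoded.items():
    print(f"{raw!r} -> {value!r} ({type(value).__name__})")
```

A field declared as `"type": "string"` in the schema would therefore fail validation once a value such as "123" has been decoded to an integer.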
Expected behavior
The `_recursive_json_to_object` method should preserve the original data types of primitive values as given in the input JSON content. Primitive string values (e.g., numeric strings, boolean strings) should stay strings during parsing. Only strings that encode actual JSON objects or arrays should be parsed and converted into their respective complex types. This keeps the JSON content intact and lets schema validation run against the data types that were actually supplied.
Additional context
The `_recursive_json_to_object` method needs parsing logic that differentiates between primitive and non-primitive values. This distinction is critical for applications that rely on strict data type validation against a JSON schema, where preserving the original type of each value is essential for validation to succeed.
A possible solution is to check the result of `json.loads(value)` and convert the value only if the result is a non-primitive type (i.e., a dictionary or list). If the parsed value is a primitive (e.g., integer, float, boolean), the original string should be retained. This would fix the unintended type conversion and make `JsonSchemaValidator` more reliable when handling JSON content with strict type requirements.
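The proposed check could look roughly like the sketch below. It is a standalone, hypothetical function (`recursive_json_to_object`, without the leading underscore) written only to illustrate the idea; it is not the actual Haystack method, and it assumes that any string, whether a dictionary value or a list item, should be handled the same way.

```python
import json
from typing import Any


def recursive_json_to_object(data: Any) -> Any:
    """Hypothetical sketch: parse only strings that encode JSON objects or arrays."""
    if isinstance(data, list):
        return [recursive_json_to_object(item) for item in data]
    if isinstance(data, dict):
        return {key: recursive_json_to_object(value) for key, value in data.items()}
    if isinstance(data, str):
        try:
            parsed = json.loads(data)
        except json.JSONDecodeError:
            return data  # not valid JSON: keep the string unchanged
        if isinstance(parsed, (dict, list)):
            # Nested container: convert it and keep walking its contents.
            return recursive_json_to_object(parsed)
        return data  # JSON primitive (number, boolean, null): keep the original string
    return data  # non-string values are left as they are


# Made-up message used only to exercise the sketch.
message = {"id": "123", "active": "true", "payload": '{"count": "7", "tags": "[1, 2]"}'}
result = recursive_json_to_object(message)
print(result)
```

With this approach "123" and "true" stay strings, while the `payload` string is expanded into a dictionary whose own string values follow the same rule.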