datafusion
datafusion copied to clipboard
"Cannot infer common argument type for comparison operation Union..."
Describe the bug
See https://github.com/datafusion-contrib/datafusion-functions-json/pull/3
I have a union defined by
DataType::Union(
UnionFields::new(
vec![0, 1, 2, 3, 4, 5, 6],
vec![
Field::new("null", DataType::Boolean, true),
Field::new("bool", DataType::Boolean, false),
Field::new("int", DataType::Int64, false),
Field::new("float", DataType::Float64, false),
Field::new("string", DataType::Utf8, false),
Field::new("array", DataType::Utf8, false),
Field::new("object", DataType::Utf8, false),
]
),
UnionMode::Sparse,
)
When I try to compare it to an integer with json_get(json_data, 'foo')=123, I get the error:
called `Result::unwrap()` on an `Err` value: Plan("Cannot infer common argument type for comparison operation Union([(0, Field { name: \"null\", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), (1, Field { name: \"bool\", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (2, Field { name: \"int\", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (3, Field { name: \"float\", data_type: Float64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (4, Field { name: \"string\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (5, Field { name: \"array\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (6, Field { name: \"object\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })], Sparse) = Int64")
Is there any way to rewire the logic plan to tell datafusion how to implement these comparisons?
If not, I might have to abandon the json_get method, and instead implemtn json_get_str, json_get_int etc., which would be unfortunate.
I tried implementing FunctionRewrite, but the error occurs before it's called.
To Reproduce
see tests in https://github.com/datafusion-contrib/datafusion-functions-json/pull/3
Expected behavior
No response
Additional context
No response
I took a look a bit, and I found that your return_type is Union, but If I understand correctly, you should compute the return type based on args. For example, in your test json_get(json_data, 'foo'), if you compute the return_type based on foo, you return Int, then you will not meet the error.
The error is because Union is not yet supported in comparison_coercion
@jayzhan211 that doesn't work since the argument types don't tell you what type will be returned.
e.g.:
- if the value in column
foois{"x": "abc"}, thenjson_get(foo, 'x')will return a string - but if the value in column
foois{"x": 123}, thenjson_get(foo, 'x')will return an integer
However I think I have a work around, I'm requiring a cast, so you have to do json_get(foo, 'x')::string or json_get(foo, 'x')::int, then I'm using a FunctionRewrite to rewrite the function from json_get to json_get_str or json_get_int.
With that the only remaining issue is making the error less ugly and more informative.
@samuelcolvin Did you also consider return_type_from_exprs, if it does not work, I think we can either support Union in comparison_coercion or better design ScalarUDFImpl to do more customize about the return type.
ye, return_type_from_exprs doesn't help.
I got around it mostly, with I think good performance by rewriting the query when there's a cast, so:
select * from foo where json_get(attributes, 'bar')::string='ham'
Will be rewritten to:
select * from foo where json_get_str(attributes, 'bar')='ham'
The main requirement I have now is that the error when you do try to compare a union is more helpful and less ugly.
The main requirement I have now is that the error when you do try to compare a union is more helpful and less ugly.
Do you mean the error message in comparision_coerion?
called `Result::unwrap()` on an `Err` value: Plan("Cannot infer common argument type for comparison operation Union([(0, Field { name: \"null\", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), (1, Field { name: \"bool\", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (2, Field { name: \"int\", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (3, Field { name: \"float\", data_type: Float64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (4, Field { name: \"string\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (5, Field { name: \"array\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (6, Field { name: \"object\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })], Sparse) = Int64")
sorry for the slow reply, yes exactly.
I'm working on this, initially in arrow-rs.