datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

"Cannot infer common argument type for comparison operation Union..."

Open samuelcolvin opened this issue 1 year ago • 5 comments

Describe the bug

See https://github.com/datafusion-contrib/datafusion-functions-json/pull/3

I have a union defined by

        DataType::Union(
            UnionFields::new(
                vec![0, 1, 2, 3, 4, 5, 6],
                vec![
                    Field::new("null", DataType::Boolean, true),
                    Field::new("bool", DataType::Boolean, false),
                    Field::new("int", DataType::Int64, false),
                    Field::new("float", DataType::Float64, false),
                    Field::new("string", DataType::Utf8, false),
                    Field::new("array", DataType::Utf8, false),
                    Field::new("object", DataType::Utf8, false),
                ]
            ),
            UnionMode::Sparse,
        )

When I try to compare it to an integer with json_get(json_data, 'foo')=123, I get the error:

called `Result::unwrap()` on an `Err` value: Plan("Cannot infer common argument type for comparison operation Union([(0, Field { name: \"null\", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), (1, Field { name: \"bool\", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (2, Field { name: \"int\", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (3, Field { name: \"float\", data_type: Float64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (4, Field { name: \"string\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (5, Field { name: \"array\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (6, Field { name: \"object\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })], Sparse) = Int64")

Is there any way to rewire the logic plan to tell datafusion how to implement these comparisons?

If not, I might have to abandon the json_get method, and instead implemtn json_get_str, json_get_int etc., which would be unfortunate.

I tried implementing FunctionRewrite, but the error occurs before it's called.

To Reproduce

see tests in https://github.com/datafusion-contrib/datafusion-functions-json/pull/3

Expected behavior

No response

Additional context

No response

samuelcolvin avatar Apr 22 '24 18:04 samuelcolvin

I took a look a bit, and I found that your return_type is Union, but If I understand correctly, you should compute the return type based on args. For example, in your test json_get(json_data, 'foo'), if you compute the return_type based on foo, you return Int, then you will not meet the error.

The error is because Union is not yet supported in comparison_coercion

jayzhan211 avatar Apr 23 '24 14:04 jayzhan211

@jayzhan211 that doesn't work since the argument types don't tell you what type will be returned.

e.g.:

  • if the value in column foo is {"x": "abc"}, then json_get(foo, 'x') will return a string
  • but if the value in column foo is {"x": 123}, then json_get(foo, 'x') will return an integer

However I think I have a work around, I'm requiring a cast, so you have to do json_get(foo, 'x')::string or json_get(foo, 'x')::int, then I'm using a FunctionRewrite to rewrite the function from json_get to json_get_str or json_get_int.

With that the only remaining issue is making the error less ugly and more informative.

samuelcolvin avatar Apr 23 '24 14:04 samuelcolvin

@samuelcolvin Did you also consider return_type_from_exprs, if it does not work, I think we can either support Union in comparison_coercion or better design ScalarUDFImpl to do more customize about the return type.

jayzhan211 avatar Apr 24 '24 00:04 jayzhan211

ye, return_type_from_exprs doesn't help.

I got around it mostly, with I think good performance by rewriting the query when there's a cast, so:

select * from foo where json_get(attributes, 'bar')::string='ham'

Will be rewritten to:

select * from foo where json_get_str(attributes, 'bar')='ham'

The main requirement I have now is that the error when you do try to compare a union is more helpful and less ugly.

samuelcolvin avatar Apr 24 '24 11:04 samuelcolvin

The main requirement I have now is that the error when you do try to compare a union is more helpful and less ugly.

Do you mean the error message in comparision_coerion?

called `Result::unwrap()` on an `Err` value: Plan("Cannot infer common argument type for comparison operation Union([(0, Field { name: \"null\", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), (1, Field { name: \"bool\", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (2, Field { name: \"int\", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (3, Field { name: \"float\", data_type: Float64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (4, Field { name: \"string\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (5, Field { name: \"array\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (6, Field { name: \"object\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })], Sparse) = Int64")

jayzhan211 avatar Apr 24 '24 13:04 jayzhan211

sorry for the slow reply, yes exactly.

samuelcolvin avatar Aug 08 '24 16:08 samuelcolvin

I'm working on this, initially in arrow-rs.

samuelcolvin avatar Aug 09 '24 11:08 samuelcolvin