Create a scalar from array of type Map
Is your feature request related to a problem or challenge?
I am not sure if this is a bug, feature request or things are working as it should.
How can I fix an error of this kind
Background, I have a UDF that mimics json_build_object from postgres. This UDF returns a MapArray as the returning value.
With a table like
let df = ctx.sql(r#"
select * from test"#).await?;
df.show().await?;
+-----------+-----------+-----------+
| clientid | name | parentid |
+-----------+-----------+-----------+
| c-string1 | n-string1 | p-string1 |
| c-string2 | n-string2 | p-string2 |
+-----------+-----------+-----------+
I can build a json like this fine
let df = ctx.sql(r#"
select json_build_object('name', name, 'type', 'test')
from test"#).await?;
df.show().await?;
+---------------------------------------------------------------------+
| json_build_object(Utf8("name"),test.name,Utf8("type"),Utf8("test")) |
+---------------------------------------------------------------------+
| {name: n-string1, type: test} |
| {name: n-string2, type: test} |
+---------------------------------------------------------------------+
But if I remove name or any other column that exist in the table, I get the error
let df = ctx.sql(r#"
select json_build_object('type', 'test')
from test"#).await?;
df.show().await?;
Error: This feature is not implemented: Can't create a scalar from array of type "Map(Field { name: "entries", data_type: Struct([Field { name: "keys", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "values", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false)"
A simple work around will be to always include a column from the table in the query, but unfortunately i cannot guarantee this. Especially when I have use case where json_build_object works on computed values or values created via other udfs.
How can I make my UDF not dependent on the present of a column? For example I can use the make_array function in this way. That is, I can use it without including a column from the table
let df = ctx.sql(r#"
select make_array('type', 'test')
from test"#).await?;
df.show().await?;
+--------------------------------------+
| makearray(Utf8("type"),Utf8("test")) |
+--------------------------------------+
| [type, test] |
| [type, test] |
+--------------------------------------+
Describe the solution you'd like
Ability to have a UDF not dependent on the present of a column, generate as much results in a column as needed in the resulting record batch
Describe alternatives you've considered
if I change the return type to UTF8 instead of Map, that is
let json_build_object_return_type: ReturnTypeFunction = Arc::new(move |_| Ok(Arc::new(DataType::Utf8)));
And i return StringArray from the UDF, I get the behaviour I am looking for. But this means the json structure returned from the UDF get formatted into strings, which makes processing more difficult down the line as I end up with values like this
let value = r#"\"{\\\"key_one\\\":\\\"val_one\\\"}\""#;
Additional context
No response
Maybe related to #11128 ?
Seems similar to make_map udf, and make_map can't support the usage in this issue.
Trying to support it like make_array.
take
@Rachelint are you still working on this? If not I would like to take it.
@Rachelint are you still working on this? If not I would like to take it.
No, I am not working, but it seems similar as #11128? And I have fixed #11128.
If I misunderstand about it, feel free to pick this.