Implement initial version of to_json

Open andygrove opened this issue 1 year ago • 6 comments

What is the problem the feature request solves?

Now that we have support for CreateNamedStruct in https://github.com/apache/datafusion-comet/pull/620, we could start working on to_json functionality. This functionality is very similar to our existing logic for casting from various data types to string but with JSON formatting.
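To make the comparison with cast-to-string concrete, here is a minimal Python sketch of the formatting step for a single struct row. It is not Comet code: `struct_to_json` is a hypothetical helper, and skipping null fields is an assumption about the desired default (I believe it matches Spark's default behavior, but that should be verified).

```python
import json


def struct_to_json(names, values):
    """Format one struct row as a JSON object string (hypothetical helper)."""
    parts = []
    for name, value in zip(names, values):
        if value is None:
            # Assumption: null fields are omitted rather than written as null.
            continue
        # json.dumps handles quoting and escaping for both the key and the value.
        parts.append(f"{json.dumps(name)}:{json.dumps(value)}")
    return "{" + ",".join(parts) + "}"


if __name__ == "__main__":
    print(struct_to_json(["a", "b", "c"], [1, None, 'x"y']))
```

The per-field work is the same kind of value-to-string conversion as a cast; the extra pieces are the field names, the quoting and escaping, and the null handling.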

Describe the potential solution

No response

Additional context

No response

andygrove avatar Jul 05 '24 15:07 andygrove

Hello @andygrove, I would like to work on this issue. Can you please assign it to me?

jatin510 avatar Jul 07 '24 20:07 jatin510

Thanks @jatin510. Assigned to you.

viirya avatar Jul 07 '24 21:07 viirya

@jatin510 I am willing to be a co-author on this PR. Is that fine?

Can we start by solving for the query below?

select to_json(named_struct(expression1_name, expression1_input[, ..., expression_n_name, expression_n_input]))

dharanad avatar Jul 09 '24 05:07 dharanad

@andygrove QQ: Upon checking, I found that DataFusion doesn't currently support a built-in to_json function. While implementing it directly in Comet is an option, it might be more efficient to implement the function in DataFusion and reuse it here. What are your thoughts?

dharanad avatar Jul 09 '24 16:07 dharanad

@dharanad It makes sense to add this in DataFusion. I see that Postgres also supports to_json.

Spark supports a number of options in to_json to control how dates and times are formatted. As long as those options are also available in the DataFusion version, I think we should be able to reuse it directly.
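As a rough illustration of what option-controlled formatting means here, the Python sketch below formats date and timestamp values according to a caller-supplied options map. The option keys (`date_format`, `timestamp_format`), the defaults, and the `strftime` patterns are stand-ins chosen for this example; they are not Spark's actual option names or pattern syntax, and `value_to_json` is a hypothetical helper.

```python
import json
from datetime import date, datetime

# Hypothetical defaults; Spark's real defaults and pattern syntax differ.
DEFAULTS = {
    "date_format": "%Y-%m-%d",
    "timestamp_format": "%Y-%m-%dT%H:%M:%S",
}


def value_to_json(value, options=None):
    """Format a single value as JSON, honoring date/time format options."""
    opts = {**DEFAULTS, **(options or {})}
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        return json.dumps(value.strftime(opts["timestamp_format"]))
    if isinstance(value, date):
        return json.dumps(value.strftime(opts["date_format"]))
    return json.dumps(value)


if __name__ == "__main__":
    ts = datetime(2024, 7, 15, 16, 7)
    print(value_to_json(ts))
    print(value_to_json(ts, {"timestamp_format": "%d/%m/%Y %H:%M"}))
```

A shared implementation would need to accept the equivalent of this options map so that the Spark-specific formats can be passed in from Comet.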

andygrove avatar Jul 15 '24 16:07 andygrove

I am working on an initial implementation of this and will have a PR up soon. This should make it easy for others to contribute and flesh out the functionality. I will create this in Comet first since it has a lot of Spark-specific logic, but maybe we can find a way to abstract that out so that we can upstream the bulk of the feature.

andygrove avatar Aug 10 '24 22:08 andygrove