Implement protobuf serialization for LogicalPlan::Unnest

Open jamesmcm opened this issue 1 year ago • 0 comments

Is your feature request related to a problem or challenge?

This job in Ballista fails:

    let avro_file = "gs://...";

    let metadata_df = ctx
        .read_avro(avro_file, AvroReadOptions::default())
        .await?
        .select(vec![col("id"), col("nested_array")])?
        .unnest_column("nested_array")?;

with the error:

Error: DataFusionError(Internal("failed to serialize logical plan: Internal(\"LogicalPlan serde is not yet implemented for Unnest\")"))

This is due to the lack of protobuf serialisation for LogicalPlan::Unnest here: https://github.com/apache/datafusion/blob/main/datafusion/proto/src/logical_plan/mod.rs#L1527-L1529

Describe the solution you'd like

Protobuf serialisation should be added for LogicalPlan::Unnest so that unnest can be used in Ballista jobs.
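
To illustrate the round trip being requested, here is a minimal, self-contained sketch. Everything in it is hypothetical: `ToyPlan`, `encode`, `decode` and the tag constants are toy stand-ins, not DataFusion's or datafusion-proto's real types, and the hand-rolled tagged byte format only stands in for an actual protobuf message. The point is just that an Unnest node needs to carry its column and its child plan through serialisation and back.

```rust
// Toy plan tree. These are NOT DataFusion's real LogicalPlan types.
#[derive(Debug, Clone, PartialEq)]
enum ToyPlan {
    Scan { table: String },
    Unnest { input: Box<ToyPlan>, column: String },
}

// Hypothetical one-byte tags identifying each plan variant on the wire.
const TAG_SCAN: u8 = 0;
const TAG_UNNEST: u8 = 1;

// Write a string as a little-endian u32 length followed by UTF-8 bytes.
fn write_str(buf: &mut Vec<u8>, s: &str) {
    buf.extend_from_slice(&(s.len() as u32).to_le_bytes());
    buf.extend_from_slice(s.as_bytes());
}

// Read a length-prefixed string starting at `pos`, advancing `pos` past it.
fn read_str(bytes: &[u8], pos: &mut usize) -> Result<String, String> {
    let len_end = pos
        .checked_add(4)
        .filter(|e| *e <= bytes.len())
        .ok_or("truncated length")?;
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&bytes[*pos..len_end]);
    let len = u32::from_le_bytes(raw) as usize;
    let end = len_end
        .checked_add(len)
        .filter(|e| *e <= bytes.len())
        .ok_or("truncated string")?;
    let s = std::str::from_utf8(&bytes[len_end..end])
        .map_err(|e| e.to_string())?
        .to_string();
    *pos = end;
    Ok(s)
}

// Serialise a plan: tag byte, then the node's fields, then its child (if any).
fn encode(plan: &ToyPlan, buf: &mut Vec<u8>) {
    match plan {
        ToyPlan::Scan { table } => {
            buf.push(TAG_SCAN);
            write_str(buf, table);
        }
        ToyPlan::Unnest { input, column } => {
            buf.push(TAG_UNNEST);
            write_str(buf, column);
            encode(input, buf);
        }
    }
}

// Deserialise a plan; an unknown tag is reported as an error, which mirrors
// the "not yet implemented" failure for an unsupported node.
fn decode(bytes: &[u8], pos: &mut usize) -> Result<ToyPlan, String> {
    let tag = *bytes.get(*pos).ok_or("unexpected end of input")?;
    *pos += 1;
    match tag {
        TAG_SCAN => Ok(ToyPlan::Scan { table: read_str(bytes, pos)? }),
        TAG_UNNEST => {
            let column = read_str(bytes, pos)?;
            let input = Box::new(decode(bytes, pos)?);
            Ok(ToyPlan::Unnest { input, column })
        }
        other => Err(format!("unknown plan tag {}", other)),
    }
}
```

Encoding a `ToyPlan::Unnest` wrapping a `ToyPlan::Scan` into a `Vec<u8>` and decoding it from position 0 should give back an equal plan. The real change would presumably add an Unnest message to the protobuf schema and handle it in both conversion directions.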

Describe alternatives you've considered

I also tried taking just the first element of the array instead, but could not get that to work either:

    let metadata_df = ctx
        .read_avro(avro_file, AvroReadOptions::default())
        .await?
        .select(vec![
            col("id"),
            col("nested_array")
                .index(lit(1)) // Note index starts at 1
                .field("array_id_1")
                .alias("array_id_1"),
            col("nested_array")
                .index(lit(1))
                .field("array_id_2")
                .alias("array_id_2"),
        ])?;

which gave:

Error: DataFusionError(Plan("Only ints are valid as an indexed field in a list"))

Either way, proper unnest support is what is really needed.

Additional context

No response

jamesmcm • May 23 '24 16:05