Align unnamed expressions naming with Snowflake

Open rampage644 opened this issue 6 months ago • 1 comments

Currently, Embucket relies on the DataFusion-style naming scheme for unnamed result columns, which is not aligned with Snowflake. Example (note the case and naming):

 > select system$typeof(2 / 3), 2 / 3;
+---------------------------------+
| SYSTEM$TYPEOF(2 / 3) | 2 / 3    |
|----------------------+----------|
| NUMBER(7,6)[SB4]     | 0.666667 |
+---------------------------------+

 > select arrow_typeof(2 / 3), 2 / 3;
+---------------------------------------------------------+
| arrow_typeof(Int64(2) / Int64(3)) | Int64(2) / Int64(3) |
|-----------------------------------+---------------------|
| Int64                             | 0                   |
+---------------------------------------------------------+
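
To make the gap concrete, here is a minimal, self-contained Rust sketch of the two naming styles. The `Expr` enum and both functions are hypothetical stand-ins, not DataFusion or Snowflake APIs, and the Snowflake-like rule (print literals as written, uppercase function names) is only an assumption inferred from the output above.

```rust
/// Toy expression tree; a hypothetical stand-in for DataFusion's `Expr`.
enum Expr {
    Int(i64),
    Div(Box<Expr>, Box<Expr>),
    Call(String, Vec<Expr>),
}

/// DataFusion-like name: literals carry their type, function names keep their case.
fn datafusion_style(e: &Expr) -> String {
    match e {
        Expr::Int(v) => format!("Int64({v})"),
        Expr::Div(l, r) => format!("{} / {}", datafusion_style(l), datafusion_style(r)),
        Expr::Call(name, args) => {
            let args: Vec<String> = args.iter().map(datafusion_style).collect();
            format!("{}({})", name, args.join(","))
        }
    }
}

/// Snowflake-like name (assumed rule): plain literals, uppercased function names.
fn snowflake_style(e: &Expr) -> String {
    match e {
        Expr::Int(v) => v.to_string(),
        Expr::Div(l, r) => format!("{} / {}", snowflake_style(l), snowflake_style(r)),
        Expr::Call(name, args) => {
            let args: Vec<String> = args.iter().map(snowflake_style).collect();
            format!("{}({})", name.to_uppercase(), args.join(", "))
        }
    }
}

fn main() {
    let two_thirds = || Expr::Div(Box::new(Expr::Int(2)), Box::new(Expr::Int(3)));
    let call = Expr::Call("arrow_typeof".to_string(), vec![two_thirds()]);
    println!("{}", datafusion_style(&call));
    println!("{}", snowflake_style(&call));
    println!("{}", snowflake_style(&two_thirds()));
}
```

The point of the sketch is that the two names come from the same tree and differ only in the rendering rule, so the fix belongs wherever that rule is applied.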

Jun 17 '25 00:06 rampage644

The default logic for converting an expr to a field name is here: https://github.com/apache/datafusion/blob/main/datafusion/expr/src/logical_plan/plan.rs#L2185-L2195

So we have some options:

Update the display format here: https://github.com/apache/datafusion/blob/main/datafusion/expr/src/expr.rs#L2671
Change the default ScalarUDFImpl schema_name method:

pub trait ScalarUDFImpl: Debug + Send + Sync {
    /// Returns the name of the column this expression would create
    ///
    /// See [`Expr::schema_name`] for details
    fn schema_name(&self, args: &[Expr]) -> Result<String> {
        Ok(format!(
            "{}({})",
            self.name(),
            schema_name_from_exprs_comma_separated_without_space(args)?
        ))
    }
    // ... other trait methods omitted
}

Add a visitor that adds an alias with the correct name when one is missing, like:

Expr::Alias(Alias {
    expr: Box::new(Expr::ScalarFunction { ... }),
    name: "my_pretty_name".to_string(),
    relation: None,
})
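
As a rough illustration of the alias idea, here is a self-contained sketch of a pass over a projection list. The `Expr` enum, `display_name`, and `add_missing_aliases` are hypothetical names that only mimic the shape of DataFusion's types, and the naming rule (uppercase everything, join arguments with ", ") is an assumption rather than Snowflake's documented behavior.

```rust
/// Toy expression; a hypothetical stand-in for DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Call { name: String, args: Vec<Expr> },
    Alias { expr: Box<Expr>, name: String },
}

/// Assumed Snowflake-like display name for an expression.
fn display_name(e: &Expr) -> String {
    match e {
        Expr::Column(c) => c.to_uppercase(),
        Expr::Call { name, args } => {
            let args: Vec<String> = args.iter().map(display_name).collect();
            format!("{}({})", name.to_uppercase(), args.join(", "))
        }
        // An explicit alias always wins over the generated name.
        Expr::Alias { name, .. } => name.clone(),
    }
}

/// Wrap every top-level projection expression that has no explicit alias.
fn add_missing_aliases(projection: Vec<Expr>) -> Vec<Expr> {
    projection
        .into_iter()
        .map(|e| match e {
            // Assumption: user aliases and bare columns are left untouched.
            Expr::Alias { .. } | Expr::Column(_) => e,
            other => {
                let name = display_name(&other);
                Expr::Alias { expr: Box::new(other), name }
            }
        })
        .collect()
}

fn main() {
    let projection = vec![
        Expr::Call { name: "lower".into(), args: vec![Expr::Column("a".into())] },
        Expr::Alias { expr: Box::new(Expr::Column("b".into())), name: "my_pretty_name".into() },
    ];
    for e in add_missing_aliases(projection) {
        println!("{e:?}");
    }
}
```

Only top-level projection expressions are wrapped here; nested expressions keep their shape, so a pass like this would change output column names without touching how the expressions are evaluated.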

Add an OptimizerRule/AnalyzerRule
Convert the resulting DataFrame schema at the end

@Vedin @rampage644 @ravlio WDYT?

Jun 18 '25 08:06 osipovartem