embucket-labs
Align unnamed expression naming with Snowflake
Currently, Embucket relies on the DataFusion-style naming scheme for unnamed expressions in query results, which is not aligned with Snowflake. Example (note the case and the naming):
> select system$typeof(2 / 3), 2 / 3;
+---------------------------------+
| SYSTEM$TYPEOF(2 / 3) | 2 / 3 |
|----------------------+----------|
| NUMBER(7,6)[SB4] | 0.666667 |
+---------------------------------+
vs
> select arrow_typeof(2 / 3), 2 / 3;
+---------------------------------------------------------+
| arrow_typeof(Int64(2) / Int64(3)) | Int64(2) / Int64(3) |
|-----------------------------------+---------------------|
| Int64 | 0 |
+---------------------------------------------------------+
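To make the difference concrete, here is a minimal sketch of the two naming styles on a toy expression tree. The `Expr` enum and both functions are hypothetical and are not DataFusion or Embucket code. The Snowflake-style rules (bare literals, upper-cased function names, `, ` between arguments) are assumptions inferred from the output above, not a statement of Snowflake's full behavior.

```rust
/// Hypothetical, simplified expression tree (not DataFusion's `Expr`).
#[derive(Debug, Clone)]
enum Expr {
    /// 64-bit integer literal.
    Int(i64),
    /// Binary operation such as `2 / 3`.
    Binary(Box<Expr>, char, Box<Expr>),
    /// Scalar function call with its arguments.
    Call(String, Vec<Expr>),
}

/// DataFusion-like name: literals carry their type, function names keep their case.
fn datafusion_style(e: &Expr) -> String {
    match e {
        Expr::Int(v) => format!("Int64({v})"),
        Expr::Binary(l, op, r) => {
            format!("{} {} {}", datafusion_style(l), op, datafusion_style(r))
        }
        Expr::Call(name, args) => {
            let args: Vec<String> = args.iter().map(datafusion_style).collect();
            format!("{}({})", name, args.join(","))
        }
    }
}

/// Snowflake-like name (assumed rules): bare literals, upper-cased function names.
fn snowflake_style(e: &Expr) -> String {
    match e {
        Expr::Int(v) => v.to_string(),
        Expr::Binary(l, op, r) => {
            format!("{} {} {}", snowflake_style(l), op, snowflake_style(r))
        }
        Expr::Call(name, args) => {
            let args: Vec<String> = args.iter().map(snowflake_style).collect();
            format!("{}({})", name.to_uppercase(), args.join(", "))
        }
    }
}

/// Builds the `2 / 3` expression used in the example above.
fn two_div_three() -> Expr {
    Expr::Binary(Box::new(Expr::Int(2)), '/', Box::new(Expr::Int(3)))
}
```

Calling `datafusion_style` and `snowflake_style` on the same tree yields the two kinds of column headers shown in the outputs above.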
The default logic that converts an expr to a field name is here: https://github.com/apache/datafusion/blob/main/datafusion/expr/src/logical_plan/plan.rs#L2185-L2195
So we have some options:
- Update display format here https://github.com/apache/datafusion/blob/main/datafusion/expr/src/expr.rs#L2671
- Change the default ScalarUDFImpl schema_name method:
  pub trait ScalarUDFImpl: Debug + Send + Sync {
      /// Returns the name of the column this expression would create
      ///
      /// See [`Expr::schema_name`] for details
      fn schema_name(&self, args: &[Expr]) -> Result<String> {
          Ok(format!(
              "{}({})",
              self.name(),
              schema_name_from_exprs_comma_separated_without_space(args)?
          ))
      }
      // ... (other trait methods elided)
  }
- Add a visitor that wraps an expression in an alias with the correct name when the alias is missing, like:
  Expr::Alias(Alias {
      expr: Box::new(Expr::ScalarFunction(...)),
      name: "my_pretty_name".to_string(),
      relation: None,
  })
- Add an OptimizerRule/AnalyzerRule
- Convert the resulting DataFrame schema at the end
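As a rough illustration of the alias-based option, the sketch below wraps every unnamed computed expression in a projection with a generated alias and leaves columns and user-supplied aliases alone. Everything here is hypothetical: the toy `Expr` enum, `alias_unnamed`, and `upper_namer` are stand-ins, not DataFusion types, and the upper-casing rule is only a placeholder for whatever Snowflake-compatible naming is chosen.

```rust
/// Hypothetical, simplified projection expression (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Reference to an existing column; it already has a stable name.
    Column(String),
    /// Any computed expression, kept as its SQL text for brevity.
    Computed(String),
    /// Explicit or synthesized alias wrapping another expression.
    Alias { expr: Box<Expr>, name: String },
}

/// Wraps every unnamed computed expression in an alias produced by `namer`.
/// Columns and existing aliases pass through unchanged.
fn alias_unnamed<F>(projection: Vec<Expr>, namer: F) -> Vec<Expr>
where
    F: Fn(&str) -> String,
{
    projection
        .into_iter()
        .map(|e| match e {
            Expr::Computed(sql) => {
                let name = namer(&sql);
                Expr::Alias {
                    expr: Box::new(Expr::Computed(sql)),
                    name,
                }
            }
            other => other,
        })
        .collect()
}

/// Placeholder naming rule (assumption): upper-case the expression text.
fn upper_namer(sql: &str) -> String {
    sql.to_uppercase()
}
```

Passing the naming rule in as a closure keeps the traversal separate from the naming policy, so the same pass would work whether it ends up running as a visitor or as an analyzer rule.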
@Vedin @rampage644 @ravlio WDYT?