mimir
Replace UDFs/UDAs with Spark's Catalog
At present, user-defined functions (UDFs) and user-defined aggregates (UDAs) can be defined either in Mimir-land or in Spark-land. Moreover:
- Spark's UDA/UDF catalog implementation is virtually identical to Mimir's
- There's a mountain of libraries that already support Spark
- Function and aggregate management is a non-trivial amount of code (1k lines or more).
I propose that we defer to Spark's catalog to cut out a ton of redundant code from Mimir. This would require the following changes:
- RAToSpark: Could now directly use the Spark catalog to instantiate functions (see the new MimirSQL for a few examples on how this might work)
- Typechecker: Would need to use Spark's catalog to check types. This could get a little awkward, since Spark's and Mimir's type systems differ. Would probably require RAToSQL to handle some translations.
- Eval / EvalInline: Would now talk to Spark for function execution
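To make the shape of these changes concrete, here is a minimal, self-contained sketch under stated assumptions. It does not use Spark at all. `FunctionCatalog` is a hypothetical stand-in for Spark's catalog (in real code that role would be played by Spark itself, e.g. `spark.udf.register` and `spark.catalog.functionExists`). `MimirType`/`SparkType` are simplified, hypothetical type enums, not Mimir's or Spark's actual classes, and `TypeTranslation` is the kind of boundary mapping the Typechecker bullet anticipates. The point is that the typechecker and the evaluator both consult one catalog instead of maintaining their own.

```scala
// Hypothetical sketch: one function catalog shared by typechecking and eval.
// None of these names are Mimir's or Spark's real APIs.
object CatalogSketch {
  // Hypothetical Mimir-side types.
  sealed trait MimirType
  case object TInt extends MimirType
  case object TFloat extends MimirType
  case object TString extends MimirType

  // Hypothetical stand-ins for Spark's DataTypes.
  sealed trait SparkType
  case object LongT extends SparkType
  case object DoubleT extends SparkType
  case object StringT extends SparkType

  // Translation at the boundary between the two type systems.
  object TypeTranslation {
    def toSpark(t: MimirType): SparkType = t match {
      case TInt    => LongT
      case TFloat  => DoubleT
      case TString => StringT
    }
    def toMimir(t: SparkType): MimirType = t match {
      case LongT   => TInt
      case DoubleT => TFloat
      case StringT => TString
    }
  }

  // A registered function: Spark-side signature plus an implementation.
  final case class FunctionEntry(
      argTypes: Seq[SparkType],
      returnType: SparkType,
      impl: Seq[Any] => Any
  )

  // Single source of truth for functions; lookups are case-insensitive,
  // as SQL function names usually are.
  final class FunctionCatalog {
    private val entries = scala.collection.mutable.Map.empty[String, FunctionEntry]
    def register(name: String, entry: FunctionEntry): Unit =
      entries(name.toLowerCase) = entry
    def lookup(name: String): Option[FunctionEntry] =
      entries.get(name.toLowerCase)
  }

  // Typechecker: asks the catalog, translating Mimir types to Spark types
  // for the argument check and the Spark return type back to a Mimir type.
  def typecheck(catalog: FunctionCatalog, name: String, args: Seq[MimirType]): Either[String, MimirType] =
    catalog.lookup(name) match {
      case None => Left(s"unknown function: $name")
      case Some(f) if f.argTypes != args.map(TypeTranslation.toSpark) =>
        Left(s"argument type mismatch for $name")
      case Some(f) => Right(TypeTranslation.toMimir(f.returnType))
    }

  // Eval: delegates execution to the same catalog entry.
  def eval(catalog: FunctionCatalog, name: String, args: Seq[Any]): Any =
    catalog
      .lookup(name)
      .map(_.impl(args))
      .getOrElse(throw new NoSuchElementException(s"unknown function: $name"))
}
```

For example, after registering a hypothetical `PLUS_ONE` over `LongT`, `typecheck(cat, "plus_one", Seq(TInt))` returns `Right(TInt)` and `eval(cat, "plus_one", Seq(41L))` returns `42L`. Keeping the type translation in one place is what would let the Typechecker reuse Spark's signatures without Mimir duplicating them.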