
Replace UDFs/UDAs with Spark's Catalog

Open okennedy opened this issue 5 years ago • 0 comments

At present, user-defined functions (UDFs) and user-defined aggregates (UDAs) can be defined either in Mimir-land or in Spark-land. Moreover:

  1. Spark's UDA/UDF catalog implementation is virtually identical to Mimir's
  2. There's a mountain of libraries that already support Spark
  3. Function and aggregate management accounts for a non-trivial 1k lines of code (or more) in Mimir.

I propose that we defer to Spark's catalog to cut out a ton of redundant code from Mimir. This would require the following changes:

  1. RAToSpark: Could now use the Spark catalog directly to instantiate functions (see the new MimirSQL for a few examples of how this might work)
  2. Typechecker: Would need to use Spark's catalog to check types. This could get a little awkward, since Spark's and Mimir's type systems differ. It would probably require RAToSQL to handle some translations.
  3. Eval / EvalInline: Would now talk to Spark for function execution
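
As a rough illustration of the three changes above, here is a minimal sketch that uses only Spark's public `SparkSession` API. It is not Mimir code: the object `SparkCatalogSketch`, its helper methods, and the function name `add_one` are all hypothetical, and it assumes Spark SQL is on the classpath. Mapping the resulting Spark types back to Mimir's types is left out.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.DataType

// Hypothetical sketch: none of these names exist in Mimir.
object SparkCatalogSketch {
  // Local session for illustration only; Mimir would reuse its own session.
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("spark-catalog-sketch")
    .getOrCreate()

  // Register the function once, in Spark's catalog, instead of in a
  // separate Mimir-side registry. "add_one" is a made-up example.
  spark.udf.register("add_one", (x: Int) => x + 1)

  // Lookup: ask Spark's catalog whether a function is defined.
  def functionExists(name: String): Boolean =
    spark.catalog.functionExists(name)

  // Typechecker: let Spark's analyzer resolve the call and report the
  // Spark-side result type (still needs translating to a Mimir type).
  def returnType(call: String): DataType =
    spark.range(1).selectExpr(call).schema.fields(0).dataType

  // Eval: run the call through Spark against a one-row dataset.
  def evalInt(call: String): Int =
    spark.range(1).selectExpr(call).head().getInt(0)
}
```

For example, `SparkCatalogSketch.returnType("add_one(1)")` asks Spark for the result type of the call without executing it, and `SparkCatalogSketch.evalInt("add_one(41)")` evaluates it. Spinning up a one-row dataset per call is only reasonable for a sketch; an inline evaluator would likely need something cheaper.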

okennedy, Dec 19 '19 22:12