
[ENH]: Add extension support to allow swapping in own SqlParser and rel Plugins.

brightsparc opened this issue on Mar 08, 2022 · 0 comments

Is your feature request related to a problem? Please describe.

I am interested in building my own custom SqlParser and plugins, similar to the way the custom machine learning features have been added to the core library.

There are currently no easy extension points, because DaskSqlParser is a concrete class with no way to override its DEFAULT_CONFIG:

public class DaskSqlParser {
    private SqlParser.Config DEFAULT_CONFIG;

    public DaskSqlParser() {
        DEFAULT_CONFIG = DaskSqlDialect.DEFAULT.configureParser(SqlParser.Config.DEFAULT)
            .withConformance(SqlConformanceEnum.DEFAULT)
            .withParserFactory(new DaskSqlParserImplFactory()); 
    }

    public SqlNode parse(String sql) throws SqlParseException {
        final SqlParser parser = SqlParser.create(sql, DEFAULT_CONFIG); 
        final SqlNode sqlNode = parser.parseStmt();
        return sqlNode;
    }
}

Further to this, the dask-sql Context wires up all the custom relational algebra plugins in its constructor, each targeting a specific Java class, e.g. "com.dask.sql.parser.SqlAnalyzeTable":

        RelConverter.add_plugin_class(custom.AnalyzeTablePlugin, replace=False)
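
For reference, these custom plugins follow a simple pattern: a Python class with a class_name attribute naming the Java node it handles and a convert method that produces the dask result. A minimal sketch of what a user-defined plugin could look like is below; the import paths and the convert signature are assumptions based on the existing custom plugins, and the com.example class name is hypothetical:

    from dask_sql.physical.rel.base import BaseRelPlugin
    from dask_sql.physical.rel.convert import RelConverter


    class MyStatementPlugin(BaseRelPlugin):
        # Fully qualified name of the (custom) Java parser node this plugin handles
        class_name = "com.example.sql.parser.SqlMyStatement"  # hypothetical

        def convert(self, sql, context):
            # Translate the Java node into a dask computation and return a
            # DataContainer, as the built-in custom plugins do
            ...


    # Today this registration only happens inside the Context constructor;
    # the request is to make it pluggable
    RelConverter.add_plugin_class(MyStatementPlugin, replace=False)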

Describe the solution you'd like

I would like to see:

  1. Support for configuring a custom Java SqlParserImplFactory
  2. Support for configuring the custom ML plugins that are loaded in the context.
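
For example, from user code this could look roughly like the following (a hypothetical API; the argument names match the constructor sketch further down, and the com.example factory class is an assumption):

    from dask_sql import Context

    c = Context(
        schema_name="root",
        parser_factory="com.example.sql.application.MySqlParserImplFactory",
    )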

I suggest adding two new arguments to the Context constructor for specifying the default schema name and the parser factory, which would be stored on the instance, and splitting the plugin registration out into separate methods. These could then be overridden by extending the default Context with your own methods, or an alternative hook could be provided to replace the default custom plugin imports, e.g.:

    DEFAULT_SCHEMA_NAME = "root"
    DEFAULT_PARSER_FACTORY = "com.dask.sql.application.DaskSqlParserImplFactory"

    def __init__(self, schema_name=DEFAULT_SCHEMA_NAME, parser_factory=DEFAULT_PARSER_FACTORY):
        """
        Create a new context.
        """
        # Name of the root schema
        self.schema_name = schema_name
        # All schema information
        self.schema = {self.schema_name: SchemaContainer(self.schema_name)}
        # A started SQL server (useful for jupyter notebooks)
        self.sql_server = None
        # Fully qualified name of the Java parser factory class to use
        self.parser_factory = parser_factory
        # Register any default plugins, if nothing was registered before.
        self.register_default_plugins()
        # Hook for custom plugins; override this in a subclass
        self.register_custom_plugins()

    def register_custom_plugins(self):
        RelConverter.add_plugin_class(custom.AnalyzeTablePlugin, replace=False)
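
With the plugin registration split out like this, a downstream project could extend the default Context instead of patching it. A hedged sketch of that usage, assuming the constructor sketched above and a user-defined plugin such as the hypothetical MyStatementPlugin from earlier (import paths are assumptions):

    from dask_sql import Context
    from dask_sql.physical.rel.convert import RelConverter

    from my_package.plugins import MyStatementPlugin  # hypothetical plugin


    class MyContext(Context):
        def register_custom_plugins(self):
            # Keep the built-in custom plugins and add our own on top
            super().register_custom_plugins()
            RelConverter.add_plugin_class(MyStatementPlugin, replace=False)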

The [RelationalAlgebraGeneratorBuilder](https://github.com/dask-contrib/dask-sql/blob/main/dask_sql/context.py#L793-L795) call in _get_ral would also need updating, since it currently doesn't allow passing a class that implements org.apache.calcite.sql.parser.SqlParserImplFactory:

    def _get_ral(self, sql):
        """Helper function to turn the sql query into a relational algebra and resulting column names"""
        # get the schema of what we currently have registered
        schemas = self._prepare_schemas()

        RelationalAlgebraGeneratorBuilder = (
            com.dask.sql.application.RelationalAlgebraGeneratorBuilder
        )

        # True if the SQL query should be case sensitive and False otherwise
        case_sensitive = dask_config.get("sql.identifier.case_sensitive", default=True)

        # NEW: pass the configured parser factory through to the Java builder
        generator_builder = RelationalAlgebraGeneratorBuilder(
            self.schema_name, case_sensitive, java.util.ArrayList(), self.parser_factory
        )
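
One detail the sketch above leaves open: parser_factory is a plain string here, so either the Java RelationalAlgebraGeneratorBuilder would need to instantiate it reflectively, or the Python side could resolve it first. A rough sketch of the latter, assuming the jpype-based setup dask-sql currently uses (jpype.JClass resolves a Java class by its fully qualified name):

        # Resolve the configured factory class name to a Java class and
        # instantiate it before handing it to the builder (assumes jpype is
        # imported and the JVM has already been started by dask_sql)
        ParserFactory = jpype.JClass(self.parser_factory)
        generator_builder = RelationalAlgebraGeneratorBuilder(
            self.schema_name, case_sensitive, java.util.ArrayList(), ParserFactory()
        )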

Describe alternatives you've considered

An alternative would be to split the planner Java code and the custom rel logic out into a separate contrib library, decoupling the dependency altogether.

