superset icon indicating copy to clipboard operation
superset copied to clipboard

Presto template processor functions are not available in Trino

Open aaronfeng opened this issue 3 years ago • 3 comments

Hi,

We are in the process of migrating from Presto to Trino and noticed template processor functions don't work with Trino. We have a lot of datasets using first_latest_partition function.

How to reproduce the bug

Below is working for presto driver, however, it doesn't work for Trino.

SELECT *
FROM foo 
WHERE ds='{{ presto.first_latest_partition('foo') }}'

Expected results

Template processor function should also work for Trino.

Actual results

When using Trino driver, an error is thrown because there are not template processor functions for Trino.

https://github.com/apache/superset/blob/master/superset/jinja_context.py#L553

Screenshots

Environment

  • superset version: 1.3.0

I looked at master code, it seems like this will also be an issue for 1.4.0

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • [x] I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • [x] I have reproduced the issue with at least the latest released version of superset.
  • [x] I have checked the issue tracker for the same issue and I haven't found one similar.

aaronfeng avatar Mar 28 '22 19:03 aaronfeng

@aaronfeng any luck with this?

pashkash avatar Sep 07 '22 14:09 pashkash

@pashkash no, the workaround is to use

max(ds)

but I believe that will scan all partitions.

aaronfeng avatar Sep 22 '22 18:09 aaronfeng

Maybe @bkyryliuk @nytai @villebro can confirm if this is an Issue we ought to keep open or close as inactive/deprioritized?

rusackas avatar Feb 12 '24 20:02 rusackas

This is at risk of being closed as stale, but meanwhile, hopefully @dosu-bot can give us a little added insight.

rusackas avatar Jun 10 '24 19:06 rusackas

Hey @rusackas! I'm Dosu, a bot here to assist you with bugs, questions, and contributions while you're waiting for a human maintainer. Let's squash those bugs together! 🤖

To make the first_latest_partition template processor function work with Trino in Superset, you can use the TrinoTemplateProcessor class. This class ensures compatibility with Trino by inheriting from PrestoTemplateProcessor and setting up the context correctly for Trino.

Here's how you can set it up:

  1. Define the TrinoTemplateProcessor class:

    class TrinoTemplateProcessor(PrestoTemplateProcessor):
        engine = "trino"
    
        def process_template(self, sql: str, **kwargs: Any) -> str:
            template = self.env.from_string(sql)
            kwargs.update(self._context)
    
            # Backwards compatibility if migrating from Presto.
            context = validate_template_context(self.engine, kwargs)
            context["presto"] = context["trino"]
            return template.render(context)
    
  2. Update the DEFAULT_PROCESSORS dictionary in your Superset configuration:

    DEFAULT_PROCESSORS = {
        "presto": PrestoTemplateProcessor,
        "hive": HiveTemplateProcessor,
        "spark": SparkTemplateProcessor,
        "trino": TrinoTemplateProcessor,
    }
    
  3. Ensure that the TrinoEngineSpec class handles the latest_partition functionality. This is already implemented in the TrinoEngineSpec class:

    class TrinoEngineSpec(PrestoBaseEngineSpec):
        engine = "trino"
        engine_name = "Trino"
        allows_alias_to_source_column = False
    
        @classmethod
        def get_extra_table_metadata(
            cls,
            database: Database,
            table: Table,
        ) -> dict[str, Any]:
            metadata = {}
    
            if indexes := database.get_indexes(table):
                col_names, latest_parts = cls.latest_partition(
                    database,
                    table,
                    show_first=True,
                    indexes=indexes,
                )
    
                if not latest_parts:
                    latest_parts = tuple([None] * len(col_names))
    
                metadata["partitions"] = {
                    "cols": sorted(
                        list(
                            {
                                column_name
                                for index in indexes
                                if index.get("name") == "partition"
                                for column_name in index.get("column_names", [])
                            }
                        )
                    ),
                    "latest": dict(zip(col_names, latest_parts)),
                    "partitionQuery": cls._partition_query(
                        table=table,
                        indexes=indexes,
                        database=database,
                    ),
                }
    
            if database.has_view(Table(table.table, table.schema)):
                with database.get_inspector(
                    catalog=table.catalog,
                    schema=table.schema,
                ) as inspector:
                    metadata["view"] = inspector.get_view_definition(
                        table.table,
                        table.schema,
                    )
    
            return metadata
    
  4. Testing: Ensure that the template processor is tested for Trino. You can refer to the integration tests to see how the template processor is tested for different backends, including Trino:

    def test_template_trino(app_context: AppContext, mocker: MockerFixture) -> None:
        lp_mock = mocker.patch(
            "superset.jinja_context.TrinoTemplateProcessor.latest_partition"
        )
        lp_mock.return_value = "the_latest"
        database = mock.Mock()
        database.backend = "trino"
        template = "{{ trino.latest_partition('my_table') }}"
        tp = get_template_processor(database=database)
        assert tp.process_template(template) == "the_latest"
    
        # Backwards compatibility if migrating from Presto.
        template = "{{ presto.latest_partition('my_table') }}"
        tp = get_template_processor(database=database)
        assert tp.process_template(template) == "the_latest"
    

By following these steps, the first_latest_partition function should work with Trino in the same way it works with Presto [1][2][3].

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Jun 10 '24 19:06 dosubot[bot]