sqlalchemy-trino

Superset Trino Iceberg use case "ValueError: too many values to unpack (expected 2)"

Status: Open. RacekM opened this issue 3 years ago • 3 comments

Hi,

We are using Trino with Superset and Iceberg to process and persist our data. We found that when we use Iceberg-backed data whose schema contains a timestamp field, Superset is unable to fetch its schema. It fails at https://github.com/dungdm93/sqlalchemy-trino/blob/5a01b488697a9467d778e6d94e9b3878b91b6d9c/sqlalchemy_trino/datatype.py#L145.

In the Superset logs, I can see this error:

    ERROR:root:too many values to unpack (expected 2)
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/flask_appbuilder/api/__init__.py", line 84, in wraps
        return f(self, *args, **kwargs)
      File "/usr/local/lib/python3.8/site-packages/superset/views/base_api.py", line 80, in wraps
        duration, response = time_function(f, self, *args, **kwargs)
      File "/usr/local/lib/python3.8/site-packages/superset/utils/core.py", line 1368, in time_function
        response = func(*args, **kwargs)
      File "/usr/local/lib/python3.8/site-packages/superset/utils/log.py", line 224, in wrapper
        value = f(*args, **kwargs)
      File "/usr/local/lib/python3.8/site-packages/superset/databases/api.py", line 489, in table_metadata
        table_info = get_table_metadata(database, table_name, schema_name)
      File "/usr/local/lib/python3.8/site-packages/superset/databases/utils.py", line 73, in get_table_metadata
        indexes = get_indexes_metadata(database, table_name, schema_name)
      File "/usr/local/lib/python3.8/site-packages/superset/databases/utils.py", line 38, in get_indexes_metadata
        indexes = database.get_indexes(table_name, schema_name)
      File "/usr/local/lib/python3.8/site-packages/superset/models/core.py", line 624, in get_indexes
        indexes = self.inspector.get_indexes(table_name, schema)
      File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/reflection.py", line 513, in get_indexes
        return self.dialect.get_indexes(
      File "/usr/local/lib/python3.8/site-packages/sqlalchemy_trino/dialect.py", line 192, in get_indexes
        partitioned_columns = self._get_columns(connection, f'{table_name}$partitions', schema, **kw)
      File "/usr/local/lib/python3.8/site-packages/sqlalchemy_trino/dialect.py", line 118, in _get_columns
        type=datatype.parse_sqltype(record.data_type),
      File "/usr/local/lib/python3.8/site-packages/sqlalchemy_trino/datatype.py", line 145, in parse_sqltype
        name, attr_type_str = split(attr_str.strip(), delimiter=' ')
    ValueError: too many values to unpack (expected 2)

I am quite sure the problem is on this line: https://github.com/dungdm93/sqlalchemy-trino/blob/5a01b488697a9467d778e6d94e9b3878b91b6d9c/sqlalchemy_trino/datatype.py#L145

I ran a SQL query to retrieve the data types and found that some of them are not handled correctly.

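Specifically, there is the data type row(min timestamp(6) with time zone, max timestamp(6) with time zone, null_count bigint). It is first split on the ',' character into fields such as "min timestamp(6) with time zone", and each field is then split on the ' ' character and unpacked into exactly two values (name and type). Because the type timestamp(6) with time zone itself contains spaces, the split yields more than two parts and the unpacking fails.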
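To make the failure concrete, here is a minimal sketch that reproduces it outside Superset. split_top_level is a hypothetical, simplified stand-in for the bracket-aware split helper in sqlalchemy_trino/datatype.py (it ignores quoting); it is not the library's code.

```python
from typing import Iterator


def split_top_level(text: str, delimiter: str = ",") -> Iterator[str]:
    # Hypothetical bracket-aware splitter: delimiters inside parentheses
    # are not split points. Quote handling is intentionally omitted.
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == delimiter and depth == 0:
            yield text[start:i]
            start = i + 1
    yield text[start:]


row_opts = ("min timestamp(6) with time zone, "
            "max timestamp(6) with time zone, null_count bigint")

# Step 1: split the ROW options into fields on top-level commas.
fields = [f.strip() for f in split_top_level(row_opts)]
print(fields)
# ['min timestamp(6) with time zone', 'max timestamp(6) with time zone', 'null_count bigint']

# Step 2: split one field on spaces; the multi-word type yields five parts.
parts = list(split_top_level(fields[0], delimiter=" "))
print(parts)
# ['min', 'timestamp(6)', 'with', 'time', 'zone']

# Step 3: unpacking into exactly two names (as on datatype.py line 145) fails.
error_message = ""
try:
    name, attr_type_str = parts
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # a "too many values to unpack" message
```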

Do you have any suggestions on how to solve it?

RacekM, May 14 '21 12:05

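You could monkeypatch sqlalchemy_trino/datatype.py by changing the offending line as follows:

    elif type_name == "row":
        attr_types: Dict[str, SQLType] = {}
        for attr_str in split(type_opts):
            # Split "name type" on spaces. The type may itself contain spaces
            # (e.g. "timestamp(6) with time zone"), so keep only the first two
            # parts instead of unpacking exactly two values.
            # Note: this drops any trailing words such as "with time zone".
            outputs = list(split(attr_str.strip(), delimiter=' '))
            name, attr_type_str = outputs[:2]
            attr_type = parse_sqltype(attr_type_str)
            attr_types[name] = attr_type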

This seems to work fine, and the tests still pass.
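One caveat: with outputs[:2], parse_sqltype receives only "timestamp(6)" and the "with time zone" part is discarded. A sketch of an alternative that keeps everything after the field name follows. split_top_level, split_field and row_field_types are hypothetical helpers written for illustration (split_top_level is a simplified, quote-unaware stand-in for the library's split); they are not part of sqlalchemy-trino.

```python
from typing import Dict, Iterator, Tuple


def split_top_level(text: str, delimiter: str = ",") -> Iterator[str]:
    # Hypothetical bracket-aware splitter (quote handling omitted):
    # delimiters inside parentheses are not treated as split points.
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == delimiter and depth == 0:
            yield text[start:i]
            start = i + 1
    yield text[start:]


def split_field(attr_str: str) -> Tuple[str, str]:
    # The first token is the field name; everything after it is the type,
    # so multi-word types such as "timestamp(6) with time zone" stay intact.
    name, *type_parts = split_top_level(attr_str.strip(), delimiter=" ")
    return name, " ".join(type_parts)


def row_field_types(type_opts: str) -> Dict[str, str]:
    # Map each ROW field name to its full (still unparsed) type string.
    return dict(split_field(field) for field in split_top_level(type_opts))


types = row_field_types(
    "min timestamp(6) with time zone, max timestamp(6) with time zone, null_count bigint"
)
print(types)
# {'min': 'timestamp(6) with time zone', 'max': 'timestamp(6) with time zone', 'null_count': 'bigint'}
```

Inside datatype.py the equivalent change would be name, *type_parts = split(attr_str.strip(), delimiter=' ') followed by parse_sqltype(' '.join(type_parts)). Whether parse_sqltype then handles the "with time zone" suffix correctly depends on the version of the library.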

fcomuniz, Aug 18 '21 20:08

We are running into this as well. When the type name contains spaces, split returns more than two parts, so unpacking the result into a (name, type) pair fails. In our case, a column of type timestamp(6) with time zone triggers it; a table with an identical schema, minus that one column, does not. EDIT: I just found the PR associated with this. Thanks for your work, @fcomuniz.

cccs-tom, Sep 10 '21 19:09

Hello @cccs-tom, @fcomuniz, @RacekM, sorry to keep you all waiting. I created PR #33, which aims to fix all known issues in the parse function.

dungdm93, Sep 23 '21 16:09