[sqllab] Tableschemaview fails to load with presto using parquet format files

Open tullis opened this issue 2 years ago • 3 comments

Summary

On Superset version 3.1.0, the table schema previews fail to load on the /sqllab and /dataset/add paths when the datasource is Presto and the table is stored as Parquet files.

These worked in Superset version 1.5.3, but have not worked from version 2.0.1 up to and including 3.1.0.

Error condition on Superset version 3.1.0 with presto table using format=PARQUET

Conditions

  • This only occurs when the table is queried through Presto and has table format = 'PARQUET'
  • Other database types such as Druid and MySQL do not exhibit this behaviour
  • This occurs when logged in as an Admin user, therefore I believe that it is likely unrelated to #25451
  • When the presto table is using format = 'TEXTFILE' this error does not occur
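
To confirm which storage format a given table uses, one option is to run Presto's `SHOW CREATE TABLE` and look for the `format` table property in the returned DDL. The following is a minimal sketch of that check, not something taken from Superset: the helper name `table_format` is hypothetical, and the sample DDL (including its column names) is made up to resemble the output for a Hive-backed table.

```python
import re
from typing import Optional

# Matches a table property such as: format = 'PARQUET'
_FORMAT_RE = re.compile(r"\bformat\s*=\s*'([^']+)'", re.IGNORECASE)


def table_format(ddl: str) -> Optional[str]:
    """Return the upper-cased storage format named in a CREATE TABLE statement, or None."""
    match = _FORMAT_RE.search(ddl)
    return match.group(1).upper() if match else None


# Hypothetical DDL, shaped like SHOW CREATE TABLE output; the columns are invented.
example_ddl = """
CREATE TABLE hive.wmf.aqs_hourly (
   example_column varchar,
   year integer,
   month integer,
   day integer,
   hour integer
)
WITH (
   format = 'PARQUET',
   partitioned_by = ARRAY['year','month','day','hour']
)
"""

print(table_format(example_ddl))  # PARQUET
```

Running this against one PARQUET table and one TEXTFILE table would make it easy to confirm that the format is the only difference between a failing and a working case.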

How to reproduce the bug

  1. Go to Superset version 3.1.0 as an Admin user and navigate to either of: a) /sqllab b) /dataset/add
  2. Select a database that uses the presto connector type
  3. Select any schema
  4. Select any table that uses a format = 'PARQUET'

Expected results

I would expect the left-hand column to be populated with the column names from the selected table.

Actual results

Several error messages appear stating that there were errors fetching table metadata and the left-hand column is not populated.

Error messages in the server log

There are no relevant error messages in the server log.

We can see the pyhive presto commands going through:

INFO:pyhive.presto:SHOW COLUMNS FROM "wmf"."aqs_hourly"
INFO:pyhive.presto:SHOW COLUMNS FROM "wmf"."aqs_hourly"
INFO:pyhive.presto:SHOW COLUMNS FROM "wmf"."aqs_hourly"
INFO:pyhive.presto:SELECT * FROM wmf."aqs_hourly$partitions"
ORDER BY year DESC, month DESC, day DESC, hour DESC
LIMIT 1
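
One way to narrow down whether the failure lies in Presto or in Superset's handling of the results is to replay the logged statements outside Superset. The sketch below rebuilds the two statements from the log and runs them on any DB-API cursor. The function names `metadata_statements` and `replay` are hypothetical, and the identifiers are interpolated without escaping, so this is only suitable for trusted schema and table names.

```python
from typing import Any, Dict, List, Sequence


def metadata_statements(schema: str, table: str, partition_cols: Sequence[str]) -> List[str]:
    """Build the two statements seen in the server log for one table."""
    order_by = ", ".join(f"{col} DESC" for col in partition_cols)
    return [
        f'SHOW COLUMNS FROM "{schema}"."{table}"',
        f'SELECT * FROM {schema}."{table}$partitions"\nORDER BY {order_by}\nLIMIT 1',
    ]


def replay(cursor: Any, statements: Sequence[str]) -> Dict[str, list]:
    """Run each statement on a DB-API cursor and collect the rows it returns."""
    results = {}
    for sql in statements:
        cursor.execute(sql)
        results[sql] = cursor.fetchall()
    return results


# Values taken from the log above.
statements = metadata_statements("wmf", "aqs_hourly", ["year", "month", "day", "hour"])
for sql in statements:
    print(sql)
```

The cursor could come from `pyhive.presto.connect(...).cursor()`, with connection arguments (host, port, Kerberos settings) that are specific to each deployment and are not shown here. If both statements return rows this way, the problem is more likely in how Superset processes the response than in Presto itself.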

We have a lot of DEBUG level messages from requests_kerberos.kerberos_ and urllib3.connectionpool and spnego._gss while the request is authenticated and processed, but these appear to show a 401 followed by a successful 200 response.

There are no stack traces shown.

Additional Screenshots

Error condition on Superset version 2.1.1 with presto table using format=PARQUET

No error on Superset version 1.5.3 for the same table

No error on Superset version 2.1.1 with presto table using format=TEXTFILE

Environment

  • browser type and version: Firefox 118.0.1 (64-bit) on Linux, but this affects other browsers.
  • superset version: 3.1.0
  • python version: 3.9.2
  • node.js version: 16
  • any feature flags active:
    • ENABLE_TEMPLATE_PROCESSING
    • DASHBOARD_NATIVE_FILTERS
    • ENABLE_FILTER_BOX_MIGRATION
  • metadata caching, memcached

Metadata database: MariaDB 10.4

Presto version: 0.283

Optional components:

pyhive[kerberos,presto]==0.7.0
gunicorn[gevent]
apache-superset[hive,presto,mysql,druid,trino,spark,postgres]
pylibmc==1.6.1

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • [x] I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • [x] I have reproduced the issue with at least the latest released version of superset.
  • [x] I have checked the issue tracker for the same issue and I haven't found one similar.

Additional context

tullis, Oct 13 '23 11:10

I have updated this issue based on my testing against Superset version 3.1.0

The issue is still present in this version and is preventing us from upgrading our production instance from 1.5.3 to 3.1.0.

tullis, Jan 23 '24 14:01

Assuming this is also present in 4.0 then, but that might be worth confirming.

I also wonder if this has been addressed at all by @betodealmeida's catalog work and/or @dpgaspar's parquet refactoring, both of which were only recently merged to master (i.e. not yet released).

rusackas, May 13 '24 20:05

Re-opening because the fix was reverted in https://github.com/apache/superset/pull/28613.

john-bodley, May 21 '24 18:05