[sqllab] Tableschemaview fails to load with presto using parquet format files
Summary
On Superset version 3.1.0 the table schema previews fail to load on sqllab/ and dataset/add/ paths when using presto datasources and parquet format files.
These were working for Superset version 1.5.3, but have not worked since version 2.0.1 and up to 3.1.0.
Error condition on Superset version 3.1.0 with presto table using format=PARQUET
Conditions
- This only occurs when the tables are using Presto and a table format = 'PARQUET'
- Other database types such as Druid and MySQL do not exhibit this behaviour
- This occurs when logged in as an Admin user, therefore I believe that it is likely unrelated to #25451
- When the presto table is using format = 'TEXTFILE' this error does not occur
How to reproduce the bug
- Go to Superset version 3.1.0 as an Admin user and navigate to either of:
  a) /sqllab
  b) /dataset/add
- Select a database that uses the presto connector type
- Select any schema
- Select any table that uses a format = 'PARQUET'
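To take the UI out of the picture, the same table-metadata request can be issued directly against the REST API. The sketch below only builds the URL. The route shape is an assumption based on the 2.x/3.x REST API and should be checked against the request shown in the browser's network tab; the host name and database id are hypothetical placeholders.

```python
from urllib.parse import quote


def table_metadata_url(base_url: str, database_id: int, schema: str, table: str) -> str:
    """Build the URL that the table preview is assumed to request.

    Assumption: the route is /api/v1/database/<id>/table/<table>/<schema>/,
    as in the 2.x/3.x REST API. Confirm it in the browser's network tab.
    """
    return (
        f"{base_url.rstrip('/')}/api/v1/database/{database_id}"
        f"/table/{quote(table, safe='')}/{quote(schema, safe='')}/"
    )


if __name__ == "__main__":
    # Hypothetical host and database id; schema and table are from this report.
    print(table_metadata_url("https://superset.example.org", 1, "wmf", "aqs_hourly"))
```

Requesting that URL with an authenticated session and inspecting the response body should show the error that the UI only summarises as a failure to fetch table metadata.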
Expected results
I would expect the left-hand column to be populated with the column names from the selected table.
Actual results
Several error messages appear stating that there were errors fetching table metadata and the left-hand column is not populated.
Error messages in the server log
There are no relevant error messages in the server log.
We can see the pyhive presto command going through:
INFO:pyhive.presto:SHOW COLUMNS FROM "wmf"."aqs_hourly"
INFO:pyhive.presto:SHOW COLUMNS FROM "wmf"."aqs_hourly"
INFO:pyhive.presto:SHOW COLUMNS FROM "wmf"."aqs_hourly"
INFO:pyhive.presto:SELECT * FROM wmf."aqs_hourly$partitions"
ORDER BY year DESC, month DESC, day DESC, hour DESC
LIMIT 1
We have a lot of DEBUG level messages from requests_kerberos.kerberos_ and urllib3.connectionpool and spnego._gss while the request is authenticated and processed, but these appear to show a 401 followed by a successful 200 response.
There are no stack traces shown.
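One way to narrow this down is to run the two logged statements outside Superset and compare the results for a PARQUET table and a TEXTFILE table. The following is a minimal sketch, not Superset's own code: the helper names are hypothetical, and the partition columns are taken from the ORDER BY clause in the log above. It accepts any DB-API cursor, so it makes no assumption about how the Presto connection is authenticated.

```python
def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def show_columns_sql(schema: str, table: str) -> str:
    """Rebuild the SHOW COLUMNS statement seen in the server log."""
    return f"SHOW COLUMNS FROM {quote_ident(schema)}.{quote_ident(table)}"


def latest_partition_sql(schema: str, table: str, partition_cols: list) -> str:
    """Rebuild the latest-partition lookup seen in the server log."""
    order_by = ", ".join(f"{col} DESC" for col in partition_cols)
    return (
        f'SELECT * FROM {schema}."{table}$partitions"\n'
        f"ORDER BY {order_by}\n"
        "LIMIT 1"
    )


def run_metadata_queries(cursor, schema: str, table: str, partition_cols: list) -> dict:
    """Execute both statements on a DB-API cursor and return the fetched rows."""
    statements = {
        "columns": show_columns_sql(schema, table),
        "latest_partition": latest_partition_sql(schema, table, partition_cols),
    }
    results = {}
    for label, sql in statements.items():
        cursor.execute(sql)
        results[label] = cursor.fetchall()
    return results


if __name__ == "__main__":
    print(show_columns_sql("wmf", "aqs_hourly"))
    print(latest_partition_sql("wmf", "aqs_hourly", ["year", "month", "day", "hour"]))
```

The cursor could come from a pyhive Presto connection configured with the same Kerberos settings as the Superset deployment. If both statements return sensible rows for the PARQUET table, the failure is more likely in how Superset post-processes the result than in the driver.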
Additional Screenshots
Error condition on Superset version 2.1.1 with presto table using format=PARQUET
No error on Superset version 1.5.3 for the same table
No error on Superset version 2.1.1 with presto table using format=TEXTFILE
Environment
- browser type and version: Firefox 118.0.1 (64-bit) on Linux, but this affects other browsers.
- superset version: 3.1.0
- python version: 3.9.2
- node.js version: 16
- any feature flags active: ENABLE_TEMPLATE_PROCESSING, DASHBOARD_NATIVE_FILTERS, ENABLE_FILTER_BOX_MIGRATION
- metadata caching, memcached
Metadata database: MariaDB 10.4
Presto version: 0.283
Optional components:
pyhive[kerberos,presto]==0.7.0
gunicorn[gevent]
apache-superset[hive,presto,mysql,druid,trino,spark,postgres]
pylibmc==1.6.1
Checklist
Make sure to follow these steps before submitting your issue - thank you!
- [x] I have checked the superset logs for python stacktraces and included it here as text if there are any.
- [x] I have reproduced the issue with at least the latest released version of superset.
- [x] I have checked the issue tracker for the same issue and I haven't found one similar.
Additional context
I have updated this issue based on my testing against Superset version 3.1.0.
The issue is still present in this version and is preventing us from upgrading our production instance from 1.5.3 to 3.1.0.
I assume this is also present in 4.0, but that might be worth confirming.
I also wonder if this has been addressed at all by @betodealmeida's catalog work and/or @dpgaspar's parquet refactoring, both of which have only recently been merged to master (i.e. not yet released).
Re-opening because the fix was reverted in https://github.com/apache/superset/pull/28613.