redash
redash copied to clipboard
Athena schema refresh failure due to timeout
Issue Summary
The Athena schema refresh fails due to a worker timeout of 30 secs. I've checked the relevant query and it is taking around 500 sec as there lot of DBs, tables, and a lot of queries running at a time on Athena. However, in the data source settings, I've mentioned the schema_name (instead of default) hence in case of an on-demand refresh (i.e from UI) redash should query for schema for that specific schema_name as this query runs much faster.
Current Query:
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE table_schema NOT IN ('information_schema')
The above query can be changed to something like:
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE table_schema = '<schema_name>'
Reference: https://github.com/getredash/redash/blob/master/redash/query_runner/athena.py
This also fixes the case when one table has a schema issue(i.e having multiple columns with the same name etc.) leading the complete schema refresh to fail.
Technical details:
- Redash Version: 8.0.2
- Browser/OS: Any
- How did you install Redash: Docker
Any workarounds for this issue?
hey @kartheek7895 any workaround you found to fix this issue ??? Please Share was struck with same
hey @srinivas-engg-sre , @kartheek7895 did you find any workaround to fix this issue ? I am facing the same problem.
rq job default timeout is 180 seconds for automatic refreshes. There is another limit on manual refreshes (when you click the on the UI) which is
300 seconds. These can be confirmed by your Athena console "Recent queries" tab.
To workaround thses limits, I had to update the source code:
- Manual refreshes:
redash/tasks/general.py
-@job("schemas", queue_class=Queue, at_front=True, timeout=300, ttl=90)
+@job("schemas", queue_class=Queue, at_front=True, timeout=1300, ttl=90)
- Automatic refreshes:
redash/tasks/queries/maintenance.py
-@job("schemas")
+@job("schemas", timeout=1300)
A better way is of course making it an env var.
If you don't want to update the source code, you can always use the Glue data catalog instead.
Since it uses boto3 to traverse all dbs/tables/columns, it's much faster than sending a plain query to Athena.