Steampipe hangs reliably under some conditions
Hello,
Following the discussion on Slack in this thread, and as suggested by @cbruno10, I'm logging an issue.
I can reliably make Steampipe hang given:
- Steampipe v0.20.7 ; did not test other versions.
- The use of the csv plugin v0.9.0 configured with two files ; no hang with v0.8.0.
- A specific poorly written query -- its result is non-deterministic ; no hang when the query is deterministic, see below.
cache = true(https://steampipe.io/docs/guides/caching#server-level-cache-settings) ; no hang when cache is disabled.- Launched interactively, without a service, through
steampipe query ./queries/tmp.sql.
Unfortunately I can't reduce the CSV files enough to publish them on Github. One is three lines long, including header, the other one is ~700 lines long. There is in both 10-15 columns. When I try to trim the files more than that, I can't reliably get Steampipe to hang anymore for some reason.
The CSV files are of poor quality, some fields are quoted, not others. I have duplicate datas in all columns, and I suspect that there is an issue with the cache logic somehow around this.
The pseudonymized query is like this:
WITH tmp_table as (
SELECT
"AccountId" as account_id,
"BusinessGroup" as business_group
FROM
csv."MYCSVfile"
)
SELECT
acclist.id as account_id,
(SELECT business_group from tmp_table WHERE tmp_table.account_id=acclist.id LIMIT 1) as bg
FROM
csv."accounts" acclist /* list of accounts updated by generateSteampipeConfig.py */
Non-deterministic subquery:
- When I replace
SELECT business_group from tmp_table WHERE tmp_table.account_id=acclist.id LIMIT 1bySELECT MAX(business_group) from tmp_table WHERE tmp_table.account_id=acclist.id, then I do not experience the freeze. - Alternatively, if I replace
WITH tmp_table as (byWITH tmp_table as MATERLIZED (, then I do not experience the freeze.
I have stashed an acceptance test locally (on this model) that reproduces the issue, if you'd like me to try another build. It runs on a debian machine and I run it through ~/steampipe/tests/acceptance$ ./run-local.sh ajoga.bats.
The hanging is such that even to get back my shell when I launch the test, I have to run $ killall steampipe && killall postgres && killall steampipe-plugin-csv.plugin in an other shell (advise welcome on how to tune bats to handle that ; export BATS_TEST_TIMEOUT=20 in the setup_file of the bats file didn't do it)
the CSV files are of poor quality, some fields are quoted, not others.
I wonder if a preprocessing step, for example using the Python module's QUOTE_ALL, would make a difference.
WITH tmp_table as MATERIALIZED
Interesting. There are cases where a seemingly unnecessary ORDER BY (https://github.com/turbot/steampipe/issues/968) influences the query planner in ways that fix an otherwise failing query. We suspect that AS MATERIALIZED can have a similar influence in some cases.