steampipe-plugin-sdk icon indicating copy to clipboard operation
steampipe-plugin-sdk copied to clipboard

Steampipe hangs reliably under some conditions

Open ajoga opened this issue 2 years ago • 2 comments

Hello,

Following the discussion on Slack in this thread, and as suggested by @cbruno10, I'm logging an issue.

I can reliably make Steampipe hang given:

  • Steampipe v0.20.7 ; did not test other versions.
  • The use of the csv plugin v0.9.0 configured with two files ; no hang with v0.8.0.
  • A specific poorly written query -- its result is non-deterministic ; no hang when the query is deterministic, see below.
  • cache = true (https://steampipe.io/docs/guides/caching#server-level-cache-settings) ; no hang when cache is disabled.
  • Launched interactively, without a service, through steampipe query ./queries/tmp.sql.

Unfortunately I can't reduce the CSV files enough to publish them on Github. One is three lines long, including header, the other one is ~700 lines long. There is in both 10-15 columns. When I try to trim the files more than that, I can't reliably get Steampipe to hang anymore for some reason.

The CSV files are of poor quality, some fields are quoted, not others. I have duplicate datas in all columns, and I suspect that there is an issue with the cache logic somehow around this.

The pseudonymized query is like this:

WITH tmp_table as (
    SELECT
        "AccountId" as account_id,
        "BusinessGroup" as business_group
    FROM
        csv."MYCSVfile"
)
SELECT
    acclist.id as account_id,
    (SELECT business_group from tmp_table WHERE tmp_table.account_id=acclist.id LIMIT 1) as bg
FROM
    csv."accounts" acclist /* list of accounts updated by generateSteampipeConfig.py */

Non-deterministic subquery:

  1. When I replace SELECT business_group from tmp_table WHERE tmp_table.account_id=acclist.id LIMIT 1 by SELECT MAX(business_group) from tmp_table WHERE tmp_table.account_id=acclist.id, then I do not experience the freeze.
  2. Alternatively, if I replace WITH tmp_table as ( by WITH tmp_table as MATERLIZED (, then I do not experience the freeze.

I have stashed an acceptance test locally (on this model) that reproduces the issue, if you'd like me to try another build. It runs on a debian machine and I run it through ~/steampipe/tests/acceptance$ ./run-local.sh ajoga.bats. The hanging is such that even to get back my shell when I launch the test, I have to run $ killall steampipe && killall postgres && killall steampipe-plugin-csv.plugin in an other shell (advise welcome on how to tune bats to handle that ; export BATS_TEST_TIMEOUT=20 in the setup_file of the bats file didn't do it)

ajoga avatar Jun 28 '23 13:06 ajoga

the CSV files are of poor quality, some fields are quoted, not others.

I wonder if a preprocessing step, for example using the Python module's QUOTE_ALL, would make a difference.

WITH tmp_table as MATERIALIZED

Interesting. There are cases where a seemingly unnecessary ORDER BY (https://github.com/turbot/steampipe/issues/968) influences the query planner in ways that fix an otherwise failing query. We suspect that AS MATERIALIZED can have a similar influence in some cases.

judell avatar Jun 28 '23 16:06 judell