Pluralith CLI hangs on step "Generating Diagram"

Open hawk-stephan-stiefel opened this issue 3 years ago • 18 comments

Pluralith Version:

→ CLI Version: 0.1.17 → Graph Module Version: 0.1.13

Steps to reproduce:

  • Followed the steps described here: https://docs.pluralith.com/docs/get-started/run-in-ci/
  • pluralith run plan leads to

image

  • It stays like this forever and never finishes or terminates

Things I noticed:

  • pluralith.state.json is quite huge, 74508 lines
  • htop shows 100% usage for the pluralith cli process, meaning it fully utilizes one core (but no more)
  • When using the macOS client, it quits after a while. In pluralith.graphlogs.json I can find a few lines like these:

image

image

I am happy to provide more input if necessary
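
A line count alone says little about what makes a plan large. The following is a minimal sketch for summarizing it, under the assumption that pluralith.state.json follows Terraform's JSON plan representation with a top-level resource_changes list; the function names are hypothetical and not part of Pluralith.

```python
import json
from collections import Counter


def load_plan(path):
    """Load a JSON plan file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def count_resources_by_type(plan):
    """Count entries in the plan's resource_changes list by resource type.

    Assumption: the file uses Terraform's JSON plan layout, where each
    entry in resource_changes carries a "type" field.
    """
    changes = plan.get("resource_changes", [])
    return Counter(change.get("type", "<unknown>") for change in changes)


# Example usage (file name taken from the report above):
# counts = count_resources_by_type(load_plan("pluralith.state.json"))
# for resource_type, n in counts.most_common(10):
#     print(n, resource_type)
```

A breakdown like this could be shared in place of the plan itself, since it contains resource types and counts but no attribute values.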

hawk-stephan-stiefel avatar Oct 24 '22 15:10 hawk-stephan-stiefel

Hey @hawk-stephan-stiefel, thanks for trying Pluralith! Looks like the graphing algorithm is failing on your project. Could you please run pluralith strip on the project, check the output and send it to us if you think it's safe? The strip command hashes all keys and values in the JSON plan and allows us to debug the graphing algo using the hashed plan. Please send the hashed plan to [email protected]. Thank you!

PhiWeber avatar Oct 24 '22 15:10 PhiWeber

Wow, that's a huge project you've got there, @hawk-stephan-stiefel! I guess our stuff isn't quite ready yet for scales like these haha

DanThePutzer avatar Oct 24 '22 15:10 DanThePutzer

Is there a way to exclude certain things? Like VPCs and certain resources that should not be graphed etc?

hawk-stephan-stiefel avatar Oct 24 '22 16:10 hawk-stephan-stiefel

Do you still need the pluralith strip output? Or is our project too large in general?

hawk-stephan-stiefel avatar Oct 24 '22 16:10 hawk-stephan-stiefel

@hawk-stephan-stiefel we don't have a way to exclude things at the moment

The strip output would still be very helpful for improving our algorithm to perform better on large projects. Would be awesome if you could share it 👍

DanThePutzer avatar Oct 24 '22 17:10 DanThePutzer

Hey @DanThePutzer @PhiWeber, unfortunately I am not able to share the strip output, as it is too large to confirm with 100% certainty that it does not contain any sensitive information. Can I assist in any other way?

hawk-stephan-stiefel avatar Nov 21 '22 11:11 hawk-stephan-stiefel

@hawk-stephan-stiefel yeah it can get quite intensive to check larger state files line by line, no worries at all!

I think we'll have to wait for the next releases and see if a fix comes along organically. @PhiWeber is working on some improvements based on other, very large state files we've received. There's a chance the issue you're facing overlaps with those occurring with similarly big states 👍

DanThePutzer avatar Nov 21 '22 14:11 DanThePutzer

I'm having a similar issue, large projects causing the app to hang indefinitely. Going to see if stripped state is safe to send. Assuming you still want samples @PhiWeber?

jamesearl avatar Nov 29 '22 15:11 jamesearl

Ack, can't do it because it looks like the schema argument of bigquery_table resources is not being hashed. https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/bigquery_table
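
Reviewing a stripped plan of this size by hand is impractical, so part of the check can be scripted. The following is a minimal sketch, assuming the stripped file is plain JSON and that you can list strings that must never appear in it (for example column names, hostnames or project names). The helper names are hypothetical. A clean result only shows that the listed strings are absent, not that the file is safe to share.

```python
import json


def iter_strings(node, path="$"):
    """Yield (path, text) for every object key and string value in a JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            yield child, key
            yield from iter_strings(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{path}[{index}]")
    elif isinstance(node, str):
        yield path, node


def find_leaks(document, needles):
    """Return sorted (path, needle) pairs where a known string survived stripping."""
    lowered = [needle.lower() for needle in needles if needle]
    hits = set()
    for path, text in iter_strings(document):
        haystack = text.lower()
        for needle in lowered:
            if needle in haystack:
                hits.add((path, needle))
    return sorted(hits)


# Example usage (file name from this thread; the needle list is yours to fill in):
# with open("pluralith.state.hashed", encoding="utf-8") as handle:
#     for path, needle in find_leaks(json.load(handle), ["customer_email"]):
#         print(f"{needle!r} still present at {path}")
```

The search is a case-insensitive substring match, so it also catches a sensitive name embedded in a longer value such as a serialized schema string.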

jamesearl avatar Nov 29 '22 15:11 jamesearl

@jamesearl thanks for checking! We're always extending the strip command, so thanks for the heads up. @PhiWeber will get back to you here shortly about a fix for the unhashed schema on bigquery_table 👍

DanThePutzer avatar Nov 29 '22 17:11 DanThePutzer

@jamesearl a fix for this bug (hashing schemas) will be included in the next CLI release. I will update you again here when we release the new version of the CLI. Thank you for reporting this bug!

PhiWeber avatar Nov 29 '22 18:11 PhiWeber

Hi @jamesearl, we just published a new release for the CLI that should solve this. We improved the stripping for string values; let us know if the issue with the bigquery_table still persists.

DanThePutzer avatar Dec 01 '22 09:12 DanThePutzer

@hawk-stephan-stiefel We also published a new release of the graphing algorithm that has some fixes specifically for large complex states, so it might include a fix for your issue. Would you mind giving it another run?

DanThePutzer avatar Dec 01 '22 09:12 DanThePutzer

@DanThePutzer I upgraded to the latest version:

→ CLI Version: 0.1.21 → Graph Module Version: 0.1.15

The command pluralith run plan has already been running for 15 min. Not sure if it is stuck or still doing something. CPU usage is 100% on one core for the process pluralith-cli-graphing

image

image

Our pluralith.state.json has already grown to 82648 lines.

Will the command time out at some point? Or can I get more information somewhere else, such as a log file where I can see the current progress?

hawk-stephan-stiefel avatar Dec 02 '22 16:12 hawk-stephan-stiefel

Hi @hawk-stephan-stiefel thanks for giving it another shot!

It seems we solved the issue that caused it to fail, but the state is still very complex and takes the algorithm way too long!

We'll need to improve it further on large states to fix that issue.

We haven't yet added a timeout, but it's on our list! We'll make it spit out some logs when the timeout hits.

Once we've added the timeout I'll report back here 👍
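
Until the CLI has its own timeout, a CI job can enforce one from the outside. The following is a minimal sketch for POSIX systems, not a Pluralith feature: it starts the command in its own process group so that a separately spawned child (the report above shows pluralith-cli-graphing as its own process) is killed as well. The function name and the 30 minute limit are assumptions chosen for illustration.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit code, or None if it exceeded timeout_s seconds.

    POSIX only: the command gets its own session, so on timeout the whole
    process group (the command plus any children it spawned) is killed.
    """
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # With start_new_session=True the process group id equals proc.pid.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the killed process
        return None


# Example usage with the command from this thread:
# code = run_with_timeout(["pluralith", "run", "plan"], timeout_s=30 * 60)
# if code is None:
#     print("pluralith run plan exceeded the time limit")
```

Returning None on timeout keeps a hang distinguishable from a normal non-zero exit code, so the calling job can report the two cases differently.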

DanThePutzer avatar Dec 02 '22 16:12 DanThePutzer

After 30 min I get the following error:

image

hawk-stephan-stiefel avatar Dec 02 '22 17:12 hawk-stephan-stiefel

Haha thanks for sitting it through, I wonder why the cache isn't there. I'll dig into it and report back!

DanThePutzer avatar Dec 02 '22 17:12 DanThePutzer

Hello Everyone.

I think I have a similar issue. This is a very large Terraform repository (VPC, EKS, Helm charts, etc.)

I'm running the graphing binary directly to avoid running the entire plan:

/Users/csepulveda/Pluralith/bin/pluralith-cli-graphing graph --api-key 6dc738a21b7db10a9df492079c8a9fe9 \
--title "R2 Infra" --branch dev --author csepulveda  --ver "0.1" --file-name r2.pdf \
--out-dir /Users/csepulveda/repos/sre/terrafom-base-infra \
--plan-json-path /Users/csepulveda/repos/sre/terrafom-base-infra/.pluralith/pluralith.state.hashed \
--cost-json-path /Users/csepulveda/repos/sre/terrafom-base-infra/.pluralith/pluralith.costs.json \
--export-pdf true --sync-to-backend false --show-changes false --show-drift false --show-costs false \
--wsl false --cost-mode delta --cost-period month --run-id 8437448 \
--project-id skin-xxxxx-pale-xxxx --org-id 3760xxxxx
IDENTIFIED PROVIDERS:  [ 'aws', 'kubernetes', 'random', 'helm', 'null', 'tls' ]
Promise { <pending> }
Error: graphing algo timeout
Graph result corrupted: Preprocessing: OKAY,Processing: OKAY

I'm trying to generate the file r2.pdf, but I get this Error: graphing algo timeout after 6 or 7 minutes.

I have already tried running this on macOS, and also in Docker using alpine and node:latest, but I always get a timeout while generating the graph.

These are the plan files, the hashed one and the unhashed one:

~✗ wc -l pluralith.state.hashed 
  112089 pluralith.state.hashed
~✗ wc -l pluralith.state.json  
  112087 pluralith.state.json

csepulveda avatar Feb 15 '23 18:02 csepulveda