terraform-provider-kafka-connect

Provider doesn't exit when a kafka-connect cluster is on 100% CPU utilization

Open dimisjim opened this issue 4 years ago • 5 comments

The provider does not exit when a kafka-connect cluster is unresponsive, leading to endless messages like:

[TRACE] dag/walk: vertex x is waiting for y

This happens when I target multiple connectors of the same cluster, but not when I am targeting only 1 or a few of them.

dimisjim avatar Dec 20 '20 17:12 dimisjim

There seems to be a threshold on the number of connector resources at which this happens.

If I target all of them, the provider keeps the plan going endlessly, although on a few occasions it succeeds.

If I target 70% of them, it succeeds all the time.

Any clues on what might be the problem?

dimisjim avatar Dec 21 '20 09:12 dimisjim

I can second this. I have this issue when running v0.2.3. Do you get it on v0.2.2 as well? Also, how many connectors are you running? We have 128 connectors that we are trying to check via terraform plan.

pbr0ck3r avatar Feb 10 '21 04:02 pbr0ck3r

It seems to get to around 100 connectors and then hangs.

pbr0ck3r avatar Feb 10 '21 18:02 pbr0ck3r

In my case it was around 70, but CPU was also at 100%.

dimisjim avatar Feb 10 '21 18:02 dimisjim

@Mongey do you have any thoughts on whether this is an issue with the code or possibly an issue/limitation on the terraform side (or machine CPU/memory related)? If you can help point me in the right direction, I wouldn't mind spending some time digging into this and maybe creating a PR if possible.

pbr0ck3r avatar Mar 10 '21 00:03 pbr0ck3r
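
For anyone digging into the hang described above, one general way to keep an HTTP client from waiting forever on an unresponsive server is to put a deadline on each request. The following is a minimal Go sketch of that idea. It is not the provider's actual code, and nothing in this thread confirms that a missing timeout is the cause. `listConnectors` and `demo` are hypothetical names, and the two `httptest` servers are local stand-ins for a kafka-connect cluster. The only real API assumed is the Kafka Connect REST endpoint `GET /connectors`.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// listConnectors (hypothetical helper) issues GET <baseURL>/connectors
// and abandons the request once the timeout has elapsed.
func listConnectors(baseURL string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/connectors", nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		// Includes the case where the deadline expired before a response arrived.
		return fmt.Errorf("request to %s failed: %w", baseURL, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	return nil
}

// demo runs listConnectors against two local stand-in servers:
// one that answers immediately and one that stalls.
func demo() (fastErr, slowErr error) {
	fast := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "[]") // an empty connector list
	}))
	defer fast.Close()

	slow := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Simulate an overloaded cluster: stall until the client gives up
		// or two seconds pass, whichever comes first.
		select {
		case <-time.After(2 * time.Second):
		case <-r.Context().Done():
		}
	}))
	defer slow.Close()

	fastErr = listConnectors(fast.URL, 200*time.Millisecond)
	slowErr = listConnectors(slow.URL, 200*time.Millisecond)
	return fastErr, slowErr
}

func main() {
	fastErr, slowErr := demo()
	fmt.Println("responsive cluster:", fastErr)
	fmt.Println("unresponsive cluster:", slowErr)
}
```

With a deadline in place, the call against the stalled server returns an error instead of blocking, which would let the caller fail the plan rather than wait indefinitely. Whether the provider's client already sets such a deadline would need to be checked in its source.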