terraform-provider-kafka icon indicating copy to clipboard operation
terraform-provider-kafka copied to clipboard

Add ACL cache and debouncing request queues for creation/deletion requests to make parallelism useful

Open eh-steve opened this issue 1 year ago • 5 comments

When creating/destroying 1000 ACLs:

On a real cluster with ~80k ACLs

Provides 89000% speedup for ACL creations (~8000s -> ~9s)

On a fresh cluster with no ACLs yet:

Provides a 780% speedup for ACL creations and 1470% speedup for ACL deletions .

Before:

time terraform apply -auto-approve -parallelism=1000 
25.98s user 18.43s system 83% cpu 53.286 total

time terraform destroy -parallelism=1000 -auto-approve  
39.98s user 19.88s system 32% cpu 3:04.59 total

After:

time terraform apply -parallelism=1000 -auto-approve
25.81s user 15.20s system 603% cpu 6.798 total

time terraform destroy -parallelism=1000 -auto-approve 
30.12s user 13.22s system 344% cpu 12.570 total

For anyone facing the same issue, I've published a release here until this PR is reviewed/merged: https://registry.terraform.io/providers/eh-steve/kafka/latest

eh-steve avatar Nov 07 '23 17:11 eh-steve

Not sure if the CI failures are legit, as I have tests passing locally...?

eh-steve avatar Jan 02 '24 15:01 eh-steve

I've test your PR and seems something wrong with it. After plan I've got an error:


│ Error: kafka: client has run out of available brokers to talk to: 5 errors occurred:
│ 	* EOF
│ 	* EOF
│ 	* EOF
│ 	* EOF
│ 	* EOF

Same error in the pipeline tests https://github.com/Mongey/terraform-provider-kafka/actions/runs/7386689599/job/20093815910?pr=357

Without this patch all works good

SanchosPancho avatar Feb 15 '24 08:02 SanchosPancho

@SanchosPancho I'd been running make testacc locally against the docker-compose'd brokers which was working.

When running testacc under docker, it does indeed fail, but seemingly from being unable to reach the brokers on localhost:9092. It seems like the env var for KAFKA_BOOTSTRAP_SERVERS wasn't correctly set or used in the Makefile, so the provider wasn't using the internal advertised listeners (kafka-{1..3}:9090), instead attempting to use the external localhost:9092, which wouldn't be reachable inside the testacc container without setting network: host...

I've made changes in https://github.com/Mongey/terraform-provider-kafka/pull/357/commits/be82e9b46e6d3bba5d7dedf9df232a62af5d3469 which should allow the testacc container to reach the brokers via the docker-compose network on 9090 (this would let the old CircleCI test approach work again).

I've just been running

docker compose -f docker-compose.yaml -f docker-compose.testacc.yaml up --abort-on-container-exit testacc

successfully on my end.

I think some of the tests are already racy anyway, but I've run this 5 times now and got 5 passes in a row. Could you try on your side?

Also sounds like some overlap with https://github.com/Mongey/terraform-provider-kafka/issues/383 which is nothing to do with this PR?

eh-steve avatar Feb 19 '24 11:02 eh-steve

@eh-steve I've try it on real cluster, but I think that depends on kafka version, I've fail on old 2.3+ kafka cluster seems provider 0.6 doesn't work correctly with old kafka instances

SanchosPancho avatar Feb 19 '24 13:02 SanchosPancho

Yeah I guess if you wanted to test this PR you could locally cherry pick the first commit onto an older master.

I've also released it in the registry under eh-steve/kafka version 0.5.5 so you could test with that. The parent master commit for that release is 993494b, which may have already introduced the problem though?

eh-steve avatar Feb 19 '24 13:02 eh-steve

Hey @eh-steve , Thanks for the above optimizations. Does the above reduce the time execution, on regular terraform plan commands for clusters that have tons of ACLs ?

BTW is there any help you need, to push forward with this PR?

akaltsikis avatar Mar 25 '24 06:03 akaltsikis

Hey @eh-steve , Thanks for the above optimizations. Does the above reduce the time execution, on regular terraform plan commands for clusters that have tons of ACLs ?

Yeah it should

BTW is there any help you need, to push forward with this PR?

Just waiting for an approval and a CI run

eh-steve avatar Mar 25 '24 10:03 eh-steve

Hi @Mongey, thanks for the run, I can’t reproduce these CI failures locally and one of them seems to be related to GPG? Is master CI passing?

eh-steve avatar Apr 04 '24 10:04 eh-steve

hey @eh-steve thanks for the PR! looks great! The build snapshots never really works for PR's due to GPG, I generally just ignore that failure.

I pulled the branch down locally and tested -- it seems to work for me. Not sure what's going on with CI - it is quite flakey

Mongey avatar Apr 04 '24 10:04 Mongey