terraform-provider-kafka
Add ACL cache and debouncing request queues for creation/deletion requests to make parallelism useful
When creating/destroying 1000 ACLs:
On a real cluster with ~80k existing ACLs:
Provides an ~89,000% speedup for ACL creations (~8,000s -> ~9s).
On a fresh cluster with no ACLs yet:
Provides a 780% speedup for ACL creations and a 1,470% speedup for ACL deletions.
Before:
```
time terraform apply -auto-approve -parallelism=1000
25.98s user 18.43s system 83% cpu 53.286 total
time terraform destroy -parallelism=1000 -auto-approve
39.98s user 19.88s system 32% cpu 3:04.59 total
```
After:
```
time terraform apply -parallelism=1000 -auto-approve
25.81s user 15.20s system 603% cpu 6.798 total
time terraform destroy -parallelism=1000 -auto-approve
30.12s user 13.22s system 344% cpu 12.570 total
```
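For intuition, the debouncing half of this works roughly as follows: rather than each resource issuing its own create/delete request, requests arriving within a short window are queued and flushed to the broker as one batch, so high `-parallelism` collapses into a handful of round trips. A minimal Go sketch of that idea (hypothetical names like `aclRequest` and `flush`; not the PR's actual code):

```go
package batcher

import (
	"sync"
	"time"
)

// aclRequest represents one pending ACL creation (hypothetical type).
type aclRequest struct {
	resource string
	done     chan error
}

// debouncer queues requests and flushes them in one batch once no new
// request has arrived within `window`.
type debouncer struct {
	mu      sync.Mutex
	pending []aclRequest
	timer   *time.Timer
	window  time.Duration
	flush   func(batch []aclRequest) // e.g. a single batched CreateAcls call
}

// enqueue adds a request and re-arms the flush timer, so a burst of
// parallel terraform resources collapses into one broker request.
func (d *debouncer) enqueue(r aclRequest) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.pending = append(d.pending, r)
	if d.timer != nil {
		d.timer.Stop()
	}
	d.timer = time.AfterFunc(d.window, func() {
		d.mu.Lock()
		batch := d.pending
		d.pending = nil
		d.mu.Unlock()
		if len(batch) > 0 {
			d.flush(batch)
		}
	})
}
```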
For anyone facing the same issue, I've published a release here until this PR is reviewed/merged: https://registry.terraform.io/providers/eh-steve/kafka/latest
Not sure if the CI failures are legit, as I have tests passing locally...?
I've tested your PR and something seems wrong with it. After `plan` I got an error:
```
│ Error: kafka: client has run out of available brokers to talk to: 5 errors occurred:
│ * EOF
│ * EOF
│ * EOF
│ * EOF
│ * EOF
```
Same error in the pipeline tests https://github.com/Mongey/terraform-provider-kafka/actions/runs/7386689599/job/20093815910?pr=357
Without this patch, everything works fine.
@SanchosPancho I'd been running `make testacc` locally against the docker-compose'd brokers, which was working. When running `testacc` under Docker, it does indeed fail, but seemingly from being unable to reach the brokers on `localhost:9092`. It seems the `KAFKA_BOOTSTRAP_SERVERS` env var wasn't correctly set or used in the Makefile, so the provider wasn't using the internal advertised listeners (`kafka-{1..3}:9090`) and was instead attempting to use the external `localhost:9092`, which wouldn't be reachable inside the testacc container without setting `network: host`...
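For clarity, the fix boils down to making the test setup honour that env var, with the localhost address only as a host-side fallback. Roughly this shape (illustrative Go, not the provider's exact code):

```go
package provider

import "os"

// bootstrapServers returns the broker addresses the acceptance tests
// should use. Inside the docker-compose network this must be set to the
// internal advertised listeners (e.g. "kafka-1:9090,kafka-2:9090,kafka-3:9090");
// the localhost fallback is only reachable from the host machine.
func bootstrapServers() string {
	if v := os.Getenv("KAFKA_BOOTSTRAP_SERVERS"); v != "" {
		return v
	}
	return "localhost:9092"
}
```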
I've made changes in https://github.com/Mongey/terraform-provider-kafka/pull/357/commits/be82e9b46e6d3bba5d7dedf9df232a62af5d3469 which should allow the testacc container to reach the brokers via the docker-compose network on 9090 (this would let the old CircleCI test approach work again).
I've just been running

```
docker compose -f docker-compose.yaml -f docker-compose.testacc.yaml up --abort-on-container-exit testacc
```

successfully on my end.
I think some of the tests are already racy anyway, but I've run this 5 times now and got 5 passes in a row. Could you try on your side?
Also sounds like there's some overlap with https://github.com/Mongey/terraform-provider-kafka/issues/383, which has nothing to do with this PR?
@eh-steve I've tried it on a real cluster, but I think it depends on the Kafka version. It failed on an older 2.3+ Kafka cluster; it seems provider 0.6 doesn't work correctly with old Kafka instances.
Yeah, I guess if you wanted to test this PR, you could locally cherry-pick the first commit onto an older master.
I've also released it in the registry under `eh-steve/kafka` version 0.5.5, so you could test with that. The parent master commit for that release is 993494b, which may have already introduced the problem, though?
Hey @eh-steve, thanks for the above optimizations. Does this reduce execution time for regular `terraform plan` commands on clusters that have tons of ACLs?
BTW, is there any help you need to push this PR forward?
> Does this reduce execution time for regular `terraform plan` commands on clusters that have tons of ACLs?

Yeah, it should.
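For context, the cache side of the change is what helps `plan`/refresh: the full ACL listing is fetched once, and every per-resource read is then answered locally, so N reads cost one broker round trip instead of N. A rough sketch of the idea (hypothetical types, not the PR's actual code):

```go
package provider

import "sync"

// aclCache lazily loads the cluster's full ACL listing once and then
// answers per-resource existence checks from memory.
type aclCache struct {
	once sync.Once
	acls map[string]bool
	list func() []string // e.g. one DescribeAcls call returning every ACL key
}

// has reports whether the given ACL exists, triggering at most one
// broker round trip for the whole plan. (Creations and deletions would
// also need to update the map, with locking, which is omitted here.)
func (c *aclCache) has(key string) bool {
	c.once.Do(func() {
		c.acls = make(map[string]bool)
		for _, k := range c.list() {
			c.acls[k] = true
		}
	})
	return c.acls[key]
}
```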
> BTW, is there any help you need to push this PR forward?

Just waiting for an approval and a CI run.
Hi @Mongey, thanks for the run. I can’t reproduce these CI failures locally, and one of them seems to be related to GPG? Is master CI passing?
Hey @eh-steve, thanks for the PR! Looks great! The build snapshots never really work for PRs due to GPG; I generally just ignore that failure.
I pulled the branch down locally and tested it -- it seems to work for me. Not sure what's going on with CI - it is quite flaky.