strimzi-kafka-operator icon indicating copy to clipboard operation
strimzi-kafka-operator copied to clipboard

[Spike] Checking if replacing Topic Operator Kafka Streams storage with self-managed Kafka persistence passes system tests

Open LiamClarkeNZ opened this issue 3 years ago • 5 comments

Type of change

Select the type of your PR

  • Refactoring

Description

The TO's usage of Kafka Streams has led to repeated bug reports, hopefully replacing Kafka Streams with a simpler implementation (that still persists to Kafka) will be an improvement. But, I want to run the system tests against these code changes, somewhere other than my laptop.

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • [ ] Write tests
  • [ ] Make sure all tests pass
  • [ ] Update documentation
  • [ ] Check RBAC rights for Kubernetes / OpenShift roles
  • [ ] Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • [ ] Reference relevant issue(s) and close them after merging
  • [ ] Update CHANGELOG.md
  • [ ] Supply screenshots for visual changes, such as Grafana dashboards

LiamClarkeNZ avatar Jul 18 '22 03:07 LiamClarkeNZ

/azp run regression

scholzj avatar Jul 18 '22 08:07 scholzj

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Jul 18 '22 08:07 azure-pipelines[bot]

@LiamClarkeNZ these changes work fine on my local environment (minikube) and on a remote cluster (OCP 4.10). I ran some manual tests and the reproducer described in ENTMQST-4087 without any issue.

I also had a peak at the implementation and I have a couple of suggestions:

  • The internal topics naming is confusing. I would propose __strimzi_topic_commands and __strimzi_topic_states, which better reflect their content.
  • Session.stop is never called in case of graceful shutdown, so we don't wait for inflight reconciliations. You can fix by invoking Vertx.close from a shutdown hook.

Some system tests are failing.

[ERROR] Errors: 
[ERROR]   TopicST.testCreateDeleteCreate:240 » Kafka Failed to create new KafkaAdminClie...
[ERROR]   TopicST.testCreateTopicViaAdminClient:170 » Kafka Failed to create new KafkaAd...
[ERROR]   TopicST.testDeleteTopicEnableFalse:378 » NullPointer
[INFO] 
[ERROR] Tests run: 8, Failures: 0, Errors: 3, Skipped: 2

fvaleri avatar Jul 26 '22 17:07 fvaleri

Thanks very much @fvaleri :) I'll have a look into those STs and see what's failing, and will incorporate your feedback into further changes. That Session.stop issue must be affecting the current implementation too, as I integrated with the existing approach of a collection of Closeable elements that are closed in Session.stop

LiamClarkeNZ avatar Jul 26 '22 19:07 LiamClarkeNZ

@LiamClarkeNZ The Session.stop issue is addressed in https://github.com/strimzi/strimzi-kafka-operator/pull/7165, so you don't need to worry about that.

fvaleri avatar Aug 06 '22 16:08 fvaleri

Kia Ora @LiamClarkeNZ. Thanks for this, I know it has been open for a long time. I've started working on the TO lately, with a view to making it KRaft compatible. It's essentially a rewrite. As part of that I'm planning to use a compacted topic and in-memory Map for holding a different representation of topic state. It should be possible to back-port that to ZK-based TO. Therefore I think it's better to wait for that than end up forcing people into two store migrations.

tombentley avatar Dec 02 '22 08:12 tombentley