Ability to simulate Kafka failure

Open vincentfree opened this issue 4 years ago • 8 comments

I'm using kafka-junit to test against Kafka, but I also want to simulate a failing stack to test my application's resilience when Kafka goes down.

Could you add a method to kill Kafka in such a way that a consumer / producer sees it as an actual cluster failure?

A method to bring the stack back up would then be great, to see how an application recovers.

vincentfree avatar Dec 04 '19 14:12 vincentfree

Hi @vincentfree ,

Does this test case do what you're looking for? Or is there some other functionality specifically you're looking for?

https://github.com/salesforce/kafka-junit/blob/master/kafka-junit-core/src/test/java/com/salesforce/kafka/test/KafkaTestClusterTest.java#L235-L329

Crim avatar Dec 05 '19 01:12 Crim

This does about the same as what I have now, which is a clean shutdown of a broker. That is nice because I can test an upgrade or patch scenario, but I also want an unclean shutdown of a broker. I had some resilience problems in the past with brokers going down with OOM errors and the like. I don't manage our Kafka cluster myself, so my application finds out by failing 😅.

Normally this should be handled by the consumers/producers themselves, but that failed in my case.

I want to be able to reenact such failures with brokers, especially the controller node failing.

vincentfree avatar Dec 11 '19 07:12 vincentfree

Hmm, no such method exists today, tho it is an interesting use case. I'll poke around and see if I can come up with anything.

The underlying KafkaServerStartable doesn't provide much access to work with, so it may require bypassing that and interacting directly with its underlying KafkaServer instance...

Crim avatar Dec 11 '19 08:12 Crim

It would be great to be able to do that and, some time after killing a server, bring it back up with the same signature for the cluster. You could then test resilience, Kafka's election process under failure, re-election when a server comes back up, and the impact on consumers and producers while all this happens.

vincentfree avatar Dec 11 '19 08:12 vincentfree

I wanted to do the same: test the behaviour when a broker is down.

I tried to do:

sharedKafkaTestResource.getKafkaBrokers().getBrokerById(1).stop();

And then

sharedKafkaTestResource.getKafkaBrokers().getBrokerById(1).start();

But the stop() method seems to be asynchronous, so it's impossible to know when the test can continue. After calling KafkaServerStartable#shutdown, I need to wait for the shutdown to finish using KafkaServerStartable#awaitShutdown: https://github.com/salesforce/kafka-junit/blob/v3.2.1/kafka-junit-core/src/main/java/com/salesforce/kafka/test/KafkaTestServer.java#L307

gquintana avatar Apr 30 '20 13:04 gquintana

The difference from your approach is that I want an abrupt shutdown without any notice. This would ensure that my applications use their resiliency functions to handle the problem, either by buffering, by using default fallbacks, or in some other way.
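One way to approximate "without any notice" is to run the broker as a separate OS process and kill it forcibly, so no shutdown hooks get a chance to run. The sketch below is not part of kafka-junit and is a swapped-in technique: it assumes the broker is launched externally (the brokers discussed in this thread appear to run inside the test JVM), and the class and method names are hypothetical. It only relies on the standard `ProcessBuilder` and `Process#destroyForcibly` APIs.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of kafka-junit.
final class AbruptKill {
    private AbruptKill() {}

    /** Starts an external process, for example a broker start script (assumed to exist). */
    static Process launch(List<String> command) throws IOException {
        return new ProcessBuilder(command).inheritIO().start();
    }

    /**
     * Forcibly terminates the process without asking it to shut down cleanly,
     * then waits up to the timeout for it to exit.
     * Returns true if the process has exited within the timeout.
     */
    static boolean killWithoutNotice(Process process, Duration timeout) throws InterruptedException {
        process.destroyForcibly();
        return process.waitFor(timeout.toMillis(), TimeUnit.MILLISECONDS);
    }
}
```

On Linux and macOS a forcible destroy is typically delivered as SIGKILL, which is close to what an OOM-killed broker looks like from the client side. Restarting with the same broker id and data directory would then be a matter of calling `launch` again with the same command.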

For your problem, do you get any type of future back or are you able to set a callback?

vincentfree avatar Apr 30 '20 20:04 vincentfree

Both sound like valid use cases to test against. Unfortunately I don't believe Kafka exposes mechanisms to do what you're looking for.

Regarding async shutdowns, it may be possible to block until shutdown is complete similar to how we block waiting for startup here
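A minimal sketch of what blocking on shutdown could look like from the test side, without touching kafka-junit internals: poll the broker's listener port until it stops accepting connections or a timeout elapses. This is an assumption-laden workaround, not the library's implementation; `BrokerPortWaiter` and its methods are hypothetical names, and the host and port have to be supplied by the test.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Duration;

// Hypothetical helper, not part of kafka-junit.
final class BrokerPortWaiter {
    private BrokerPortWaiter() {}

    /**
     * Polls host:port until nothing accepts connections there.
     * Returns true once the port is closed, false if the timeout elapses first.
     */
    static boolean awaitPortClosed(String host, int port, Duration timeout) throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() - deadline < 0) {
            if (!isAccepting(host, port)) {
                return true;
            }
            Thread.sleep(100); // back off briefly between probes
        }
        return !isAccepting(host, port); // one last probe at the deadline
    }

    /** Returns true if a TCP connection to host:port succeeds within 250 ms. */
    static boolean isAccepting(String host, int port) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 250);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

A test would call the broker's stop() and then `awaitPortClosed` with the host and port that broker listens on. A closed port only shows that the listener is gone; it does not prove that every part of the broker's shutdown has finished.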

Crim avatar Apr 30 '20 21:04 Crim

My issue is probably different from @vincentfree's. I am doing roughly the same as this unit test: https://github.com/salesforce/kafka-junit/blob/7d4d70a533cf0c95e2828338179e3524bfa03c6a/kafka-junit-core/src/test/java/com/salesforce/kafka/test/KafkaTestServerTest.java#L317 I'll have to investigate where the difference lies.

gquintana avatar May 01 '20 12:05 gquintana