
[Failing Test]: JmsIOTest.testCheckpointMark flaky

Open Abacn opened this issue 1 year ago • 1 comments

What happened?

Example run: https://github.com/apache/beam/runs/21250090063 (a PR unrelated to JMS):

java.lang.AssertionError
	at org.junit.Assert.fail(Assert.java:87)
	at org.junit.Assert.assertTrue(Assert.java:42)
	at org.junit.Assert.assertTrue(Assert.java:53)
	at org.apache.beam.sdk.io.jms.JmsIOTest.testCheckpointMark(JmsIOTest.java:463)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

It fails here: https://github.com/apache/beam/blob/d5aa44c9ba9eb910774d789dd4182a5d25d8f552/sdks/java/io/jms/src/test/java/org/apache/beam/sdk/io/jms/JmsIOTest.java#L463

In fact, the consumer.receiveNoWait call at https://github.com/apache/beam/blob/d5aa44c9ba9eb910774d789dd4182a5d25d8f552/sdks/java/io/jms/src/main/java/org/apache/beam/sdk/io/jms/JmsIO.java#L559 is never guaranteed to return a message, even when there are still unacked messages on the server: https://stackoverflow.com/questions/36626634/does-jms-receivenowait-guarantee-message-delivery-when-messages-are-available

So there is a chance that the call returns null and the assertion fails.
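
To illustrate the point, here is a minimal sketch of how a caller could tolerate a null from a non-blocking poll by retrying until a deadline. The class `PollUntilAvailable` and the helper `pollUntilNonNull` are hypothetical names made up for this example; they are not part of Beam or JMS, and this is not necessarily how the test should be fixed. The consumer is simulated with a `Supplier` so the snippet runs without a broker.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical helper, for illustration only (not part of Beam or JMS).
final class PollUntilAvailable {

  private PollUntilAvailable() {}

  /**
   * Calls the non-blocking poll repeatedly until it yields a non-null value
   * or the timeout elapses. Returns null if nothing arrived in time or if
   * the thread was interrupted while waiting.
   */
  public static <T> T pollUntilNonNull(Supplier<T> poll, Duration timeout, Duration backoff) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      T value = poll.get();
      if (value != null) {
        return value;
      }
      // Overflow-safe comparison of nanoTime values.
      if (System.nanoTime() - deadline >= 0) {
        return null;
      }
      try {
        Thread.sleep(backoff.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag
        return null;
      }
    }
  }

  public static void main(String[] args) {
    // Simulated consumer: the first two polls return null even though a
    // message is "pending", mimicking the receiveNoWait behavior described above.
    Iterator<String> replies = Arrays.asList(null, null, "msg").iterator();
    String received =
        pollUntilNonNull(
            () -> replies.hasNext() ? replies.next() : null,
            Duration.ofSeconds(2),
            Duration.ofMillis(10));
    System.out.println(received);
  }
}
```

Against a real consumer, the supplier would wrap `consumer.receiveNoWait()` (and handle its `JMSException`). JMS also offers `MessageConsumer.receive(long timeout)`, which blocks for up to the given time. Whether either approach suits the reader under test is an open question here.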

Issue Failure

Failure: Test is flaky

Issue Priority

Priority: 2 (backlog / disabled test but we think the product is healthy)

Issue Components

  • [ ] Component: Python SDK
  • [ ] Component: Java SDK
  • [ ] Component: Go SDK
  • [ ] Component: Typescript SDK
  • [ ] Component: IO connector
  • [ ] Component: Beam YAML
  • [ ] Component: Beam examples
  • [ ] Component: Beam playground
  • [ ] Component: Beam katas
  • [ ] Component: Website
  • [ ] Component: Spark Runner
  • [ ] Component: Flink Runner
  • [ ] Component: Samza Runner
  • [ ] Component: Twister2 Runner
  • [ ] Component: Hazelcast Jet Runner
  • [ ] Component: Google Cloud Dataflow Runner

Abacn, Feb 06 '24 05:02

The underlying cause is that, per the JMS specification, there is no guarantee that receiveNoWait here: https://github.com/apache/beam/blob/27f1c0774fd93e846de9a8b668e6effc5a41eb10/sdks/java/io/jms/src/main/java/org/apache/beam/sdk/io/jms/JmsIO.java#L559 returns a non-null value when there are pending records on the server side. This can also affect the integration test for the same reason, as we see that not all records are read within the timeout.

Abacn, Feb 07 '24 00:02