flink icon indicating copy to clipboard operation
flink copied to clipboard

[FLINK-29198][test] Fail after maximum RetryOnException

Open RyanSkraba opened this issue 3 years ago • 3 comments

What is the purpose of the change

This pull request causes tests annotated with @RetryOnException to fail after the given number of retries when used with the JUnit5 RetryExtension. This is the old behaviour we saw with the JUnit4 RetryRule.

Currently, the test is retried for the maximum number of executions, but the "expected" exception is never thrown even on the last try. This causes the test to appear skipped instead of failed.

Brief change log

  • If the test doesn't succeed on the last possible retry, the exception is propagated as a test failure.
  • I reverted some overeager JUnit5 migration on test cases for the JUnit4 RetryRule. This code is appropriately duplicated for RetryExtension.

Verifying this change

This change is already covered by existing tests, such as RetryOnExceptionExtensionTest.

I've manually tested that the following test now fails after the expected number of retries (instead of being skipped):

    @TestTemplate
    @RetryOnException(times = NUMBER_OF_RUNS, exception = IllegalArgumentException.class)
    void testThrowExceptionForever() {
        throw new IllegalArgumentException();
    }

In JUnit5, it's very tricky to specify that a test is supposed to fail, but it's possible. I started down this route with a technique similar to ExpectToFail but it ends up needing to be very aware of the internals of our RetryExtension. I'm not sure it's worth adding test extension code to test extension code, but I can continue down that route!

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

RyanSkraba avatar Sep 09 '22 17:09 RyanSkraba

CI report:

  • fa2109787c10a06fc4e5f04e22859ffd597b3863 Azure: SUCCESS
Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

flinkbot avatar Sep 09 '22 17:09 flinkbot

Thanks for the contribution. I think it would be great to have a test here confirming that the change fixes the issue

snuyanzin avatar Sep 21 '22 14:09 snuyanzin

Thanks for the contribution. I think it would be great to have a test here confirming that the change fixes the issue

Thanks Sergey, especially for the collaboration! This was a tricky thing to test; it gets a bit unclear about how far to go when testing test code :D But since this is such a widely used TestExtension, it's good to have extra coverage -- especially when the symptom is otherwise silently skipping tests that should have failed.

RyanSkraba avatar Sep 22 '22 06:09 RyanSkraba

@flinkbot run azure

snuyanzin avatar Sep 26 '22 07:09 snuyanzin

lgtm from my side

//cc @XComp

snuyanzin avatar Sep 28 '22 09:09 snuyanzin

Sorry for the delay! I've merge/squashed all of the commits (with @snuyanzin now as co-author, thanks!). The only changes were 1 javadoc clarification and the small requested simplification in the unit test. Can you take a look?

RyanSkraba avatar Oct 17 '22 14:10 RyanSkraba

Rebased again and edited the commit messages!

RyanSkraba avatar Oct 20 '22 07:10 RyanSkraba

Rebased again and moved some modifications to their own separate commit

RyanSkraba avatar Oct 21 '22 15:10 RyanSkraba

Rebased and changed the commit messages to the suggested text. Thanks for your attention!

In fact, for testing correctness, I would suggest that all three commits be cherry-picked back into branch-1.16

RyanSkraba avatar Nov 03 '22 08:11 RyanSkraba

@flinkbot run azure

XComp avatar Nov 03 '22 15:11 XComp

Failure caused by FLINK-24119

XComp avatar Nov 04 '22 08:11 XComp

@flinkbot run azure

XComp avatar Nov 04 '22 08:11 XComp

@flinkbot run azure

XComp avatar Nov 04 '22 10:11 XComp