[hotfix] Fixed flaky-test: StateTtlHintTest#testJoinStateTtlHintWithView

Open hermya opened this issue 1 year ago • 1 comments

What is the purpose of the change

This pull request identifies a flaky-test and proposes a fix.

Brief change log

The following test is flaky: org.apache.flink.table.planner.plan.hints.stream.StateTtlHintTest#testJoinStateTtlHintWithView

Failed assertion:

expected: <
Join(joinType=[InnerJoin], where=[=(a1, a3)], select=[a1, b1, a3, b3], leftInputSpec=[NoUniqueKey], rightInputSpec=[NoUniqueKey], stateTtlHints=[[[STATE_TTL options:{LEFT=2d, RIGHT=1d}]]])
:- Exchange(distribution=[hash[a1]])
:  +- TableSourceScan(table=[[default_catalog, default_database, T1]], fields=[a1, b1])
+- Exchange(distribution=[hash[a3]])
   +- TableSourceScan(table=[[default_catalog, default_database, T3]], fields=[a3, b3])
> but was: <
Join(joinType=[InnerJoin], where=[=(a1, a3)], select=[a1, b1, a3, b3], leftInputSpec=[NoUniqueKey], rightInputSpec=[NoUniqueKey], stateTtlHints=[[[STATE_TTL options:{RIGHT=1d, LEFT=2d}]]])
:- Exchange(distribution=[hash[a1]])
:  +- TableSourceScan(table=[[default_catalog, default_database, T1]], fields=[a1, b1])
+- Exchange(distribution=[hash[a3]])
   +- TableSourceScan(table=[[default_catalog, default_database, T3]], fields=[a3, b3])
>

More precisely, this is a string comparison failure in which only the order of the STATE_TTL options differs:

Expected: stateTtlHints=[[[STATE_TTL options:{LEFT=2d, RIGHT=1d}]]] 
Actual: stateTtlHints=[[[STATE_TTL options:{RIGHT=1d, LEFT=2d}]]]

FAILURE REASON

hintsToString in RelExplainUtil.scala uses ImmutableMap.toString() to convert kvOptions to its string representation. Map.toString() emits entries in the map's iteration order, which the Map contract does not fix. For hash-based maps that order depends on the hashing implementation and can differ between runs, so asserting on the resulting string is unreliable.
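To illustrate the failure mode in isolation, here is a minimal sketch. It uses LinkedHashMap purely to force two different iteration orders deterministically; the class and helper names are hypothetical and nothing here is taken from the Flink code base. The two maps are equal as maps, yet their string forms differ in exactly the way the failed assertion shows.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; not part of Flink.
final class MapOrderDemo {

    /** Builds a map that iterates in the order the key/value pairs are given. */
    static Map<String, String> ordered(String... kv) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> a = ordered("LEFT", "2d", "RIGHT", "1d");
        Map<String, String> b = ordered("RIGHT", "1d", "LEFT", "2d");

        // Map equality ignores iteration order...
        System.out.println(a.equals(b)); // true
        // ...but toString() follows it, so the strings differ.
        System.out.println(a); // {LEFT=2d, RIGHT=1d}
        System.out.println(b); // {RIGHT=1d, LEFT=2d}
    }
}
```

Any assertion that compares the printed plan as a string therefore inherits whatever iteration order the underlying map happens to have.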

Fix

Added a method that sorts the entries in lexicographical order and returns a string in the same format as map.toString(). Sorting makes the output deterministic.
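The idea can be sketched in Java as follows. This is not the method added in this pull request (that change lives in RelExplainUtil.scala); the class and method names are hypothetical, and it assumes that entries are sorted by key and that keys are non-null strings. Copying into a TreeMap sorts by the natural String order, and TreeMap inherits the same `{k1=v1, k2=v2}` format from AbstractMap.toString().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; an illustration of the approach, not the PR's code.
final class SortedMapToString {

    /**
     * Returns the options in Map.toString() format, with entries ordered
     * lexicographically by key (assumption: non-null String keys).
     */
    static String toSortedString(Map<String, String> options) {
        // TreeMap orders keys by String.compareTo, independent of the
        // iteration order of the source map.
        return new TreeMap<>(options).toString();
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("RIGHT", "1d");
        options.put("LEFT", "2d");
        System.out.println(toSortedString(options)); // {LEFT=2d, RIGHT=1d}
    }
}
```

The copy costs O(n log n) per call, which is negligible for the handful of options a hint carries.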

Confirmed test failure using nondex

For particular test:

mvn -pl ./flink-table/flink-table-planner edu.illinois:nondex-maven-plugin:2.1.7:nondex -Dtest=org.apache.flink.table.planner.plan.hints.stream.StateTtlHintTest#testJoinStateTtlHintWithView -DnondexRuns=10 -Dspotless.check.skip=true

For entire repo

Suggested command to check for flaky tests:

mvn nondex:nondex

For more information: https://github.com/TestingResearchIllinois/NonDex

Verifying this change

Please make sure both new and modified tests in this PR follow the conventions for tests defined in our code quality guide.

This change added tests and can be verified as follows:

  • Made the generated hint string deterministic, so the existing assertion no longer depends on map iteration order

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

hermya • Oct 09 '24 00:10

CI report:

  • 1b289036cc1731c6ae08f25913b155fb0b774402 Azure: FAILURE

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build

flinkbot • Oct 09 '24 00:10