
[MINOR] Fixed unit tests

Open geserdugarov opened this issue 1 year ago • 7 comments

Change Logs

Fixed unit tests in TestJavaHoodieBackedMetadata and TestHoodieDeltaStreamer.

Impact

Fixed unit tests.

Risk level (write none, low medium or high below)

Low.

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

geserdugarov avatar Dec 19 '23 03:12 geserdugarov

I've reverted the changes in TestHoodieTableSource. I couldn't quickly figure out why the results differ between my local machine or work cluster and the Azure pipeline, but the difference is in the changed test testBucketPruningSpecialKeyDataType.

I even tried running locally a full copy of the Maven command from the Azure pipeline job "UT FT common & flink & UT client/spark-client":

/usr/bin/mvn -f /home/vsts/work/1/s/pom.xml -fae -Pwarn-log -Dscala-2.12 -Dspark3.2 -Dflink1.18 -Dcheckstyle.skip=true -Drat.skip=true -Djacoco.skip=true -ntp -B -V -Pwarn-log -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugins.shade=warn -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugins.dependency=warn -Punit-tests -pl hudi-common,hudi-flink-datasource,hudi-flink-datasource/hudi-flink,hudi-flink-datasource/hudi-flink1.14.x,hudi-flink-datasource/hudi-flink1.15.x,hudi-flink-datasource/hudi-flink1.16.x,hudi-flink-datasource/hudi-flink1.17.x,hudi-flink-datasource/hudi-flink1.18.x,hudi-client/hudi-spark-client test

but I can't reproduce the Azure results.

The Azure pipeline job "UT FT other modules" hung. I'll just try to restart it. But I've checked that the available log contains:

[WARNING] Tests run: 5, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 153.622 s - in org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter

which means that the fix in TestHoodieDeltaStreamer should be correct.
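
A summary line like the one above can also be checked mechanically. The following is a minimal sketch, assuming the Surefire summary format shown in this log; the function name `parse_summary` and the returned dictionary keys are made up for illustration and are not part of any Hudi or Maven tooling.

```python
import re

# Matches a Surefire per-class summary line such as the one quoted above.
# The ".*?" tolerates extra markers (for example "<<< FAILURE!") before "- in".
SUMMARY = re.compile(
    r"Tests run: (\d+), Failures: (\d+), Errors: (\d+), Skipped: (\d+), "
    r"Time elapsed: ([\d.]+) s.*? - in (\S+)"
)

def parse_summary(line):
    """Return the counts from one summary line, or None if the line is not a summary."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    run, failures, errors, skipped = (int(g) for g in m.groups()[:4])
    return {
        "test_class": m.group(6),
        "run": run,
        "failures": failures,
        "errors": errors,
        "skipped": skipped,
        "elapsed_s": float(m.group(5)),
        # A class counts as passed when nothing failed or errored.
        "passed": failures == 0 and errors == 0,
    }

line = (
    "[WARNING] Tests run: 5, Failures: 0, Errors: 0, Skipped: 1, "
    "Time elapsed: 153.622 s - in "
    "org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter"
)
result = parse_summary(line)
print(result["test_class"], result["passed"])
```

Note that the class named in this particular line is TestHoodieDeltaStreamerWithMultiWriter.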

geserdugarov avatar Dec 19 '23 10:12 geserdugarov

@hudi-bot run azure

geserdugarov avatar Dec 19 '23 10:12 geserdugarov

@hudi-bot run azure

geserdugarov avatar Dec 22 '23 12:12 geserdugarov

I don't understand what is happening with CI. I've changed 2 unit tests:

  • TestJavaHoodieBackedMetadata, from hudi-client/hudi-java-client,
  • TestHoodieDeltaStreamer, from hudi-utilities.

Both are Java tests.

Azure CI

hudi-client/hudi-java-client is not included in the Azure CI. hudi-utilities is included, in the "UT FT other modules" job of the "UT other modules" stage. So TestHoodieDeltaStreamer is the only changed test that could break the Azure CI. But the last log line from the "UT other modules" stage is

[INFO] Running org.apache.hudi.utilities.sources.TestSqlSource

before

This job was abandoned. We have detected that logs from the agent may have not finished uploading. We have included our in-memory record of all log lines uploaded before we lost contact with the agent:

My change in this test couldn't break it this way; only a test failure is possible. Maybe with my MR the test ordering changed, and the run hangs in the @AfterAll/@AfterEach of some test class or in the @BeforeAll/@BeforeEach of another one. But I couldn't reproduce the problem locally: this part of the CI job passes without any problem on my machine.

If the order in which the test classes run hasn't changed, then, judging by another successful run, the order is:

  • 71 other test classes
  • TestHoodieDeltaStreamer
  • 28 other test classes
  • TestGenericRddTransform
  • TestPostgresDebeziumSource
  • TestMysqlDebeziumSource
  • TestGcsEventsHoodieIncrSource
  • TestAvroDFSSource
  • TestSqlSource
  • 39 other test classes.

In my failed Azure CI log, only the part from TestGenericRddTransform to TestSqlSource is available; the earlier part of the log is missing. If the ordering is the same, then the changed TestHoodieDeltaStreamer passed successfully and the hang is in some other test.
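
The reasoning above (find the test class that started but never reported a result) can be sketched as a small log scan. This is only an illustration, assuming the Surefire line formats seen in this thread ("Running <class>" on start, "Tests run: ... - in <class>" on finish); the function name `unfinished_test_classes` is hypothetical, and the counts in the sample summary line are made-up placeholders, not values from the real CI log.

```python
import re

# Surefire prints "Running <class>" when a test class starts and
# "Tests run: ... - in <class>" when that class finishes.
RUNNING = re.compile(r"\[INFO\] Running (\S+)")
FINISHED = re.compile(r"Tests run: .* - in (\S+)")

def unfinished_test_classes(log_lines):
    """Return classes that started but never printed a summary, in start order."""
    started = []
    finished = set()
    for line in log_lines:
        m = RUNNING.search(line)
        if m:
            started.append(m.group(1))
            continue
        m = FINISHED.search(line)
        if m:
            finished.add(m.group(1))
    return [name for name in started if name not in finished]

# Sample lines: the class names come from this thread, the numbers are invented.
sample_log = [
    "[INFO] Running org.apache.hudi.utilities.sources.TestAvroDFSSource",
    "[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, "
    "Time elapsed: 12.0 s - in org.apache.hudi.utilities.sources.TestAvroDFSSource",
    "[INFO] Running org.apache.hudi.utilities.sources.TestSqlSource",
]
candidates = unfinished_test_classes(sample_log)
print(candidates)
```

A class reported here is only a candidate: with a truncated log, the hang could equally be in a setup or teardown hook that runs between classes.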

GitHub Actions

My change in TestJavaHoodieBackedMetadata from hudi-client/hudi-java-client should affect only the test-hudi-hadoop-mr-and-hudi-java-client job, not test-spark. I see that test-hudi-hadoop-mr-and-hudi-java-client is OK, but there are hangs in test-spark and a failure in the TestDataSourceForBootstrap Scala test after

2023-12-23T04:01:07.0996155Z 4017081 [Executor task launch worker for task 372] ERROR org.apache.spark.executor.Executor [] - Exception in task 0.0 in stage 133.0 (TID 372)
2023-12-23T04:01:07.0997116Z java.lang.OutOfMemoryError: GC overhead limit exceeded

@danny0405 , @yihua Could you, please, give me any suggestions what else I can try?

geserdugarov avatar Dec 23 '23 14:12 geserdugarov

@hudi-bot run azure

geserdugarov avatar Dec 25 '23 08:12 geserdugarov

@geser There are some OOM issues on master that we are trying to fix; they should not be related to your change.

danny0405 avatar Dec 26 '23 01:12 danny0405

CI report:

  • cbb9389033bd9e220f170d4c1a6d4ddc227ed649 UNKNOWN
  • d41d05532241bbae3e36b16c2dce5c436133da82 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

hudi-bot avatar Dec 28 '23 14:12 hudi-bot

@yihua: If you are OK with this change, can you land it?

bvaradar avatar Jan 08 '24 17:01 bvaradar

2023-12-23T04:01:07.0996155Z 4017081 [Executor task launch worker for task 372] ERROR org.apache.spark.executor.Executor [] - Exception in task 0.0 in stage 133.0 (TID 372)
2023-12-23T04:01:07.0997116Z java.lang.OutOfMemoryError: GC overhead limit exceeded

@danny0405 , @yihua Could you, please, give me any suggestions what else I can try?

The OOM looks unrelated to this PR; it happens on master too.

yihua avatar Jan 10 '24 16:01 yihua