[BUG] ActionSerializerSuite has 3 failing unit tests in the master branch
Bug
Describe the problem
The following unit tests fail:
- Metadata (with all defaults)
- Metadata - json serialization/deserialization
- Metadata with empty createdTime- json serialization/deserialization
Steps to reproduce
```
git checkout master
build/sbt "testOnly org.apache.spark.sql.delta.ActionSerializerSuite"
```
```
- Metadata (with all defaults) - json serialization/deserialization *** FAILED *** (623 milliseconds)
[info]   "{"metaData":{"id":"[10daabbc-d843-4fe4-abe5-0ccb83ba681e]","format":{"provide..." did not equal "{"metaData":{"id":"[testId]","format":{"provide..." (ActionSerializerSuite.scala:344)
[info]   Analysis:
[info]   "{"metaData":{"id":"[10daabbc-d843-4fe4-abe5-0ccb83ba681e]","format":{"provide..." -> "{"metaData":{"id":"[testId]","format":{"provide..."
============================================================
[info] Tests: succeeded 25, failed 3, canceled 0, ignored 0, pending 0
[info] *** 3 TESTS FAILED ***
[error] Failed tests:
[error]   org.apache.spark.sql.delta.ActionSerializerSuite
[error] (core / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 24 s, completed Jul 21, 2022 9:56:50 AM
```
Observed results
Expected results
Further details
Environment information
- Delta Lake version:
- Spark version:
- Scala version:
Willingness to contribute
The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?
- [ ] Yes. I can contribute a fix for this bug independently.
- [ ] Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
- [x] No. I cannot contribute a bug fix at this time.
Hmm, is there any way we can set the value of the metadata id before the commit? From what I am able to gather, it's failing because the id doesn't match, but the id is a random value (it looks like a UUID). So only if we can forcefully set its value can we write a test for it.
The id for metadata is built taking `Utils.isTesting` into account. Check this: https://github.com/delta-io/delta/blob/2871e550eabd9e4956aca1b02baeb20635a66931/core/src/main/scala/org/apache/spark/sql/delta/actions/actions.scala#L439
`Utils.isTesting` is basically: `System.getenv("SPARK_TESTING") != null || System.getProperty(IS_TESTING.key) != null`
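A minimal sketch of that logic (simplified and illustrative only; `TestingFlag` is a stand-in for Spark's `Utils.isTesting`, and the real default lives in the `actions.scala` line linked above):

```scala
import java.util.UUID

// Stand-in for org.apache.spark.util.Utils.isTesting:
// true when either the SPARK_TESTING env var or the
// spark.testing system property is set.
object TestingFlag {
  def isTesting: Boolean =
    System.getenv("SPARK_TESTING") != null ||
      System.getProperty("spark.testing") != null
}

// Simplified model of the Metadata default: the id is the stable
// string "testId" only under the testing flag; otherwise it is a
// fresh random UUID, which is why the serialized JSON in the
// failing tests contains a UUID instead of "testId".
case class Metadata(
    id: String =
      if (TestingFlag.isTesting) "testId" else UUID.randomUUID().toString)
```

So whether the suite sees a deterministic id depends entirely on when the testing flag gets set relative to when `Metadata` is constructed.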
With no extra environment variables set, the 3 tests fail exactly as you described. If I execute `export SPARK_TESTING=true` and then run the tests, they all pass.
@ganeshchand should we close this?
@bogdanmrusu do we know why running on GitHub Actions doesn't hit the issue?
Hey @zsxwing, I checked the scripts that are executed when we run the Delta tests and I couldn't see any env variables being set up. I am still ramping up and not sure where else to look. Would you advise looking somewhere else?
Just took a look. The testing flag is set in beforeAll: https://github.com/apache/spark/blob/v3.3.0/core/src/test/scala/org/apache/spark/SparkFunSuite.scala#L86 However, the Metadata object is created before beforeAll here: https://github.com/delta-io/delta/blob/e65496665fd1b187b488ab77f79986a8b381ee16/core/src/test/scala/org/apache/spark/sql/delta/ActionSerializerSuite.scala#L218
That's why it fails.
It doesn't fail on GitHub Actions because GitHub Actions runs all of the test suites. If any test suite runs before ActionSerializerSuite, the testing flag will already be set by the time ActionSerializerSuite runs. That's why we didn't notice this issue.
The fix would be just making the Metadata creation lazy so that it is created after beforeAll. Is anyone interested in trying to fix it?
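The initialization-order problem can be sketched with a hypothetical minimal suite (`InitOrderDemo`, the field names, and the event log are illustrative, not Delta code): a plain `val` in a suite class is evaluated when the suite instance is constructed, which happens before ScalaTest invokes `beforeAll`, while a `lazy val` defers evaluation to its first access inside a test body.

```scala
import scala.collection.mutable.ArrayBuffer

object InitOrderDemo {
  val events = ArrayBuffer.empty[String]

  class Suite {
    // Evaluated at construction time, i.e. before beforeAll runs --
    // this mirrors the eager Metadata field in ActionSerializerSuite.
    val eagerField = { events += "eager val"; "meta" }

    // Evaluated on first access, inside the test body, i.e. after
    // beforeAll has already run.
    lazy val lazyField = { events += "lazy val"; "meta" }

    def beforeAll(): Unit = events += "beforeAll"
    def testBody(): Unit = { val _ = lazyField; events += "test body" }
  }

  def run(): Seq[String] = {
    events.clear()
    val s = new Suite // eager val fires here, before beforeAll
    s.beforeAll()
    s.testBody()
    events.toSeq
  }
}
```

Running the demo records the events in the order `eager val`, `beforeAll`, `lazy val`, `test body`, showing why an eager `Metadata` never sees the flag that `beforeAll` sets.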
@zsxwing I can try to fix it. Please assign it to me.
@bogdanmrusu Cool. Assigned it to you.
Running into this problem too. Using `lazy val` did not work for me.
@scottsand-db could you review all of the Metadata creations and make sure they happen in the test body rather than outside of it?