spark-fast-tests
assertLargeDatasetEquality enhancements
Use DatasetCountMismatch for count differences.
Use basicMismatchMessage for DataFrames that aren't equal.
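To make the two proposed cases concrete, here is a minimal sketch using plain Scala collections instead of Spark. The names `CountMismatch`, `ContentMismatch`, and `assertRowsEqual` are hypothetical stand-ins defined for illustration only; they are not the library's classes, and the message format is an assumption rather than what `basicMismatchMessage` produces.

```scala
// Hypothetical stand-ins, defined locally so the sketch runs without Spark.
final case class CountMismatch(msg: String) extends Exception(msg)
final case class ContentMismatch(msg: String) extends Exception(msg)

object LargeEqualitySketch {
  // Compares two row sequences position by position.
  def assertRowsEqual[T](actual: Seq[T], expected: Seq[T]): Unit = {
    // Case 1: different sizes get their own exception type,
    // so the row counts are only reported when they actually differ.
    if (actual.size != expected.size) {
      throw CountMismatch(
        s"Actual row count: '${actual.size}', expected row count: '${expected.size}'")
    }
    // Case 2: same size, so report the first few rows that differ.
    val firstDiffs = actual.zip(expected).zipWithIndex
      .collect { case ((a, e), i) if a != e => s"row $i: $a | $e" }
      .take(5)
    if (firstDiffs.nonEmpty) {
      throw ContentMismatch(firstDiffs.mkString("\n"))
    }
  }
}
```

With this split, a content failure shows the offending rows rather than two identical counts.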
Here is an example of what can happen:
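// Assumes ScalaTest 3.0.x, where FunSuite lives in org.scalatest.
import org.scalatest.FunSuite
import com.github.mrpowers.spark.fast.tests.DataFrameComparer
// Assumes a SparkSession named `spark` is in scope (as in a notebook);
// needed for toDF and the 'number column syntax.
import spark.implicits._

class TestStats extends FunSuite with DataFrameComparer {
  test("compare same dataframe but with ordered differently") {
    val someDF = Seq(
      (8, "bat"),
      (64, "mouse"),
      (-27, "horse")
    ).toDF("number", "word")
    // Same rows on both sides, but the left side is sorted by `number`.
    assertLargeDataFrameEquality(someDF.sort('number), someDF)
  }
}

(new TestStats).execute()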
TestStats:
- compare same dataframe *** FAILED ***
  com.github.mrpowers.spark.fast.tests.DatasetContentMismatch: Actual DataFrame Row Count: '3'
  Expected DataFrame Row Count: '3'
@cdemonchy-pro - see above for the error message I'm currently getting when I recreate your test. Can you please send me the spark-fast-tests version you're using?
Well, I am currently using version 2.3.1_0.15.0, but I am working on a Databricks cluster (Apache Spark 2.4.5, Scala 2.11). That must be a specific quirk of it. Sorry for the noise.
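The failure above reports identical row counts, which suggests the comparison is positional and the sort is what makes the rows differ. That is an inference from the output, not something confirmed in this thread. A minimal sketch with plain Scala collections (the object name `OrderSketch` is hypothetical) shows the difference between a positional check and an order-insensitive one:

```scala
object OrderSketch {
  val rows = Seq((8, "bat"), (64, "mouse"), (-27, "horse"))

  // Same rows, reordered by the first field (like someDF.sort('number)).
  val sortedRows = rows.sortBy(_._1)

  // Positional comparison: element i is compared with element i.
  val positionalEqual: Boolean = sortedRows == rows

  // Order-insensitive comparison: sort both sides first.
  // Tuples of ordered types have an implicit Ordering in the standard library.
  val unorderedEqual: Boolean = sortedRows.sorted == rows.sorted
}
```

Sorting both DataFrames by the same columns before comparing them is the analogous workaround when row order does not matter to the test.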