
assertLargeDatasetEquality enhancements

Open MrPowers opened this issue 6 years ago • 3 comments

Use DatasetCountMismatch for count differences.

Use basicMismatchMessage for DataFrames that aren't equal.
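One way to picture the two proposed changes is the plain-Scala sketch below. It uses ordinary `Seq` values instead of Spark Datasets so it runs without a cluster. `CountMismatch`, `ContentMismatch`, `mismatchMessage` and `assertEquality` are hypothetical stand-ins invented for illustration; they are not the library's actual `DatasetCountMismatch`, `DatasetContentMismatch` or `basicMismatchMessage` implementations.

```scala
// Hypothetical stand-ins for the library's exceptions (illustration only).
final case class CountMismatch(msg: String) extends Exception(msg)
final case class ContentMismatch(msg: String) extends Exception(msg)

object EqualitySketch {
  // Hypothetical analogue of a "basic mismatch message": shows both sides' content.
  def mismatchMessage[T](actual: Seq[T], expected: Seq[T]): String = {
    val a = actual.mkString(", ")
    val e = expected.mkString(", ")
    s"Actual content: [$a]\nExpected content: [$e]"
  }

  // Ordered comparison: check the counts first, then the content.
  def assertEquality[T](actual: Seq[T], expected: Seq[T]): Unit = {
    if (actual.size != expected.size) {
      // Count differences get their own exception type and a count-based message.
      throw CountMismatch(
        s"Actual row count: '${actual.size}'\nExpected row count: '${expected.size}'"
      )
    }
    if (actual != expected) {
      // Same count but different rows: report the content, not the counts.
      throw ContentMismatch(mismatchMessage(actual, expected))
    }
  }
}
```

With this split, comparing a sorted copy of three rows against the unsorted original would raise the content exception with both sides printed, rather than a message listing two identical row counts.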

MrPowers, Aug 22 '18 13:08

Here is an example of what can happen:

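import com.github.mrpowers.spark.fast.tests.DataFrameComparer
import org.scalatest.FunSuite
// Assumes a SparkSession named `spark` is in scope (as in a Databricks notebook)
import spark.implicits._

class TestStats extends FunSuite with DataFrameComparer {
  test("compare same dataframe but with ordered differently") {
    val someDF = Seq(
      (8, "bat"),
      (64, "mouse"),
      (-27, "horse")
    ).toDF("number", "word")
    // Compare a sorted copy of the DataFrame against the unsorted original
    assertLargeDataFrameEquality(someDF.sort('number), someDF)
  }
}
// Run the suite directly, e.g. from a notebook cell
(new TestStats).execute()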

TestStats:
- compare same dataframe *** FAILED ***
  com.github.mrpowers.spark.fast.tests.DatasetContentMismatch: Actual DataFrame Row Count: '3'
  Expected DataFrame Row Count: '3'

cdemonchy-pro, Aug 14 '20 14:08

[Screenshot: Screen Shot 2020-08-15 at 6 00 24 PM]

@cdemonchy-pro - see above for the error message I'm currently getting when I recreate your test. Can you please send me the spark-fast-tests version you're using?

MrPowers, Aug 15 '20 23:08

Well, I am currently using version 2.3.1_0.15.0, but I am working on a Databricks cluster (Apache Spark 2.4.5, Scala 2.11), so this must be a quirk specific to that environment. Sorry for the noise.

cdemonchy-pro, Aug 16 '20 09:08