
[BUG] hash_aggregate_test.py::test_exact_percentile_reduction failed with DATAGEN_SEED=1705866905

Open · sameerz opened this issue 1 year ago · 3 comments

Describe the bug

[2024-01-21T17:44:19.558Z] FAILED ../../src/main/python/hash_aggregate_test.py::test_exact_percentile_reduction[[('val', RepeatSeq(Double)), ('freq', Long(not_null))]][DATAGEN_SEED=1705857175] - AssertionError: GPU and CPU float values are different [0, 'percentile(val,...

Summary:

09:44:19  --- CPU OUTPUT
09:44:19  +++ GPU OUTPUT
09:44:19  @@ -1 +1 @@
09:44:19  -Row(percentile(val, CAST(0.1 AS DOUBLE), 1)=-3.0600528894266366e+181, percentile(val, CAST(0 AS DOUBLE), 1)=-2.4711378026196358e+293, percentile(val, CAST(1 AS DOUBLE), 1)=nan, percentile(val, array(0.1), 1)=[-3.0600528894266366e+181], percentile(val, array(), 1)=None, percentile(val, array(0.1, 0.5, 0.9), 1)=[-3.0600528894266366e+181, -4.9069119243789216e-275, 1.7532295949136916e+204], percentile(val, array(CAST(0 AS DECIMAL(14,4)), CAST(0.0001 AS DECIMAL(14,4)), CAST(0.5 AS DECIMAL(14,4)), CAST(0.9999 AS DECIMAL(14,4)), CAST(1 AS DECIMAL(14,4))), 1)=[-2.4711378026196358e+293, -2.4711378026196358e+293, -4.9069119243789216e-275, nan, nan], percentile(val, CAST(0.1 AS DOUBLE), abs(freq))=-1.3398677426484608e+183, percentile(val, CAST(0 AS DOUBLE), abs(freq))=-2.4711378026196358e+293, percentile(val, CAST(1 AS DOUBLE), abs(freq))=nan, percentile(val, array(0.1), abs(freq))=[-1.3398677426484608e+183], percentile(val, array(), abs(freq))=None, percentile(val, array(0.1, 0.5, 0.9), abs(freq))=[-1.3398677426484608e+183, -4.302064318624199e-276, 5.054511151289938e+220], percentile(val, array(CAST(0 AS DECIMAL(14,4)), CAST(0.0001 AS DECIMAL(14,4)), CAST(0.5 AS DECIMAL(14,4)), CAST(0.9999 AS DECIMAL(14,4)), CAST(1 AS DECIMAL(14,4))), abs(freq))=[-2.4711378026196358e+293, -2.4711378026196358e+293, -4.302064318624199e-276, nan, nan])
09:44:19  +Row(percentile(val, CAST(0.1 AS DOUBLE), 1)=-3.0600528894266366e+181, percentile(val, CAST(0 AS DOUBLE), 1)=-2.4711378026196358e+293, percentile(val, CAST(1 AS DOUBLE), 1)=nan, percentile(val, array(0.1), 1)=[-3.0600528894266366e+181], percentile(val, array(), 1)=None, percentile(val, array(0.1, 0.5, 0.9), 1)=[-3.0600528894266366e+181, -4.302064318624199e-276, 1.7532295949136916e+204], percentile(val, array(CAST(0 AS DECIMAL(14,4)), CAST(0.0001 AS DECIMAL(14,4)), CAST(0.5 AS DECIMAL(14,4)), CAST(0.9999 AS DECIMAL(14,4)), CAST(1 AS DECIMAL(14,4))), 1)=[-2.4711378026196358e+293, -2.4711378026196358e+293, -4.302064318624199e-276, nan, nan], percentile(val, CAST(0.1 AS DOUBLE), abs(freq))=-1.3398677426484608e+183, percentile(val, CAST(0 AS DOUBLE), abs(freq))=-2.4711378026196358e+293, percentile(val, CAST(1 AS DOUBLE), abs(freq))=nan, percentile(val, array(0.1), abs(freq))=[-1.3398677426484608e+183], percentile(val, array(), abs(freq))=None, percentile(val, array(0.1, 0.5, 0.9), abs(freq))=[-1.3398677426484608e+183, -4.302064318624199e-276, 5.054511151289938e+220], percentile(val, array(CAST(0 AS DECIMAL(14,4)), CAST(0.0001 AS DECIMAL(14,4)), CAST(0.5 AS DECIMAL(14,4)), CAST(0.9999 AS DECIMAL(14,4)), CAST(1 AS DECIMAL(14,4))), abs(freq))=[-2.4711378026196358e+293, -2.4711378026196358e+293, -4.302064318624199e-276, nan, nan])
Detailed output
 _ test_exact_percentile_reduction[[('val', RepeatSeq(Double)), ('freq', Long(not_null))]] _
09:44:19  
09:44:19  data_gen = [('val', RepeatSeq(Double)), ('freq', Long(not_null))]
09:44:19  
09:44:19      @pytest.mark.parametrize('data_gen', exact_percentile_reduction_data_gen, ids=idfn)
09:44:19      def test_exact_percentile_reduction(data_gen):
09:44:19  >       assert_gpu_and_cpu_are_equal_collect(
09:44:19              lambda spark: exact_percentile_reduction(gen_df(spark, data_gen))
09:44:19          )
09:44:19  
09:44:19  ../../src/main/python/hash_aggregate_test.py:922: 
09:44:19  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
09:44:19  ../../src/main/python/asserts.py:595: in assert_gpu_and_cpu_are_equal_collect
09:44:19      _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
09:44:19  ../../src/main/python/asserts.py:517: in _assert_gpu_and_cpu_are_equal
09:44:19      assert_equal(from_cpu, from_gpu)
09:44:19  ../../src/main/python/asserts.py:107: in assert_equal
09:44:19      _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
09:44:19  ../../src/main/python/asserts.py:43: in _assert_equal
09:44:19      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
09:44:19  ../../src/main/python/asserts.py:36: in _assert_equal
09:44:19      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
09:44:19  ../../src/main/python/asserts.py:43: in _assert_equal
09:44:19      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
09:44:19  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
09:44:19  
09:44:19  cpu = -4.9069119243789216e-275, gpu = -4.302064318624199e-276
09:44:19  float_check = <function <lambda> at 0x7f1a9a9a4b80>
09:44:19  path = [0, 'percentile(val, array(0.1, 0.5, 0.9), 1)', 1]
09:44:19  
09:44:19      def _assert_equal(cpu, gpu, float_check, path):
09:44:19          t = type(cpu)
09:44:19          if (t is Row):
09:44:19              assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
09:44:19              if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
09:44:19                  assert cpu.__fields__ == gpu.__fields__, "CPU and GPU row have different fields at {} CPU: {} GPU: {}".format(path, cpu.__fields__, gpu.__fields__)
09:44:19                  for field in cpu.__fields__:
09:44:19                      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
09:44:19              else:
09:44:19                  for index in range(len(cpu)):
09:44:19                      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
09:44:19          elif (t is list):
09:44:19              assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
09:44:19              for index in range(len(cpu)):
09:44:19                  _assert_equal(cpu[index], gpu[index], float_check, path + [index])
09:44:19          elif (t is tuple):
09:44:19              assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
09:44:19              for index in range(len(cpu)):
09:44:19                  _assert_equal(cpu[index], gpu[index], float_check, path + [index])
09:44:19          elif (t is pytypes.GeneratorType):
09:44:19              index = 0
09:44:19              # generator has no zip :( so we have to do this the hard way
09:44:19              done = False
09:44:19              while not done:
09:44:19                  sub_cpu = None
09:44:19                  sub_gpu = None
09:44:19                  try:
09:44:19                      sub_cpu = next(cpu)
09:44:19                  except StopIteration:
09:44:19                      done = True
09:44:19      
09:44:19                  try:
09:44:19                      sub_gpu = next(gpu)
09:44:19                  except StopIteration:
09:44:19                      done = True
09:44:19      
09:44:19                  if done:
09:44:19                      assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)
09:44:19                  else:
09:44:19                      _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])
09:44:19      
09:44:19                  index = index + 1
09:44:19          elif (t is dict):
09:44:19              # The order of key/values is not guaranteed in python dicts, nor are they guaranteed by Spark
09:44:19              # so sort the items to do our best with ignoring the order of dicts
09:44:19              cpu_items = list(cpu.items()).sort(key=_RowCmp)
09:44:19              gpu_items = list(gpu.items()).sort(key=_RowCmp)
09:44:19              _assert_equal(cpu_items, gpu_items, float_check, path + ["map"])
09:44:19          elif (t is int):
09:44:19              assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)
09:44:19          elif (t is float):
09:44:19              if (math.isnan(cpu)):
09:44:19                  assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)
09:44:19              else:
09:44:19  >               assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)
09:44:19  E               AssertionError: GPU and CPU float values are different [0, 'percentile(val, array(0.1, 0.5, 0.9), 1)', 1]
09:44:19  
09:44:19  ../../src/main/python/asserts.py:83: AssertionError
09:44:19  ----------------------------- Captured stdout call -----------------------------
09:44:19  ### CPU RUN ###
09:44:19  ### GPU RUN ###
09:44:19  ### COLLECT: GPU TOOK 0.26613664627075195 CPU TOOK 0.17803049087524414 ###
09:44:19  --- CPU OUTPUT
09:44:19  +++ GPU OUTPUT
09:44:19  (diff identical to the Summary above)

Steps/Code to reproduce bug

Expected behavior

Environment details (please complete the following information)

  • Environment location: Dataproc 2.0 Ubuntu 18.04

Additional context

— sameerz, Jan 21 '24 22:01

This is 100% repeatable, and the CPU computes a different result for the 0.5 (median) percentile on every run. I think this is a bug in Spark that I found a while ago:

https://issues.apache.org/jira/browse/SPARK-45599

I'm not sure whether we want to avoid -0.0 in our test cases until this is fixed. (This run had 42 values out of 2048 that were -0.0 and 42 that were 0.0, which is exactly what is needed to trigger the error in Spark.)

— revans2, Jan 22 '24 17:01
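To make the -0.0 observation above concrete, here is a minimal, standalone Python sketch (not spark-rapids code) of why a mix of -0.0 and 0.0 can make an exact percentile order-dependent: the two values compare equal, so a sort cannot distinguish them, and the value landing at the median index depends purely on encounter order — which is nondeterministic in a distributed aggregation.

```python
import math

# -0.0 and 0.0 compare equal, so a comparison-based sort treats them
# as interchangeable.
assert -0.0 == 0.0

# Python's sort is stable: equal keys keep their input order, so the
# element at a given index depends on the order the values arrived in.
a = sorted([-0.0, 0.0])[0]   # -0.0 arrived first
b = sorted([0.0, -0.0])[0]   # +0.0 arrived first
print(math.copysign(1.0, a))  # -1.0: index 0 holds -0.0
print(math.copysign(1.0, b))  # 1.0: index 0 holds +0.0
```

With 42 values of each sign in the data, which zero is counted where can shift the selected median element between runs, consistent with the one-element shift visible in the diff above.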

I think the solution here is to update FloatGen and DoubleGen so that they can replace -0.0 with 0.0. We would enable that option for these tests but keep other tests generating -0.0s. We should also file a follow-on issue so that, once SPARK-45599 is fixed, we can come back and re-enable -0.0 testing for versions of Spark that get the right answer.

— revans2, Jan 23 '24 21:01
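A hypothetical sketch of the proposed generator option (the function name, parameters, and structure are illustrative, not the actual spark-rapids FloatGen/DoubleGen API): when the opt-in flag is set, any generated zero is canonicalized to +0.0.

```python
import math
import random

def gen_doubles(n, rng, special_cases=(0.0, -0.0, float('nan')),
                avoid_neg_zero=False):
    """Toy stand-in for a DoubleGen-style generator with an opt-in
    flag that replaces -0.0 with +0.0 (hypothetical API)."""
    out = []
    for _ in range(n):
        # Mix ordinary values with special cases, as the real gens do.
        if rng.random() < 0.2:
            v = rng.choice(special_cases)
        else:
            v = rng.uniform(-1e300, 1e300)
        if avoid_neg_zero and v == 0.0:
            # -0.0 == 0.0 is True, so this also catches negative zero.
            v = 0.0
        out.append(v)
    return out

rng = random.Random(1705857175)
vals = gen_doubles(2048, rng, avoid_neg_zero=True)
# No negative zeros survive when the flag is on.
assert not any(v == 0.0 and math.copysign(1.0, v) < 0 for v in vals)
```

Keeping the flag per-generator means only the percentile tests opt out of -0.0 while the rest of the suite keeps exercising it.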

The underlying issue SPARK-45599 has been resolved, so we should follow up and turn -0.0 testing back on for Spark 4.0.0+ and 3.5.2+.

— sameerz, Apr 24 '24 17:04
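Re-enabling -0.0 testing could be gated on the Spark version, since the fix shipped in 3.5.2 and 4.0.0. A hypothetical helper along these lines (not an existing spark-rapids utility; real test code may track backports differently):

```python
def has_spark_45599_fix(spark_version):
    """Return True for Spark versions that include the SPARK-45599 fix.

    Hypothetical gate: the fix is in 3.5.2+ and 4.0.0+, so a simple
    tuple comparison against (3, 5, 2) covers both lines.
    """
    parts = tuple(int(p) for p in spark_version.split('.')[:3])
    return parts >= (3, 5, 2)

print(has_spark_45599_fix("3.5.1"))  # False
print(has_spark_45599_fix("3.5.2"))  # True
print(has_spark_45599_fix("4.0.0"))  # True
```

Tests would then generate -0.0 only when the helper returns True, and keep the canonicalize-to-+0.0 path for older Spark versions.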