[BUG] hash_aggregate_test.py::test_hash_multiple_mode_query_avg_distincts failed with DATAGEN_SEED=1705756525

Open sameerz opened this issue 1 year ago • 8 comments

Describe the bug

[2024-01-20T14:37:05.949Z] FAILED ../../src/main/python/hash_aggregate_test.py::test_hash_multiple_mode_query_avg_distincts[{'spark.rapids.sql.variableFloatAgg.enabled': 'true', 'spark.rapids.sql.castStringToFloat.enabled': 'true'}-[('a', RepeatSeq(Float)), ('b', Float), ('c', Long)]][DATAGEN_SEED=1705756525, INJECT_OOM, IGNORE_ORDER, INCOMPAT, APPROXIMATE_FLOAT, ALLOW_NON_GPU(HashAggregateExec,AggregateExpression,UnscaledValue,MakeDecimal,AttributeReference,Alias,Sum,Count,Max,Min,Average,Cast,StddevPop,StddevSamp,VariancePop,VarianceSamp,NormalizeNaNAndZero,GreaterThan,Literal,If,EqualTo,First,SortAggregateExec,Coalesce,IsNull,EqualNullSafe,PivotFirst,GetArrayItem,ShuffleExchangeExec,HashPartitioning)] - AssertionError: GPU and CPU float values are different [0, 'avg(DISTINCT a)']
[2024-01-20T14:37:05.949Z] FAILED ../../src/main/python/hash_aggregate_test.py::test_hash_multiple_mode_query_avg_distincts[{'spark.rapids.sql.variableFloatAgg.enabled': 'true', 'spark.rapids.sql.castStringToFloat.enabled': 'true', 'spark.rapids.sql.hashAgg.replaceMode': 'final'}-[('a', RepeatSeq(Float)), ('b', Float), ('c', Long)]][DATAGEN_SEED=1705756525, IGNORE_ORDER, INCOMPAT, APPROXIMATE_FLOAT, ALLOW_NON_GPU(HashAggregateExec,AggregateExpression,UnscaledValue,MakeDecimal,AttributeReference,Alias,Sum,Count,Max,Min,Average,Cast,StddevPop,StddevSamp,VariancePop,VarianceSamp,NormalizeNaNAndZero,GreaterThan,Literal,If,EqualTo,First,SortAggregateExec,Coalesce,IsNull,EqualNullSafe,PivotFirst,GetArrayItem,ShuffleExchangeExec,HashPartitioning)] - AssertionError: GPU and CPU float values are different [0, 'avg(DISTINCT a)']
[2024-01-20T14:37:05.949Z] FAILED ../../src/main/python/hash_aggregate_test.py::test_hash_multiple_mode_query_avg_distincts[{'spark.rapids.sql.variableFloatAgg.enabled': 'true', 'spark.rapids.sql.castStringToFloat.enabled': 'true', 'spark.rapids.sql.hashAgg.replaceMode': 'partial'}-[('a', RepeatSeq(Float)), ('b', Float), ('c', Long)]][DATAGEN_SEED=1705756525, INJECT_OOM, IGNORE_ORDER, INCOMPAT, APPROXIMATE_FLOAT, ALLOW_NON_GPU(HashAggregateExec,AggregateExpression,UnscaledValue,MakeDecimal,AttributeReference,Alias,Sum,Count,Max,Min,Average,Cast,StddevPop,StddevSamp,VariancePop,VarianceSamp,NormalizeNaNAndZero,GreaterThan,Literal,If,EqualTo,First,SortAggregateExec,Coalesce,IsNull,EqualNullSafe,PivotFirst,GetArrayItem,ShuffleExchangeExec,HashPartitioning)] - AssertionError: GPU and CPU float values are different [0, 'avg(DISTINCT a)']

Detailed output

[2024-01-20T14:37:05.944Z] _ test_hash_multiple_mode_query_avg_distincts[{'spark.rapids.sql.variableFloatAgg.enabled': 'true', 'spark.rapids.sql.castStringToFloat.enabled': 'true'}-[('a', RepeatSeq(Float)), ('b', Float), ('c', Long)]] _
[2024-01-20T14:37:05.944Z] [gw3] linux -- Python 3.9.18 /opt/conda/bin/python
[2024-01-20T14:37:05.944Z] 
[2024-01-20T14:37:05.944Z] data_gen = [('a', RepeatSeq(Float)), ('b', Float), ('c', Long)]
[2024-01-20T14:37:05.944Z] conf = {'spark.rapids.sql.castStringToFloat.enabled': 'true', 'spark.rapids.sql.variableFloatAgg.enabled': 'true'}
[2024-01-20T14:37:05.944Z] 
[2024-01-20T14:37:05.944Z]     @approximate_float
[2024-01-20T14:37:05.944Z]     @ignore_order
[2024-01-20T14:37:05.944Z]     @incompat
[2024-01-20T14:37:05.944Z]     @pytest.mark.parametrize('data_gen', _init_list, ids=idfn)
[2024-01-20T14:37:05.944Z]     @pytest.mark.parametrize('conf', get_params(_confs, params_markers_for_confs),
[2024-01-20T14:37:05.944Z]         ids=idfn)
[2024-01-20T14:37:05.944Z]     def test_hash_multiple_mode_query_avg_distincts(data_gen, conf):
[2024-01-20T14:37:05.944Z] >       assert_gpu_and_cpu_are_equal_collect(
[2024-01-20T14:37:05.944Z]             lambda spark: gen_df(spark, data_gen, length=100)
[2024-01-20T14:37:05.944Z]                 .selectExpr('avg(distinct a)', 'avg(distinct b)','avg(distinct c)'),
[2024-01-20T14:37:05.944Z]             conf=conf)
[2024-01-20T14:37:05.944Z] 
[2024-01-20T14:37:05.944Z] ../../src/main/python/hash_aggregate_test.py:1087: 
[2024-01-20T14:37:05.944Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2024-01-20T14:37:05.944Z] ../../src/main/python/asserts.py:595: in assert_gpu_and_cpu_are_equal_collect
[2024-01-20T14:37:05.944Z]     _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
[2024-01-20T14:37:05.944Z] ../../src/main/python/asserts.py:517: in _assert_gpu_and_cpu_are_equal
[2024-01-20T14:37:05.944Z]     assert_equal(from_cpu, from_gpu)
[2024-01-20T14:37:05.945Z] ../../src/main/python/asserts.py:107: in assert_equal
[2024-01-20T14:37:05.945Z]     _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
[2024-01-20T14:37:05.945Z] ../../src/main/python/asserts.py:43: in _assert_equal
[2024-01-20T14:37:05.945Z]     _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2024-01-20T14:37:05.945Z] ../../src/main/python/asserts.py:36: in _assert_equal
[2024-01-20T14:37:05.945Z]     _assert_equal(cpu[field], gpu[field], float_check, path + [field])
[2024-01-20T14:37:05.945Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2024-01-20T14:37:05.945Z] 
[2024-01-20T14:37:05.945Z] cpu = -9.961353300130207e+25, gpu = -9.961254917499822e+25
[2024-01-20T14:37:05.945Z] float_check = <function get_float_check.<locals>.<lambda> at 0x7f7f804d4430>
[2024-01-20T14:37:05.945Z] path = [0, 'avg(DISTINCT a)']
[2024-01-20T14:37:05.945Z] 
[2024-01-20T14:37:05.945Z]     def _assert_equal(cpu, gpu, float_check, path):
[2024-01-20T14:37:05.945Z]         t = type(cpu)
[2024-01-20T14:37:05.945Z]         if (t is Row):
[2024-01-20T14:37:05.945Z]             assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
[2024-01-20T14:37:05.945Z]             if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
[2024-01-20T14:37:05.945Z]                 assert cpu.__fields__ == gpu.__fields__, "CPU and GPU row have different fields at {} CPU: {} GPU: {}".format(path, cpu.__fields__, gpu.__fields__)
[2024-01-20T14:37:05.945Z]                 for field in cpu.__fields__:
[2024-01-20T14:37:05.945Z]                     _assert_equal(cpu[field], gpu[field], float_check, path + [field])
[2024-01-20T14:37:05.945Z]             else:
[2024-01-20T14:37:05.945Z]                 for index in range(len(cpu)):
[2024-01-20T14:37:05.945Z]                     _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2024-01-20T14:37:05.945Z]         elif (t is list):
[2024-01-20T14:37:05.945Z]             assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
[2024-01-20T14:37:05.945Z]             for index in range(len(cpu)):
[2024-01-20T14:37:05.945Z]                 _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2024-01-20T14:37:05.945Z]         elif (t is tuple):
[2024-01-20T14:37:05.945Z]             assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
[2024-01-20T14:37:05.945Z]             for index in range(len(cpu)):
[2024-01-20T14:37:05.945Z]                 _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2024-01-20T14:37:05.945Z]         elif (t is pytypes.GeneratorType):
[2024-01-20T14:37:05.945Z]             index = 0
[2024-01-20T14:37:05.945Z]             # generator has no zip :( so we have to do this the hard way
[2024-01-20T14:37:05.945Z]             done = False
[2024-01-20T14:37:05.945Z]             while not done:
[2024-01-20T14:37:05.945Z]                 sub_cpu = None
[2024-01-20T14:37:05.945Z]                 sub_gpu = None
[2024-01-20T14:37:05.945Z]                 try:
[2024-01-20T14:37:05.945Z]                     sub_cpu = next(cpu)
[2024-01-20T14:37:05.945Z]                 except StopIteration:
[2024-01-20T14:37:05.945Z]                     done = True
[2024-01-20T14:37:05.945Z]     
[2024-01-20T14:37:05.945Z]                 try:
[2024-01-20T14:37:05.945Z]                     sub_gpu = next(gpu)
[2024-01-20T14:37:05.945Z]                 except StopIteration:
[2024-01-20T14:37:05.945Z]                     done = True
[2024-01-20T14:37:05.945Z]     
[2024-01-20T14:37:05.945Z]                 if done:
[2024-01-20T14:37:05.945Z]                     assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)
[2024-01-20T14:37:05.945Z]                 else:
[2024-01-20T14:37:05.945Z]                     _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])
[2024-01-20T14:37:05.945Z]     
[2024-01-20T14:37:05.945Z]                 index = index + 1
[2024-01-20T14:37:05.945Z]         elif (t is dict):
[2024-01-20T14:37:05.945Z]             # The order of key/values is not guaranteed in python dicts, nor are they guaranteed by Spark
[2024-01-20T14:37:05.945Z]             # so sort the items to do our best with ignoring the order of dicts
[2024-01-20T14:37:05.945Z]             cpu_items = list(cpu.items()).sort(key=_RowCmp)
[2024-01-20T14:37:05.945Z]             gpu_items = list(gpu.items()).sort(key=_RowCmp)
[2024-01-20T14:37:05.945Z]             _assert_equal(cpu_items, gpu_items, float_check, path + ["map"])
[2024-01-20T14:37:05.945Z]         elif (t is int):
[2024-01-20T14:37:05.945Z]             assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)
[2024-01-20T14:37:05.945Z]         elif (t is float):
[2024-01-20T14:37:05.945Z]             if (math.isnan(cpu)):
[2024-01-20T14:37:05.945Z]                 assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)
[2024-01-20T14:37:05.945Z]             else:
[2024-01-20T14:37:05.945Z] >               assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)
[2024-01-20T14:37:05.946Z] E               AssertionError: GPU and CPU float values are different [0, 'avg(DISTINCT a)']
[2024-01-20T14:37:05.946Z] 
[2024-01-20T14:37:05.946Z] ../../src/main/python/asserts.py:83: AssertionError
[2024-01-20T14:37:05.946Z] ----------------------------- Captured stdout call -----------------------------
[2024-01-20T14:37:05.946Z] ### CPU RUN ###
[2024-01-20T14:37:05.946Z] ### GPU RUN ###
[2024-01-20T14:37:05.946Z] ### COLLECT: GPU TOOK 0.17374825477600098 CPU TOOK 0.13654327392578125 ###
[2024-01-20T14:37:05.946Z] --- CPU OUTPUT
[2024-01-20T14:37:05.946Z] +++ GPU OUTPUT
[2024-01-20T14:37:05.946Z] @@ -1 +1 @@
[2024-01-20T14:37:05.946Z] -Row(avg(DISTINCT a)=-9.961353300130207e+25, avg(DISTINCT b)=nan, avg(DISTINCT c)=-6.749297777543448e+17)
[2024-01-20T14:37:05.946Z] +Row(avg(DISTINCT a)=-9.961254917499822e+25, avg(DISTINCT b)=nan, avg(DISTINCT c)=-6.74929777754345e+17)
[2024-01-20T14:37:05.946Z] _ test_hash_multiple_mode_query_avg_distincts[{'spark.rapids.sql.variableFloatAgg.enabled': 'true', 'spark.rapids.sql.castStringToFloat.enabled': 'true', 'spark.rapids.sql.hashAgg.replaceMode': 'final'}-[('a', RepeatSeq(Float)), ('b', Float), ('c', Long)]] _
[2024-01-20T14:37:05.946Z] [gw3] linux -- Python 3.9.18 /opt/conda/bin/python
[2024-01-20T14:37:05.946Z] 
[2024-01-20T14:37:05.946Z] data_gen = [('a', RepeatSeq(Float)), ('b', Float), ('c', Long)]
[2024-01-20T14:37:05.946Z] conf = {'spark.rapids.sql.castStringToFloat.enabled': 'true', 'spark.rapids.sql.hashAgg.replaceMode': 'final', 'spark.rapids.sql.variableFloatAgg.enabled': 'true'}
[2024-01-20T14:37:05.946Z] 
[2024-01-20T14:37:05.946Z]     @approximate_float
[2024-01-20T14:37:05.946Z]     @ignore_order
[2024-01-20T14:37:05.946Z]     @incompat
[2024-01-20T14:37:05.946Z]     @pytest.mark.parametrize('data_gen', _init_list, ids=idfn)
[2024-01-20T14:37:05.946Z]     @pytest.mark.parametrize('conf', get_params(_confs, params_markers_for_confs),
[2024-01-20T14:37:05.946Z]         ids=idfn)
[2024-01-20T14:37:05.946Z]     def test_hash_multiple_mode_query_avg_distincts(data_gen, conf):
[2024-01-20T14:37:05.946Z] >       assert_gpu_and_cpu_are_equal_collect(
[2024-01-20T14:37:05.946Z]             lambda spark: gen_df(spark, data_gen, length=100)
[2024-01-20T14:37:05.946Z]                 .selectExpr('avg(distinct a)', 'avg(distinct b)','avg(distinct c)'),
[2024-01-20T14:37:05.946Z]             conf=conf)
[2024-01-20T14:37:05.946Z] 
[2024-01-20T14:37:05.946Z] ../../src/main/python/hash_aggregate_test.py:1087: 
[2024-01-20T14:37:05.946Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2024-01-20T14:37:05.946Z] ../../src/main/python/asserts.py:595: in assert_gpu_and_cpu_are_equal_collect
[2024-01-20T14:37:05.946Z]     _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
[2024-01-20T14:37:05.946Z] ../../src/main/python/asserts.py:517: in _assert_gpu_and_cpu_are_equal
[2024-01-20T14:37:05.946Z]     assert_equal(from_cpu, from_gpu)
[2024-01-20T14:37:05.946Z] ../../src/main/python/asserts.py:107: in assert_equal
[2024-01-20T14:37:05.946Z]     _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
[2024-01-20T14:37:05.946Z] ../../src/main/python/asserts.py:43: in _assert_equal
[2024-01-20T14:37:05.946Z]     _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2024-01-20T14:37:05.946Z] ../../src/main/python/asserts.py:36: in _assert_equal
[2024-01-20T14:37:05.946Z]     _assert_equal(cpu[field], gpu[field], float_check, path + [field])
[2024-01-20T14:37:05.946Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2024-01-20T14:37:05.946Z] 
[2024-01-20T14:37:05.946Z] cpu = -9.961254917487832e+25, gpu = -9.961353300130207e+25
[2024-01-20T14:37:05.946Z] float_check = <function get_float_check.<locals>.<lambda> at 0x7f808f0e1d30>
[2024-01-20T14:37:05.946Z] path = [0, 'avg(DISTINCT a)']
[2024-01-20T14:37:05.946Z] 
[2024-01-20T14:37:05.946Z]     def _assert_equal(cpu, gpu, float_check, path):
[2024-01-20T14:37:05.946Z]         t = type(cpu)
[2024-01-20T14:37:05.946Z]         if (t is Row):
[2024-01-20T14:37:05.946Z]             assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
[2024-01-20T14:37:05.946Z]             if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
[2024-01-20T14:37:05.946Z]                 assert cpu.__fields__ == gpu.__fields__, "CPU and GPU row have different fields at {} CPU: {} GPU: {}".format(path, cpu.__fields__, gpu.__fields__)
[2024-01-20T14:37:05.946Z]                 for field in cpu.__fields__:
[2024-01-20T14:37:05.946Z]                     _assert_equal(cpu[field], gpu[field], float_check, path + [field])
[2024-01-20T14:37:05.946Z]             else:
[2024-01-20T14:37:05.946Z]                 for index in range(len(cpu)):
[2024-01-20T14:37:05.946Z]                     _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2024-01-20T14:37:05.946Z]         elif (t is list):
[2024-01-20T14:37:05.946Z]             assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
[2024-01-20T14:37:05.946Z]             for index in range(len(cpu)):
[2024-01-20T14:37:05.946Z]                 _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2024-01-20T14:37:05.946Z]         elif (t is tuple):
[2024-01-20T14:37:05.946Z]             assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
[2024-01-20T14:37:05.946Z]             for index in range(len(cpu)):
[2024-01-20T14:37:05.946Z]                 _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2024-01-20T14:37:05.946Z]         elif (t is pytypes.GeneratorType):
[2024-01-20T14:37:05.946Z]             index = 0
[2024-01-20T14:37:05.946Z]             # generator has no zip :( so we have to do this the hard way
[2024-01-20T14:37:05.946Z]             done = False
[2024-01-20T14:37:05.946Z]             while not done:
[2024-01-20T14:37:05.946Z]                 sub_cpu = None
[2024-01-20T14:37:05.947Z]                 sub_gpu = None
[2024-01-20T14:37:05.947Z]                 try:
[2024-01-20T14:37:05.947Z]                     sub_cpu = next(cpu)
[2024-01-20T14:37:05.947Z]                 except StopIteration:
[2024-01-20T14:37:05.947Z]                     done = True
[2024-01-20T14:37:05.947Z]     
[2024-01-20T14:37:05.947Z]                 try:
[2024-01-20T14:37:05.947Z]                     sub_gpu = next(gpu)
[2024-01-20T14:37:05.947Z]                 except StopIteration:
[2024-01-20T14:37:05.947Z]                     done = True
[2024-01-20T14:37:05.947Z]     
[2024-01-20T14:37:05.947Z]                 if done:
[2024-01-20T14:37:05.947Z]                     assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)
[2024-01-20T14:37:05.947Z]                 else:
[2024-01-20T14:37:05.947Z]                     _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])
[2024-01-20T14:37:05.947Z]     
[2024-01-20T14:37:05.947Z]                 index = index + 1
[2024-01-20T14:37:05.947Z]         elif (t is dict):
[2024-01-20T14:37:05.947Z]             # The order of key/values is not guaranteed in python dicts, nor are they guaranteed by Spark
[2024-01-20T14:37:05.947Z]             # so sort the items to do our best with ignoring the order of dicts
[2024-01-20T14:37:05.947Z]             cpu_items = list(cpu.items()).sort(key=_RowCmp)
[2024-01-20T14:37:05.947Z]             gpu_items = list(gpu.items()).sort(key=_RowCmp)
[2024-01-20T14:37:05.947Z]             _assert_equal(cpu_items, gpu_items, float_check, path + ["map"])
[2024-01-20T14:37:05.947Z]         elif (t is int):
[2024-01-20T14:37:05.947Z]             assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)
[2024-01-20T14:37:05.947Z]         elif (t is float):
[2024-01-20T14:37:05.947Z]             if (math.isnan(cpu)):
[2024-01-20T14:37:05.947Z]                 assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)
[2024-01-20T14:37:05.947Z]             else:
[2024-01-20T14:37:05.947Z] >               assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)
[2024-01-20T14:37:05.947Z] E               AssertionError: GPU and CPU float values are different [0, 'avg(DISTINCT a)']
[2024-01-20T14:37:05.947Z] 
[2024-01-20T14:37:05.947Z] ../../src/main/python/asserts.py:83: AssertionError
[2024-01-20T14:37:05.947Z] ----------------------------- Captured stdout call -----------------------------
[2024-01-20T14:37:05.947Z] ### CPU RUN ###
[2024-01-20T14:37:05.947Z] ### GPU RUN ###
[2024-01-20T14:37:05.947Z] ### COLLECT: GPU TOOK 0.14001178741455078 CPU TOOK 0.11022210121154785 ###
[2024-01-20T14:37:05.947Z] --- CPU OUTPUT
[2024-01-20T14:37:05.947Z] +++ GPU OUTPUT
[2024-01-20T14:37:05.947Z] @@ -1 +1 @@
[2024-01-20T14:37:05.947Z] -Row(avg(DISTINCT a)=-9.961254917487832e+25, avg(DISTINCT b)=nan, avg(DISTINCT c)=-6.749297777543451e+17)
[2024-01-20T14:37:05.947Z] +Row(avg(DISTINCT a)=-9.961353300130207e+25, avg(DISTINCT b)=nan, avg(DISTINCT c)=-6.749297777543451e+17)
[2024-01-20T14:37:05.947Z] _ test_hash_multiple_mode_query_avg_distincts[{'spark.rapids.sql.variableFloatAgg.enabled': 'true', 'spark.rapids.sql.castStringToFloat.enabled': 'true', 'spark.rapids.sql.hashAgg.replaceMode': 'partial'}-[('a', RepeatSeq(Float)), ('b', Float), ('c', Long)]] _
[2024-01-20T14:37:05.947Z] [gw3] linux -- Python 3.9.18 /opt/conda/bin/python
[2024-01-20T14:37:05.947Z] 
[2024-01-20T14:37:05.947Z] data_gen = [('a', RepeatSeq(Float)), ('b', Float), ('c', Long)]
[2024-01-20T14:37:05.947Z] conf = {'spark.rapids.sql.castStringToFloat.enabled': 'true', 'spark.rapids.sql.hashAgg.replaceMode': 'partial', 'spark.rapids.sql.variableFloatAgg.enabled': 'true'}
[2024-01-20T14:37:05.947Z] 
[2024-01-20T14:37:05.947Z]     @approximate_float
[2024-01-20T14:37:05.947Z]     @ignore_order
[2024-01-20T14:37:05.947Z]     @incompat
[2024-01-20T14:37:05.947Z]     @pytest.mark.parametrize('data_gen', _init_list, ids=idfn)
[2024-01-20T14:37:05.947Z]     @pytest.mark.parametrize('conf', get_params(_confs, params_markers_for_confs),
[2024-01-20T14:37:05.947Z]         ids=idfn)
[2024-01-20T14:37:05.947Z]     def test_hash_multiple_mode_query_avg_distincts(data_gen, conf):
[2024-01-20T14:37:05.947Z] >       assert_gpu_and_cpu_are_equal_collect(
[2024-01-20T14:37:05.947Z]             lambda spark: gen_df(spark, data_gen, length=100)
[2024-01-20T14:37:05.947Z]                 .selectExpr('avg(distinct a)', 'avg(distinct b)','avg(distinct c)'),
[2024-01-20T14:37:05.947Z]             conf=conf)
[2024-01-20T14:37:05.947Z] 
[2024-01-20T14:37:05.947Z] ../../src/main/python/hash_aggregate_test.py:1087: 
[2024-01-20T14:37:05.947Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2024-01-20T14:37:05.947Z] ../../src/main/python/asserts.py:595: in assert_gpu_and_cpu_are_equal_collect
[2024-01-20T14:37:05.947Z]     _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
[2024-01-20T14:37:05.947Z] ../../src/main/python/asserts.py:517: in _assert_gpu_and_cpu_are_equal
[2024-01-20T14:37:05.947Z]     assert_equal(from_cpu, from_gpu)
[2024-01-20T14:37:05.947Z] ../../src/main/python/asserts.py:107: in assert_equal
[2024-01-20T14:37:05.947Z]     _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
[2024-01-20T14:37:05.947Z] ../../src/main/python/asserts.py:43: in _assert_equal
[2024-01-20T14:37:05.947Z]     _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2024-01-20T14:37:05.947Z] ../../src/main/python/asserts.py:36: in _assert_equal
[2024-01-20T14:37:05.947Z]     _assert_equal(cpu[field], gpu[field], float_check, path + [field])
[2024-01-20T14:37:05.947Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2024-01-20T14:37:05.947Z] 
[2024-01-20T14:37:05.947Z] cpu = -9.961254917487832e+25, gpu = -9.961353300130207e+25
[2024-01-20T14:37:05.947Z] float_check = <function get_float_check.<locals>.<lambda> at 0x7f7f7bd170d0>
[2024-01-20T14:37:05.947Z] path = [0, 'avg(DISTINCT a)']
[2024-01-20T14:37:05.947Z] 
[2024-01-20T14:37:05.947Z]     def _assert_equal(cpu, gpu, float_check, path):
[2024-01-20T14:37:05.947Z]         t = type(cpu)
[2024-01-20T14:37:05.947Z]         if (t is Row):
[2024-01-20T14:37:05.947Z]             assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
[2024-01-20T14:37:05.948Z]             if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
[2024-01-20T14:37:05.948Z]                 assert cpu.__fields__ == gpu.__fields__, "CPU and GPU row have different fields at {} CPU: {} GPU: {}".format(path, cpu.__fields__, gpu.__fields__)
[2024-01-20T14:37:05.948Z]                 for field in cpu.__fields__:
[2024-01-20T14:37:05.948Z]                     _assert_equal(cpu[field], gpu[field], float_check, path + [field])
[2024-01-20T14:37:05.948Z]             else:
[2024-01-20T14:37:05.948Z]                 for index in range(len(cpu)):
[2024-01-20T14:37:05.948Z]                     _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2024-01-20T14:37:05.948Z]         elif (t is list):
[2024-01-20T14:37:05.948Z]             assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
[2024-01-20T14:37:05.948Z]             for index in range(len(cpu)):
[2024-01-20T14:37:05.948Z]                 _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2024-01-20T14:37:05.948Z]         elif (t is tuple):
[2024-01-20T14:37:05.948Z]             assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
[2024-01-20T14:37:05.948Z]             for index in range(len(cpu)):
[2024-01-20T14:37:05.948Z]                 _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2024-01-20T14:37:05.948Z]         elif (t is pytypes.GeneratorType):
[2024-01-20T14:37:05.948Z]             index = 0
[2024-01-20T14:37:05.948Z]             # generator has no zip :( so we have to do this the hard way
[2024-01-20T14:37:05.948Z]             done = False
[2024-01-20T14:37:05.948Z]             while not done:
[2024-01-20T14:37:05.948Z]                 sub_cpu = None
[2024-01-20T14:37:05.948Z]                 sub_gpu = None
[2024-01-20T14:37:05.948Z]                 try:
[2024-01-20T14:37:05.948Z]                     sub_cpu = next(cpu)
[2024-01-20T14:37:05.948Z]                 except StopIteration:
[2024-01-20T14:37:05.948Z]                     done = True
[2024-01-20T14:37:05.948Z]     
[2024-01-20T14:37:05.948Z]                 try:
[2024-01-20T14:37:05.948Z]                     sub_gpu = next(gpu)
[2024-01-20T14:37:05.948Z]                 except StopIteration:
[2024-01-20T14:37:05.948Z]                     done = True
[2024-01-20T14:37:05.948Z]     
[2024-01-20T14:37:05.948Z]                 if done:
[2024-01-20T14:37:05.948Z]                     assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)
[2024-01-20T14:37:05.948Z]                 else:
[2024-01-20T14:37:05.948Z]                     _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])
[2024-01-20T14:37:05.948Z]     
[2024-01-20T14:37:05.948Z]                 index = index + 1
[2024-01-20T14:37:05.948Z]         elif (t is dict):
[2024-01-20T14:37:05.948Z]             # The order of key/values is not guaranteed in python dicts, nor are they guaranteed by Spark
[2024-01-20T14:37:05.948Z]             # so sort the items to do our best with ignoring the order of dicts
[2024-01-20T14:37:05.948Z]             cpu_items = list(cpu.items()).sort(key=_RowCmp)
[2024-01-20T14:37:05.948Z]             gpu_items = list(gpu.items()).sort(key=_RowCmp)
[2024-01-20T14:37:05.948Z]             _assert_equal(cpu_items, gpu_items, float_check, path + ["map"])
[2024-01-20T14:37:05.948Z]         elif (t is int):
[2024-01-20T14:37:05.948Z]             assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)
[2024-01-20T14:37:05.948Z]         elif (t is float):
[2024-01-20T14:37:05.948Z]             if (math.isnan(cpu)):
[2024-01-20T14:37:05.948Z]                 assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)
[2024-01-20T14:37:05.948Z]             else:
[2024-01-20T14:37:05.948Z] >               assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)
[2024-01-20T14:37:05.948Z] E               AssertionError: GPU and CPU float values are different [0, 'avg(DISTINCT a)']
[2024-01-20T14:37:05.948Z] 
[2024-01-20T14:37:05.948Z] ../../src/main/python/asserts.py:83: AssertionError
[2024-01-20T14:37:05.948Z] ----------------------------- Captured stdout call -----------------------------
[2024-01-20T14:37:05.948Z] ### CPU RUN ###
[2024-01-20T14:37:05.948Z] ### GPU RUN ###
[2024-01-20T14:37:05.948Z] ### COLLECT: GPU TOOK 0.1446061134338379 CPU TOOK 0.08382821083068848 ###
[2024-01-20T14:37:05.948Z] --- CPU OUTPUT
[2024-01-20T14:37:05.948Z] +++ GPU OUTPUT
[2024-01-20T14:37:05.948Z] @@ -1 +1 @@
[2024-01-20T14:37:05.948Z] -Row(avg(DISTINCT a)=-9.961254917487832e+25, avg(DISTINCT b)=nan, avg(DISTINCT c)=-6.749297777543451e+17)
[2024-01-20T14:37:05.948Z] +Row(avg(DISTINCT a)=-9.961353300130207e+25, avg(DISTINCT b)=nan, avg(DISTINCT c)=-6.749297777543448e+17)
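
To put the three failures in perspective, here is a minimal sketch that computes the relative difference between the logged CPU and GPU values for `avg(DISTINCT a)`. The numbers are copied from the log above. The tolerance is an assumption: it is `pytest.approx`'s default relative tolerance of 1e-6, and the value that `float_check` actually applies is not shown in the log. The helper name `relative_difference` is hypothetical.

```python
import math

# (cpu, gpu) values for avg(DISTINCT a), copied from the three failures in the log.
observed = {
    "default": (-9.961353300130207e+25, -9.961254917499822e+25),
    "replaceMode=final": (-9.961254917487832e+25, -9.961353300130207e+25),
    "replaceMode=partial": (-9.961254917487832e+25, -9.961353300130207e+25),
}

# Assumption: pytest.approx's default relative tolerance. The tolerance used by
# the test framework's float_check is not visible in the log.
ASSUMED_REL_TOL = 1e-6


def relative_difference(cpu, gpu):
    """Return |cpu - gpu| scaled by the larger magnitude of the two values."""
    return abs(cpu - gpu) / max(abs(cpu), abs(gpu))


for name, (cpu, gpu) in observed.items():
    rel = relative_difference(cpu, gpu)
    within = math.isclose(cpu, gpu, rel_tol=ASSUMED_REL_TOL)
    print(f"{name}: relative difference = {rel:.3e}, within tolerance: {within}")
```

In all three cases the relative difference is just under 1e-5, so the values would fail a 1e-6 relative check. Note also that the CPU result itself differs between the first run and the other two, even though the data generator and seed are the same.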

Steps/Code to reproduce bug

Expected behavior

Environment details (please complete the following information)

Environment location: Regular integration test environment
Spark configuration settings related to the issue

Additional context

Scala 2.13 test, DATAGEN_SEED=1705756525
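
The issue does not state a root cause. One plausible explanation, offered here only as an assumption, is that the average is sensitive to the order in which large values of opposite sign are summed, since floating-point addition is not associative. The sketch below uses hypothetical values (not the data generated by this seed) and a hypothetical helper `naive_avg` to show how the same inputs can produce different averages depending on order.

```python
import math


def naive_avg(values):
    """Average using plain left-to-right double-precision addition."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


# Hypothetical inputs: two large terms that cancel, plus one small term.
values_order_a = [1e38, 1.0, -1e38]  # 1.0 is absorbed when added to 1e38
values_order_b = [1e38, -1e38, 1.0]  # the large terms cancel before 1.0 is added

avg_a = naive_avg(values_order_a)
avg_b = naive_avg(values_order_b)

# math.fsum tracks partial sums exactly, so it does not depend on input order.
exact = math.fsum(values_order_a) / len(values_order_a)

print(avg_a, avg_b, exact)
```

With these inputs the first ordering loses the small term entirely while the second keeps it, so the two averages disagree. A column of random floats spanning the full float range, as `RepeatSeq(Float)` may produce, could show the same effect at a smaller scale.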

Jan 21 '24 22:01 sameerz
