datafusion-comet icon indicating copy to clipboard operation
datafusion-comet copied to clipboard

Chore: refactor bit_count

Open kazantsev-maksim opened this issue 2 months ago • 7 comments

Which issue does this PR close?

Part of: https://github.com/apache/datafusion-comet/issues/2443

Rationale for this change

Part of: https://github.com/apache/datafusion-comet/issues/2443

What changes are included in this PR?

How are these changes tested?

Tested with existing unit tests

kazantsev-maksim avatar Oct 12 '25 12:10 kazantsev-maksim

The DataFusion implementation produces results that are not compatible with Spark

Project [bit_count(col2#2190) AS bit_count(col2)#2218] +- Relation spark_catalog.default.bitwise_count_test[col1#2189L,col2#2190,col3#2191,col4#2192] parquet

== Physical Plan == *(1) CometColumnarToRow +- CometProject [bit_count(col2)#2218], [bit_count(col2#2190) AS bit_count(col2)#2218] +- CometScan [native_iceberg_compat] parquet spark_catalog.default.bitwise_count_test[col2#2190] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/Users/tendoo/Desktop/datafusion-comet/spark/spark-warehouse/bitw..., PartitionFilters: [], PushedFilters: [], ReadSchema: structcol2:int

== Results ==

== Results == !== Correct Answer - 3 == == Spark Answer - 3 == struct<bit_count(col2):int> struct<bit_count(col2):int> ![31] [1] ![33] [31] [6] [6] (QueryTest.scala:244)

@andygrove @comphead

kazantsev-maksim avatar Oct 12 '25 12:10 kazantsev-maksim

Triggering CI

comphead avatar Oct 13 '25 16:10 comphead

there is still a conflict 🤔

comphead avatar Oct 20 '25 03:10 comphead

Thanks @comphead, conflicts resolved.

kazantsev-maksim avatar Oct 20 '25 16:10 kazantsev-maksim

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 45.97%. Comparing base (f09f8af) to head (f03f59d). :warning: Report is 624 commits behind head on main.

Additional details and impacted files
@@              Coverage Diff              @@
##               main    #2553       +/-   ##
=============================================
- Coverage     56.12%   45.97%   -10.16%     
- Complexity      976     1200      +224     
=============================================
  Files           119      146       +27     
  Lines         11743    13746     +2003     
  Branches       2251     2354      +103     
=============================================
- Hits           6591     6320      -271     
- Misses         4012     6385     +2373     
+ Partials       1140     1041       -99     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov-commenter avatar Oct 20 '25 17:10 codecov-commenter

*** 3 TESTS FAILED ***

comphead avatar Oct 20 '25 18:10 comphead

*** 3 TESTS FAILED ***

I think the DataFusion implementation needs modification to be fully compatible with Spark.

kazantsev-maksim avatar Oct 21 '25 17:10 kazantsev-maksim

After closing the task (https://github.com/apache/datafusion/pull/18322 ), the tests must pass. FYI @comphead

kazantsev-maksim avatar Dec 12 '25 16:12 kazantsev-maksim