datafusion-comet

chore: Add memory reservation debug logging and visualization

Open andygrove opened this issue 2 months ago • 5 comments

Which issue does this PR close?

Closes #.

Rationale for this change

Debugging. With the new debug config enabled, each memory pool call is logged per task, for example:

[Task 486] MemoryPool[ExternalSorter[6]].try_grow(256232960) returning Ok
[Task 486] MemoryPool[ExternalSorter[6]].try_grow(256375168) returning Ok
[Task 486] MemoryPool[ExternalSorter[6]].try_grow(256899456) returning Ok
[Task 486] MemoryPool[ExternalSorter[6]].try_grow(257296128) returning Ok
[Task 486] MemoryPool[ExternalSorter[6]].try_grow(257820416) returning Err
[Task 486] MemoryPool[ExternalSorterMerge[6]].shrink(10485760)
[Task 486] MemoryPool[ExternalSorter[6]].shrink(150464)
[Task 486] MemoryPool[ExternalSorter[6]].shrink(146688)
[Task 486] MemoryPool[ExternalSorter[6]].shrink(137856)
[Task 486] MemoryPool[ExternalSorter[6]].shrink(141952)
[Task 486] MemoryPool[ExternalSorterMerge[6]].try_grow(0) returning Ok
[Task 486] MemoryPool[ExternalSorterMerge[6]].try_grow(0) returning Ok
[Task 486] MemoryPool[ExternalSorter[6]].shrink(524288)
[Task 486] MemoryPool[ExternalSorterMerge[6]].try_grow(0) returning Ok
[Task 486] MemoryPool[ExternalSorterMerge[6]].try_grow(68928) returning Ok

From this, we can make pretty charts to help with comprehension:

image
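As a rough illustration of how log lines in this format could be turned into chartable data, here is a minimal Python sketch. It is not the script used for the chart above. The names `parse_line` and `running_totals` are hypothetical, and the sketch assumes that each logged number is a byte delta (the amount added by a successful `try_grow` or released by `shrink`) and that a `try_grow` returning `Err` leaves the reservation unchanged. If the logged values are absolute reservation sizes instead, the accumulation step would need to change.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [Task 486] MemoryPool[ExternalSorter[6]].try_grow(256232960) returning Ok
# [Task 486] MemoryPool[ExternalSorter[6]].shrink(150464)
LINE_RE = re.compile(
    r"\[Task (?P<task>\d+)\] MemoryPool\[(?P<consumer>.+)\]"
    r"\.(?P<op>try_grow|shrink)\((?P<size>\d+)\)"
    r"(?: returning (?P<result>Ok|Err))?"
)


def parse_line(line):
    """Return a dict describing one pool call, or None for unrelated lines."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "task": int(m["task"]),
        "consumer": m["consumer"],
        "op": m["op"],
        "size": int(m["size"]),
        # shrink lines carry no result, so they count as successful
        "ok": m["result"] != "Err",
    }


def running_totals(lines):
    """Return (consumer, total_bytes) after every parsed event.

    ASSUMPTION: sizes are deltas; failed try_grow calls change nothing.
    """
    totals = defaultdict(int)
    series = []
    for line in lines:
        event = parse_line(line)
        if event is None:
            continue
        if event["op"] == "try_grow" and event["ok"]:
            totals[event["consumer"]] += event["size"]
        elif event["op"] == "shrink":
            totals[event["consumer"]] -= event["size"]
        series.append((event["consumer"], totals[event["consumer"]]))
    return series
```

The resulting series has one point per logged event, so it can be grouped by consumer and handed to any plotting library.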

What changes are included in this PR?

  • Add new config spark.comet.debug.memory
  • Add new LoggingPool that is enabled when the new config is set
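The general idea of a logging pool, a wrapper that delegates to an inner pool and logs each call, can be sketched in a few lines. The following is a conceptual Python analogue only, not the Rust code in this PR. `FixedLimitPool`, the constructor arguments, and the boolean return value are all hypothetical stand-ins chosen for illustration. Only the log line format is taken from the output shown earlier.

```python
class FixedLimitPool:
    """Hypothetical stand-in for a real memory pool with a hard byte limit."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def try_grow(self, consumer, size):
        # Refuse the request if it would exceed the limit.
        if self.used + size > self.limit:
            return False
        self.used += size
        return True

    def shrink(self, consumer, size):
        self.used -= size


class LoggingPool:
    """Wraps another pool and logs every call in the format shown above."""

    def __init__(self, inner, task_id, log=print):
        self.inner = inner
        self.task_id = task_id
        self.log = log

    def try_grow(self, consumer, size):
        ok = self.inner.try_grow(consumer, size)
        result = "Ok" if ok else "Err"
        self.log(
            f"[Task {self.task_id}] MemoryPool[{consumer}]"
            f".try_grow({size}) returning {result}"
        )
        return ok

    def shrink(self, consumer, size):
        self.inner.shrink(consumer, size)
        self.log(f"[Task {self.task_id}] MemoryPool[{consumer}].shrink({size})")


# Usage: collect log lines in a list instead of printing them.
lines = []
pool = LoggingPool(FixedLimitPool(limit=100), task_id=486, log=lines.append)
pool.try_grow("ExternalSorter[6]", 80)   # fits within the limit
pool.try_grow("ExternalSorter[6]", 30)   # would exceed the limit
pool.shrink("ExternalSorter[6]", 10)
```

Because the wrapper only delegates and logs, it can be switched on by a config flag without changing the behavior of the pool it wraps.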

How are these changes tested?

andygrove, Oct 03 '25 15:10

Codecov Report

All modified and coverable lines are covered by tests.
Project coverage is 58.93%. Comparing base (f09f8af) to head (2884ed3).
Warning: report is 585 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2521      +/-   ##
============================================
+ Coverage     56.12%   58.93%   +2.80%     
- Complexity      976     1449     +473     
============================================
  Files           119      147      +28     
  Lines         11743    13649    +1906     
  Branches       2251     2369     +118     
============================================
+ Hits           6591     8044    +1453     
- Misses         4012     4382     +370     
- Partials       1140     1223      +83     


codecov-commenter, Oct 03 '25 17:10

Updated example chart now that this is logging the correct info:

data stacked_cumulative

andygrove, Oct 03 '25 20:10

Moving to draft while I work on the Python scripts.

andygrove, Oct 03 '25 20:10

Still experimenting...

mem_chart

andygrove, Oct 03 '25 21:10

Chart now shows when try_grow failed:

mem_chart
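One way failed allocations could be located for marking on a chart is to scan the log for `try_grow` calls that returned `Err` and record their position in the event sequence. This is a sketch under that assumption, not the script behind the chart above, and `failed_grows` is a hypothetical helper name.

```python
import re

# Matches only try_grow calls that were rejected by the pool.
FAIL_RE = re.compile(
    r"MemoryPool\[(?P<consumer>.+)\]\.try_grow\((?P<size>\d+)\) returning Err"
)


def failed_grows(lines):
    """Return (line_index, consumer, requested_size) for each failed try_grow."""
    failures = []
    for index, line in enumerate(lines):
        m = FAIL_RE.search(line)
        if m:
            failures.append((index, m["consumer"], int(m["size"])))
    return failures


# Three lines copied from the log excerpt in the PR description.
log = [
    "[Task 486] MemoryPool[ExternalSorter[6]].try_grow(257296128) returning Ok",
    "[Task 486] MemoryPool[ExternalSorter[6]].try_grow(257820416) returning Err",
    "[Task 486] MemoryPool[ExternalSorterMerge[6]].shrink(10485760)",
]
failures = failed_grows(log)
```

Each returned index can serve as an x position for a marker, so the failure lines up with the memory curve at the moment the request was rejected.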

andygrove, Oct 03 '25 21:10