warp icon indicating copy to clipboard operation
warp copied to clipboard

Fix Explicit release of temporaries due to ref-cycle in fem.borrow_temporary

Open Cucchi01 opened this issue 1 month ago • 7 comments

Context / Problem

fem.borrow_temporary() gives back warp.cache.Temporary objects whose array attribute self-references the temporary. If we simply drop local references, the buffers keep a ref-cycle and only return to the pool when Python’s GC runs, contributing to situation like this issue in the related Newton repository newton-physics/newton#1042

Possible alternatives in my opinion

  1. Wrap temporary.array in a weakref.proxy.

    • Pro: prevents the ref-cycle entirely; buffers go home as soon as user pointers die.
    • Con: not backward compatible. Any code that stores temp.array or passes it to kernels now receives a proxy, so wp.launch can’t infer types (weakref.ProxyType shows up as the dtype).
  2. Patch wp.array itself (inside Warp) so array becomes a property and the temporary installs weakref-backed release/detach.

    • Pro: fixes the GC issue globally.
    • Con: still changes the semantics of every wp.array: every array-like would be affected even if they never touch fem.borrow_temporary.
  3. Explicit release/detach in the whole codebase.

    • Keep the current Temporary API but audit the Warp library to release or detach buffers.
    • Benefits: preserves user-facing semantics, keeps changes localized to Warp internals, and makes ownership obvious.
    • Drawback: requires touching every call site that borrows or stores a temporary.

Notes

  • Whenever we assign self.attr = fem.borrow_temporary(...), in my changes the old buffer is released first. Even in situation in which the temporary could be returned externally, e.g.:
    def fill_arg(self, args: Arg, device):
        args.cell_particle_offsets = self._cell_particle_offsets.to(device)
        args.cell_particle_indices = self._cell_particle_indices.to(device)
        args.particle_fraction = self._particle_fraction.to(device)
        args.particle_coords = self.particle_coords.to(device)

Please correct me if this is not the expected behavior.

Still pending

  • Examples and any remaining fem.integrate() call sites outside the core solver still need to be changed. I'm going to do a follow-up commit to cover those.
  • Once that pass is done I’ll update this PR; for now the staged changes reflect the core library fixes that motivated the draft.

Before your PR is "Ready for review"

  • [X] All commits are signed-off to indicate that your contribution adheres to the Developer Certificate of Origin requirements
  • [ ] Necessary tests have been added
  • [ ] Documentation is up-to-date
  • [ ] Auto-generated files modified by compiling Warp and building the documentation have been updated (e.g. __init__.pyi, functions.rst)
  • [ ] Code passes formatting and linting checks with pre-commit run -a

Summary by CodeRabbit

  • Bug Fixes

    • Improved memory management throughout the finite element module by releasing temporary resources more promptly after use, reducing memory retention and preventing potential leaks during object lifecycle transitions.
  • Performance

    • Enhanced resource efficiency by implementing systematic cleanup of intermediate computational data structures during rebuilds and operations, resulting in more deterministic memory deallocation.

Cucchi01 avatar Nov 09 '25 20:11 Cucchi01

📝 Walkthrough

Walkthrough

The pull request implements comprehensive lifecycle management for temporary buffers and arrays across FEM modules. It adds destructors, ownership tracking mechanisms, and explicit release calls to ensure timely deallocation of temporary resources and prevent memory retention.

Changes

Cohort / File(s) Summary
Geometry classes — Lifecycle management for owned temporaries
warp/_src/fem/geometry/hexmesh.py, warp/_src/fem/geometry/tetmesh.py, warp/_src/fem/geometry/quadmesh.py, warp/_src/fem/geometry/trimesh.py
Added __del__, _replace_owned_array, and _release_owned_temporaries methods to manage lifecycle of owned temporary arrays (vertex offsets/indices, boundary face/edge indices). Updated topology rebuild flows to use _replace_owned_array instead of direct assignments.
Nanogrid base classes — Temporary buffer cleanup
warp/_src/fem/geometry/nanogrid.py, warp/_src/fem/geometry/adaptive_nanogrid.py
Added destructor and helper methods (_replace_boundary_face_indices, _release_owned_temporaries) for owned temporary management. Replaced direct assignments with safe replacements. Added explicit release calls for grid temporaries (cell_nodes, cell_faces, cell_edges).
Partition classes — Side-index lifecycle
warp/_src/fem/geometry/partition.py, warp/_src/fem/space/partition.py
Introduced _release_owned_temporaries, _replace_owned_array, and destructors. Added _set_partition_side_arrays helper for atomic array replacement. Updated side-index and cell assignment flows to use managed replacements instead of direct assignments.
Space restriction — Owned buffer management
warp/_src/fem/space/restriction.py
Added _replace_owned_temporary utility and _release_owned_temporaries with destructor. Updated rebuild flow to manage dof-related temporaries and node count synchronization with explicit releases.
Integration — Temporary array cleanup
warp/_src/fem/integrate.py
Added _release_temporary helper function. Added explicit release of accumulate arrays and triplet temporaries in constant-form and bilinear interpolation paths.
Quadrature — PIC temporary lifecycle
warp/_src/fem/quadrature/pic_quadrature.py
Introduced owned temporary storage for cell particle binning (_cell_particle_offsets, _cell_particle_indices, etc.). Added destructor and lifecycle helpers. Updated bin_particles and compress paths to manage temporaries via _replace_owned_array.
Domain — Element index ownership tracking
warp/_src/fem/domain.py
Added _owns_element_indices attribute and destructor to Subdomain. Now tracks whether element_indices are owned and releases them only when appropriate.
Adaptivity — Buffer release calls
warp/_src/fem/adaptivity.py
Added explicit release() calls for temporary grids and buffers (grid_voxels, merged_ijks, cell_ijk, cell_level) at appropriate points in adaptive grid refinement loops and after early breaks. Introduced previous-cycle aliases for proper resource supersession.
Utilities — Ownership tracking in compression
warp/_src/fem/utils.py
Added ownership flags (owns_unique_node_indices, owns_unique_node_count) in compress_node_indices to conditionally allocate and release unique node arrays based on ownership semantics.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Scope: 14+ files affected across geometry, space, integration, and domain modules with consistent lifecycle-management pattern
  • Heterogeneity: While following the same structural pattern (destructor + helper methods + call-site updates), each file manages distinct owned temporaries with context-specific release semantics requiring individual verification
  • Logic density: Multiple call sites per file where _replace_owned_array and explicit release() calls are inserted; ownership tracking logic in a few cases
  • Areas requiring extra attention:
    • Verify no double-releases or premature releases of temporaries across different code paths (especially in loops and early-return branches)
    • Confirm ownership semantics are correctly assigned in each class (e.g., which temporaries are truly "owned" and which are borrowed)
    • Check that _replace_owned_array calls handle None replacements correctly and don't introduce reference cycles
    • Partition classes have atomic multi-array replacement logic (_set_partition_side_arrays) requiring care to avoid partial state corruption
    • Integration and quadrature modules have dense temporary management in multiple interpolation/binning paths

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 15.63% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically identifies the main change: fixing explicit release of temporaries due to reference cycles in fem.borrow_temporary. It accurately reflects the primary objective of the PR across the modified files.
✨ Finishing touches
  • [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Nov 09 '25 20:11 coderabbitai[bot]

Thanks @Cucchi01 for this thorough PR! I 'll have a closer look a the changes.

In the meantime: here's a branch https://github.com/gdaviet/warp/tree/fix/temporary_self_reference that marks the .array access as deprecated and uses weakref for release/detach (I have a corresponding internal PR). This is similar to your 2) suggestion, since the attribute will be removed soon I think that it is the best workaround in the meantime. These changes reduce the memory increase by quite a bit in your mpm example, though it is still slightly increasing between GC runs.

Edit: I have switched to a less intrusive construct, using an array view. Downside is that we lose the deprecation warning, but at least it does not affect other array instances

gdaviet avatar Nov 10 '25 09:11 gdaviet

Whenever we assign self.attr = fem.borrow_temporary(...), in my changes the old buffer is released first. Even in situation in which the temporary could be returned externally, e.g.

Yes, in this case releasing the old temporary first should be fine (there is not reason for the struct argument lifetime to exceed that of the class attribute), though in some situations I would still prefer to have the garbage collector be in charge of the release (this was the main reason for the refactor of the Temporary object, as the old way of having separate temporary and arrays was creating too much logic related external accesses). Hopefully with the removal of the self-reference in the above branch we can get away with not doing explicit releases everywhere.

gdaviet avatar Nov 10 '25 09:11 gdaviet

Perfect, then I am going to use your private branch for the moment @gdaviet. Thank you!

Cucchi01 avatar Nov 10 '25 12:11 Cucchi01

When investigating this I actually found a couple more ref cycles. The fixes are now merged back to the main branch, and the memory usage is now much more stable!

step    200 cpu=   606.53 MiB gpu=  1896.00 MiB
step    300 cpu=   606.67 MiB gpu=  1896.00 MiB
step    400 cpu=   606.81 MiB gpu=  1928.00 MiB
step    500 cpu=   606.81 MiB gpu=  1928.00 MiB
[...]
step   6200 cpu=   606.81 MiB gpu=  1896.00 MiB
step   6300 cpu=   606.81 MiB gpu=  1896.00 MiB
step   6400 cpu=   606.81 MiB gpu=  1896.00 MiB
step   6500 cpu=   606.81 MiB gpu=  1928.00 MiB

gdaviet avatar Nov 10 '25 21:11 gdaviet

Great! Thank you! Then let me know if I can simply close this PR and if I should keep open the one in Newton (LINK) @gdaviet

Cucchi01 avatar Nov 11 '25 08:11 Cucchi01

Great! Thank you! Then let me know if I can simply close this PR and if I should keep open the one in Newton (LINK) @gdaviet

Let's keep those open for now if that is ok with you. Explicit release is still a plus, I'll pull in your commits so you can get properly credited and adjust where relevant

gdaviet avatar Nov 12 '25 13:11 gdaviet