BPCells icon indicating copy to clipboard operation
BPCells copied to clipboard

Ia/insertion radix sorting

Open immanuelazn opened this issue 1 year ago • 1 comments

Description

Previously we added a change for InsertionIterator to use a priority queue, as we were experiencing memory errors with the original vector + radixsort method. However, the priority queue solution is slower, so we planned to bring back the vector +radixsort solution once we determined the problem.

Tests

  • Ran tests in test-atac_utils.R which already had tests for writing bedfiles.
  • Used r/tests/real_data/ArchR_insertions.R which compares against insertion writing in ArchR, and results remained as we expected.
  • Ran and profiled using a filtered PBMC dataset of 2600 cells

Details

The problem ended up being pretty simple--just a mistake in the direction of arrows for a conditional within buffer resizing. Even without resizing within the nextChr() call, the iterator still functions at datasets of at least 2600 cells, which indicates that the solution now works. Memory profiling indicates that usage never spiked beyond 1 GB.

I also increased the amount of comments so it is a little easier to see the thought process on how loading fragments works

immanuelazn avatar Sep 27 '24 22:09 immanuelazn

To recap some discussion offline: I think the direction of comparison was right for the intended purpose -- i.e. resizing the end_data buffer if too small a fraction of it was used in order to avoid re-sorting the same buffer contents repeatedly.

Swapping the comparison direction prevented the over-zealous size increases, but could leave the buffer contents getting sorted many times.

I think the core problem was the comparison with end_data.size() was not the right way to figure out the amount of leftovers, since the logic wouldn't necessarily have filled data all the way out to end_data.size().

The new suggested logic will always fill out the full end_data buffer when possible to avoid this problem

bnprks avatar Oct 04 '24 21:10 bnprks

So I just checked your changes with pprof, htop and they seem to have fixed the issue at hand. Performance looks great at taking taking approximately 40% faster than original solution at all thread levels, and the memory issues are gone.

immanuelazn avatar Feb 01 '25 06:02 immanuelazn