tskit icon indicating copy to clipboard operation
tskit copied to clipboard

Mock-up of ibd-segments trash collection method.

Open gtsambos opened this issue 3 years ago • 5 comments

Description

This is a Python mockup of some functions that might help to keep the internal memory requirements of ibd_segments in check. Would work similarly for link_ancestors (and perhaps also for simplify, but with smaller gains). See #2461 for more details. I'm aware that I'm really pumping out these PRs and issues at the moment 😅 -- this one is more of a long-term project and not urgent for anyone to look at

Before taking this any further, there are a few things we'll need to figure out:

  1. (under what conditions) does this actually improve memory usage? I need to do some profiling of these functions.
  2. what are the costs to runtime? I suspect this will really depend on how frequently we run these methods -- I've included a trash_interval parameter here that controls this.
  3. will this implementation translate well into C code? (I might need @jeromekelleher's insight here sometime in the future)

Fixes #2461

PR Checklist:

  • [ ] Tests that fully cover new/changed functionality.
  • [ ] Documentation including tutorial content if appropriate.
  • [ ] Changelogs, if there are API changes.

gtsambos avatar Aug 10 '22 13:08 gtsambos

I don't think this is the way we'd go about it - better to either write an allocator that we can free object as we are done with them, or to just use malloc directly (but that would need some serious code scrutiny to make sure we do the right thing in all situations).

Maybe you could try doing the malloc/free way and as a quick prototype and see how it works?

jeromekelleher avatar Aug 10 '22 13:08 jeromekelleher

Codecov Report

Merging #2462 (5f29107) into main (d11bae9) will not change coverage. The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #2462   +/-   ##
=======================================
  Coverage   93.91%   93.91%           
=======================================
  Files          28       28           
  Lines       27213    27213           
  Branches     1261     1261           
=======================================
  Hits        25556    25556           
  Misses       1623     1623           
  Partials       34       34           
Flag Coverage Δ
c-tests 92.24% <ø> (ø)
lwt-tests 89.05% <ø> (ø)
python-c-tests 71.69% <ø> (ø)
python-tests 98.96% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update d11bae9...5f29107. Read the comment docs.

codecov[bot] avatar Aug 10 '22 13:08 codecov[bot]

I'm finding it hard to prove or disprove the memory gains of these changes -- the memory usage is dominated by other objects that aren't directly comparable to the C implementation. In ibd.py it's the output, and in simplify.py its the list all_edges that we create by turning ts.edges() into a list. Also, scalene only seems to notice the lines of code that increase memory -- there may be more advanced options that I haven't noticed yet, though. I'm going to think about other, more relative ways to quantify this performance tomorrow.

gtsambos avatar Sep 15 '22 12:09 gtsambos

Maybe you could try doing the malloc/free way and as a quick prototype and see how it works?

I think this would give a lot of insight - looking at this in Python won't really tell you much.

jeromekelleher avatar Sep 15 '22 13:09 jeromekelleher

Ah sorry I misunderstood, I thought you just meant to reorganise the code so that it reflects how the malloc/free method would work in C. I'll get cracking on that.

gtsambos avatar Sep 15 '22 22:09 gtsambos

Hi @gtsambos, is this PR still something you're working on?

benjeffery avatar Jan 05 '23 13:01 benjeffery

Hi Ben! It would be fine to close this PR -- the changes that Jerome mentions above are totally different to what I've done, so maybe better for a different pr. I'm just working on this all rather slowly as I've been moving around a lot since I last commented here!

On Thu, Jan 5, 2023, 5:35 AM Ben Jeffery @.***> wrote:

Hi @gtsambos https://github.com/gtsambos, is this PR still something you're working on?

— Reply to this email directly, view it on GitHub https://github.com/tskit-dev/tskit/pull/2462#issuecomment-1372222929, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEHOXQVKV3T67GQPA4CGTDTWQ3EZNANCNFSM56EOYEEQ . You are receiving this because you were mentioned.Message ID: @.***>

gtsambos avatar Jan 06 '23 08:01 gtsambos

Thanks @gtsambos! I was just having a little spring-clean so no worries.

benjeffery avatar Jan 06 '23 11:01 benjeffery