smudgeplot icon indicating copy to clipboard operation
smudgeplot copied to clipboard

Generalized the middle_one_away function to make parallelization possible

Open RNieuwenhuis opened this issue 4 years ago • 3 comments

As discussed in #65 I generalized the middle_one_away function to make parallelization possible. It is rebuilt as get_pairs_at_pos function and is now deterministic. Thus it contains a fix for #70 . Required was the implementation of a function aggregate, that is now also used for the all_one_away function.

Furthermore I made some stylistic changes that PyCharm was bugging about. Also included some more comments. Hope it improves readability for all.

Updated README accordingly.

TODO: A value error is raised but I think it is up to @KamilSJaron to decide on how to handle that. TODO: Check README if it needs any further changes. TODO: Find out why hetkmers all_one_away and hetkmers get_pairs_at_pos + hetkmers aggregate report different numbers of unique pairs. Manually inspecting a few pairs missing in either methods results did not yield any insight. Needs a more thorough and systematic inspection. Nevertheless, the outcomes are in the same ball park.

RNieuwenhuis avatar Aug 28 '20 13:08 RNieuwenhuis

That's cool, thanks for resolving the deterministic problem (I am still not sure if I understand where was the problem and furthermore, sorry for misleading you with the SA, I used them in a different module I wrote for mapping of kmers).

re 3rd TODO: I actually never checked if all and middle find all the kmer pairs at least for the middle nucleotide. I will try to get it in the minimal reproducible framework.

KamilSJaron avatar Aug 28 '20 14:08 KamilSJaron

Sorry, the timing is not the best for me. I am going to annual leave in a few days and I have quite a backlog of things I need to do. I am afraid I will get back to this PR in ~3 weeks the soonest.

KamilSJaron avatar Sep 07 '20 12:09 KamilSJaron

3 weeks? More like 3 years!

It was too big of a task for a single evening and I never managed to commit a longer stratch of time to this, I am very sorry. I went through the changes and they look really nice! I wish I would have appreciated more!!!

We actually returned to the development of smudgeplot, we have a working fully parallelised C version, which is the only reason why I won't integrate this pull request, but again, great job!

KamilSJaron avatar Aug 10 '23 15:08 KamilSJaron