smudgeplot
smudgeplot copied to clipboard
Generalized the middle_one_away function to make parallelization possible
As discussed in #65 I generalized the middle_one_away
function to make parallelization possible.
It is rebuilt as get_pairs_at_pos
function and is now deterministic. Thus it contains a fix for #70 .
Required was the implementation of a function aggregate
, that is now also used for the all_one_away
function.
Furthermore I made some stylistic changes that PyCharm was bugging about. Also included some more comments. Hope it improves readability for all.
Updated README accordingly.
TODO: A value error is raised but I think it is up to @KamilSJaron to decide on how to handle that.
TODO: Check README if it needs any further changes.
TODO: Find out why hetkmers all_one_away
and hetkmers get_pairs_at_pos
+ hetkmers aggregate
report different numbers of unique pairs. Manually inspecting a few pairs missing in either methods results did not yield any insight. Needs a more thorough and systematic inspection. Nevertheless, the outcomes are in the same ball park.
That's cool, thanks for resolving the deterministic problem (I am still not sure if I understand where was the problem and furthermore, sorry for misleading you with the SA, I used them in a different module I wrote for mapping of kmers).
re 3rd TODO: I actually never checked if all and middle find all the kmer pairs at least for the middle nucleotide. I will try to get it in the minimal reproducible framework.
Sorry, the timing is not the best for me. I am going to annual leave in a few days and I have quite a backlog of things I need to do. I am afraid I will get back to this PR in ~3 weeks the soonest.
3 weeks? More like 3 years!
It was too big of a task for a single evening and I never managed to commit a longer stratch of time to this, I am very sorry. I went through the changes and they look really nice! I wish I would have appreciated more!!!
We actually returned to the development of smudgeplot, we have a working fully parallelised C version, which is the only reason why I won't integrate this pull request, but again, great job!