Qiyun Zhu comments

Results: 99 comments of Qiyun Zhu

Coverage stop coordinate switch to exclusive

Hi @wasade Thanks for discussing the issue with me and suggesting the change. I think there are two things to consider: 1) Whether the right coordinate is inclusive or exclusive....

Coverage stop coordinate switch to exclusive

@wasade Thanks for the clarification and the example! You are correct that SAM doesn't encode the stop position, so one needs to calculate it from the CIGAR string. Let me think a bit...
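
For readers following the inclusive versus exclusive discussion, here is a minimal sketch of how a stop coordinate can be derived from a SAM record's `POS` and CIGAR string. The function name `ref_span` is hypothetical and this is not Woltka's own code; it relies only on the SAM convention that `POS` is 1-based and that the operations `M`, `D`, `N`, `=` and `X` consume reference bases.

```python
import re

# CIGAR operations that consume reference bases (SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def ref_span(pos, cigar):
    """Return (start, stop_inclusive, stop_exclusive) on the reference.

    `pos` is the 1-based leftmost position from the SAM POS field.
    Hypothetical helper for illustration only.
    """
    # Sum the lengths of all operations that advance along the reference.
    ref_len = sum(int(n) for n, op in CIGAR_RE.findall(cigar)
                  if op in REF_CONSUMING)
    return pos, pos + ref_len - 1, pos + ref_len


print(ref_span(100, "150M"))         # (100, 249, 250)
print(ref_span(100, "10S90M5D50M"))  # (100, 244, 245)
```

The two stop values differ by exactly one, which is the whole point of the debate: with an exclusive stop, `stop - start` equals the number of covered reference bases.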

Which parameters does the gotu command actually use?

@antgonza Thanks for the insightful comments! `gotu` is a minimal subset of `classify`, i.e., no classification; just assign queries to subjects but not to higher classification units. So it does...

Replace instances of `count_list` with `collections.Counter`

Thanks for doing this rigorous benchmarking! Very helpful information! The function `count_list` is to group multiple subjects mapped to one query. In most situations there won't be many of them....
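
As an illustration of what is being compared, the sketch below counts the subjects mapped to one query in two ways: with a plain dictionary loop and with `collections.Counter`. The function `count_list_sketch` is a hypothetical stand-in written for this example and may differ from the actual `count_list` in the codebase; the subject IDs are made up.

```python
from collections import Counter


def count_list_sketch(items):
    """Count occurrences with a plain dict (hypothetical stand-in)."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


# Made-up subject IDs assigned to a single query.
subjects = ["G000001", "G000002", "G000001"]

# Counter is a dict subclass, so the two results compare equal.
print(count_list_sketch(subjects) == Counter(subjects))  # True
```

For the short lists described above, the difference between the two is likely to be dominated by call overhead rather than by the counting itself, so any choice is best checked against a benchmark on realistic data.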

Replace instances of `count_list` with `collections.Counter`

I understand. `Counter` could potentially be very helpful in optimizing other parts of this program. We can track down these other cases. If we have to use `Counter` in...

Replace instances of `count_list` with `collections.Counter`

@gwarmstrong I am going back to look at this question. In issue #37, I created a realistic test dataset, based on which I performed some surveys and benchmarks. Now there...

Optimize cigar_to_lens function

Update: Because some CIGAR strings show up frequently (e.g., `150M`), one can use `functools.lru_cache` to significantly accelerate the calculation. Simply add a decorator: `@lru_cache(maxsize=128)` before the function. Benchmarks on a...
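
A minimal sketch of the decorator approach follows. The function `cigar_lengths` is a hypothetical stand-in for `cigar_to_lens`; its return value (query length, reference length) is an assumption made for this example, not the project's actual signature. Only `functools.lru_cache` and its `cache_info()` method are taken from the standard library as-is.

```python
import re
from functools import lru_cache

_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


@lru_cache(maxsize=128)
def cigar_lengths(cigar):
    """Return (query_len, ref_len) for a CIGAR string.

    Hypothetical stand-in for cigar_to_lens; results are cached per string.
    """
    qlen = rlen = 0
    for n, op in _CIGAR_RE.findall(cigar):
        n = int(n)
        if op in "MIS=X":   # operations that consume query bases
            qlen += n
        if op in "MDN=X":   # operations that consume reference bases
            rlen += n
    return qlen, rlen


# Repeated strings such as "150M" are parsed once, then served from cache.
for _ in range(3):
    cigar_lengths("150M")
print(cigar_lengths.cache_info().hits)  # 2
```

Caching works here because the function is pure and its argument is a hashable string; the benefit depends on how often identical CIGAR strings recur in the input.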

Dude, Where's My Tree?

Good point! The tree is a resource downloadable from the WoL website, but as you said the instruction is vague for users who don't want to go through the entire...

Benchmarks of individual steps

@wasade Thank you for providing this valuable advice! I will look into the individual suggestions and see if I can do something to increase the performance!

Enable parallel computing

Hi @wasade @adswafford This is correct. Woltka can totally be run separately on multiple subsets of samples and the resulting BIOM tables can be merged. No modification is needed for...
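
To illustrate the merge step conceptually, the sketch below combines per-subset count tables held as plain dictionaries (sample ID mapped to feature counts). It is an assumption-laden illustration: real BIOM tables would be merged with BIOM tooling, and the names `merge_tables` and `to_dense`, as well as the sample and feature IDs, are hypothetical.

```python
def merge_tables(*tables):
    """Merge tables of the form {sample: {feature: count}}.

    Hypothetical helper; assumes each sample appears in exactly one subset.
    """
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"Sample {sample!r} appears in more than one table")
            merged[sample] = dict(counts)
    return merged


def to_dense(merged):
    """Return (features, rows), filling features absent from a sample with 0."""
    features = sorted({f for counts in merged.values() for f in counts})
    rows = {s: [c.get(f, 0) for f in features] for s, c in merged.items()}
    return features, rows


# Two runs on disjoint sample subsets (made-up IDs and counts).
run1 = {"S1": {"G1": 5, "G2": 1}}
run2 = {"S2": {"G2": 3, "G3": 7}}

features, rows = to_dense(merge_tables(run1, run2))
print(features)  # ['G1', 'G2', 'G3']
print(rows)      # {'S1': [5, 1, 0], 'S2': [0, 3, 7]}
```

Because each sample is processed independently, the merge only needs to take the union of features and fill missing entries with zeros.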