Qiyun Zhu

Results 69 issues of Qiyun Zhu

Hello @amnona and @RNAer, over the past few days I tested calour. It is an awesome piece of work! I came up with a few comments. I am going to...

Hi @wasade, I was able to set up a development environment with NumPy 2.0, following the instructions provided in #955. Now as I run `python -m unittest`, there are...

@rgommers We still got an error even with NumPy 2.0-compatible h5py and Pandas installed. Could you please help? How to reproduce the error: ``` git clone https://github.com/biocore/biom-format.git cd biom-format conda...

As my recent benchmarks (issues #40, #38) suggest, a notable proportion of Woltka's runtime is spent on the CPU / memory side, in addition to file I/O. @adswafford once asked if...

enhancement

The coverage files are big, so I made some tweaks to reduce their size while maintaining human readability. Previously, the file looked like: ``` G1 1 120 G1 145 180...

The frequency of merging ranges is currently hard-coded (merging is triggered after every 10,000 new ranges are added). This step is very expensive (runtime is 50%+ of the entire Woltka...

enhancement
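
To illustrate the range-merging issue above, here is a minimal sketch of a buffer whose merge frequency is a parameter rather than a hard-coded constant. The names `merge_ranges`, `RangeBuffer` and `merge_every` are hypothetical and not taken from Woltka's code, and treating ranges as closed `(start, end)` intervals that merge only when they overlap is an assumption.

```python
def merge_ranges(ranges):
    """Merge overlapping closed intervals given as (start, end) tuples."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]


class RangeBuffer:
    """Collect ranges and merge them after every `merge_every` additions."""

    def __init__(self, merge_every=10_000):
        self.merge_every = merge_every  # hypothetical tunable threshold
        self.ranges = []
        self.pending = 0  # ranges added since the last merge

    def add(self, start, end):
        self.ranges.append((start, end))
        self.pending += 1
        if self.pending >= self.merge_every:
            self.flush()

    def flush(self):
        """Merge everything collected so far and reset the counter."""
        self.ranges = merge_ranges(self.ranges)
        self.pending = 0
        return self.ranges
```

A larger `merge_every` would presumably trade memory for fewer expensive merge passes; the right value would have to come from benchmarks.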

It is possible to add the following function, which generates a feature table, in which a feature ID contains both genome and gene information, like `G000123456|789` (the 789th gene of...

enhancement

In the ambiguous assignment mode, for each query sequence, each classification unit is counted as 1 / k, where k is the total number of classification units assigned to this...

documentation
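
The 1 / k counting rule described above can be sketched as follows. This is only an illustration: the function name `count_fractional` and the input layout (a dict mapping each query to its list of classification units) are assumptions, as is taking k to be the number of units assigned to that query, with each unit listed once.

```python
from collections import defaultdict


def count_fractional(assignments):
    """Count each classification unit as 1 / k per query sequence.

    `assignments` maps a query ID to the list of classification units
    assigned to it (assumed to contain no duplicates).
    """
    counts = defaultdict(float)
    for units in assignments.values():
        k = len(units)
        if k == 0:
            continue  # unassigned query contributes nothing
        for unit in units:
            counts[unit] += 1 / k
    return dict(counts)


# Toy input: three queries with 1, 2 and 4 candidate units.
example = {
    "q1": ["A"],
    "q2": ["A", "B"],
    "q3": ["A", "B", "C", "D"],
}
profile = count_fractional(example)
```

With this rule every assigned query contributes a total weight of 1, so the counts sum to the number of assigned queries.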

Currently Woltka uses Python's built-in `functools.lru_cache` (least recently used, [LRU](https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU))) to store recent classification results. However, because the frequencies of classification results are evenly distributed across each sample, the alternative...

enhancement
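
For readers unfamiliar with `functools.lru_cache`, here is a minimal sketch of caching classification results with it. The `classify` function, the `GENOME_TO_TAXON` map and the cache size are hypothetical stand-ins, not Woltka's actual code; the only real API used is `lru_cache` itself, which requires hashable arguments (hence the `frozenset`).

```python
from functools import lru_cache

# Hypothetical toy mapping from subject genome to taxon.
GENOME_TO_TAXON = {"G1": "Ecoli", "G2": "Ecoli", "G3": "Salmonella"}


@lru_cache(maxsize=128)
def classify(subjects):
    """Return the taxon shared by all subjects, or None if they disagree.

    `subjects` must be hashable, e.g. a frozenset of genome IDs.
    """
    taxa = {GENOME_TO_TAXON[s] for s in subjects}
    return taxa.pop() if len(taxa) == 1 else None


def classify_many(queries):
    """Classify an iterable of subject sets, reusing cached results."""
    return [classify(frozenset(q)) for q in queries]
```

Calling `classify.cache_info()` afterwards reports hits and misses, which is one way to check how often recent results are actually reused.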

Following @wasade's suggestions in PR #50 as well as other thoughts, I tested multiple options for the `find_lca` function. Benchmarks were performed on a Bowtie2 alignment file of 100,000...

documentation
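
As background for the `find_lca` discussion above, here is one generic way to compute a lowest common ancestor on a tree stored as a child-to-parent map. It is a sketch under assumptions, not any of the options that were benchmarked: the signature, the `parent` dict layout (the root has no entry) and the toy taxon names are all hypothetical.

```python
def find_lca(taxa, parent):
    """Return the lowest common ancestor of `taxa`, or None if there is none.

    `parent` maps each node to its parent; the root has no entry.
    """
    shared = None  # root-first lineage shared by all taxa seen so far
    for taxon in taxa:
        # Walk up to the root, then reverse to get a root-first lineage.
        lineage = []
        node = taxon
        while node is not None:
            lineage.append(node)
            node = parent.get(node)
        lineage.reverse()
        if shared is None:
            shared = lineage
        else:
            # Keep only the common prefix of the two lineages.
            n = 0
            for a, b in zip(shared, lineage):
                if a != b:
                    break
                n += 1
            shared = shared[:n]
    return shared[-1] if shared else None


# Toy tree: family1 -> genus1 -> {sp1, sp2}; family1 -> genus2 -> sp3.
toy_parent = {
    "sp1": "genus1",
    "sp2": "genus1",
    "sp3": "genus2",
    "genus1": "family1",
    "genus2": "family1",
}
```

This version rebuilds each lineage from scratch, so it is simple rather than fast; caching lineages would be an obvious thing to try.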