Tessa Pierce Ward
Recent functions added to rust core:
- `signature::minhash`
- `signature::get_sketch`
- `sketch::minhash::sum_abunds`
- `sketch::minhash::n_unique_kmers`
- `sketch::minhash::inflate`
- `sketch::minhash::inflated_abundances`
- `ani_utils::ani_from_containment`
- `ani_utils::ani_from_containment_ci`
- `ani_utils::prob_nothing_in_common`
In https://github.com/sourmash-bio/sourmash/pull/2943, I calculate all gather statistics in a single function, `calculate_gather_stats`, and use it in `revindex::disk_revindex.rs`. `linear` gather works slightly differently, so it needs a modified version of `calculate_gather_stats`. Context...
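To make "all gather statistics in one function" concrete, here is a minimal sketch over plain hash sets. It is not `calculate_gather_stats` from #2943: the name `gather_stats_sketch`, the `GatherStats` struct and its exact field definitions are hypothetical, and the real code works on sourmash sketches and reports more columns. It assumes sketches can be treated as `HashSet<u64>` of retained hashes, and that overlap in bp is estimated as shared hashes × `scaled`.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified result type; field meanings are illustrative
/// and are not guaranteed to match sourmash's gather output columns.
#[derive(Debug, PartialEq)]
pub struct GatherStats {
    /// Hashes shared by the match and the *remaining* query, times `scaled`.
    pub intersect_bp: u64,
    /// Fraction of the original query found in the match.
    pub f_orig_query: f64,
    /// Fraction of the match's hashes found in the original query.
    pub f_match: f64,
    /// Fraction of the original query newly explained by this match.
    pub f_unique_to_query: f64,
}

/// Hypothetical helper: all inputs are passed explicitly, so the function
/// holds no gather-loop state and can be reused by different gather variants.
pub fn gather_stats_sketch(
    orig_query: &HashSet<u64>,
    remaining_query: &HashSet<u64>,
    matched: &HashSet<u64>,
    scaled: u64,
) -> Option<GatherStats> {
    // Avoid division by zero on empty inputs.
    if orig_query.is_empty() || matched.is_empty() {
        return None;
    }
    let unique = matched.intersection(remaining_query).count() as u64;
    let orig_overlap = matched.intersection(orig_query).count() as u64;
    Some(GatherStats {
        intersect_bp: unique * scaled,
        f_orig_query: orig_overlap as f64 / orig_query.len() as f64,
        f_match: orig_overlap as f64 / matched.len() as f64,
        f_unique_to_query: unique as f64 / orig_query.len() as f64,
    })
}
```

Passing both the original and the remaining query keeps the arithmetic independent of how a particular gather loop tracks its state, which is one way a `linear` variant could share it.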
Once we have #2943 and #3020, try moving calculations out of the main (serial) gather loop so that they can be done in parallel.
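The split could look roughly like this: only the choice of the next match depends on the previous iteration, so the serial loop records each chosen match plus a snapshot of the remaining query, and per-match statistics are computed afterwards in parallel. Everything here is hypothetical (`GatherStep`, `serial_gather`, `parallel_stats`), uses plain `HashSet<u64>` in place of sourmash sketches, reduces the "statistics" to a single overlap count, and uses `std::thread::scope` (Rust 1.63+) to stay dependency-free.

```rust
use std::collections::HashSet;
use std::thread;

/// One step recorded by the serial loop (hypothetical type).
pub struct GatherStep {
    pub match_idx: usize,
    /// Query hashes still unexplained when this match was chosen.
    pub remaining: HashSet<u64>,
}

/// Serial part: greedy selection of the candidate with the largest overlap
/// with the remaining query, until no candidate overlaps anymore.
pub fn serial_gather(query: &HashSet<u64>, candidates: &[HashSet<u64>]) -> Vec<GatherStep> {
    let mut remaining = query.clone();
    let mut steps = Vec::new();
    loop {
        let best = candidates
            .iter()
            .enumerate()
            .map(|(i, c)| (i, c.intersection(&remaining).count()))
            .max_by_key(|&(_, n)| n);
        match best {
            Some((idx, n)) if n > 0 => {
                steps.push(GatherStep { match_idx: idx, remaining: remaining.clone() });
                // Remove the hashes explained by the chosen match.
                let chosen = &candidates[idx];
                remaining.retain(|h| !chosen.contains(h));
            }
            _ => break,
        }
    }
    steps
}

/// Parallel part: each step only reads its own snapshot, so the per-step
/// work (here: count of newly explained hashes) can run on separate threads.
pub fn parallel_stats(steps: &[GatherStep], candidates: &[HashSet<u64>]) -> Vec<usize> {
    thread::scope(|s| {
        // Spawn every thread first, then join them in step order.
        let handles: Vec<_> = steps
            .iter()
            .map(|step| {
                let chosen = &candidates[step.match_idx];
                s.spawn(move || chosen.intersection(&step.remaining).count())
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Snapshotting the remaining query costs memory per step; recording only the hashes removed at each step would be cheaper. One thread per step is fine for a sketch, but a thread pool (for example rayon) would be the more usual choice with many matches.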
Now that we have `Select` implemented, deprecate and remove usage of rust `sig.select_sketch`, per https://github.com/sourmash-bio/sourmash/issues/1292
Per @luizirber (paraphrased): we should find a replacement for `.sketches`, as it clones all sketch data (https://github.com/sourmash-bio/sourmash/blob/latest/src/core/src/signature.rs#L480-L482):
```
pub fn sketches(&self) -> Vec<Sketch> {
    self.signatures.clone()
}
```
Note: `sig.iter()`...
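A borrowing accessor avoids the deep copy. The sketch below uses simplified, hypothetical `Sketch` and `Signature` types (only the field name `signatures` comes from the quoted code) to contrast the cloning accessor with returning a slice or an iterator. It is not the sourmash API, and the replacement eventually chosen may look different.

```rust
/// Hypothetical, simplified stand-ins; only the ownership pattern matters.
#[derive(Clone, Debug, PartialEq)]
pub struct Sketch {
    pub ksize: u32,
    pub hashes: Vec<u64>,
}

pub struct Signature {
    signatures: Vec<Sketch>,
}

impl Signature {
    pub fn new(signatures: Vec<Sketch>) -> Self {
        Signature { signatures }
    }

    /// Cloning pattern: every call allocates and copies all sketch data.
    pub fn sketches_cloned(&self) -> Vec<Sketch> {
        self.signatures.clone()
    }

    /// Borrowing alternative: no allocation; callers clone only what they keep.
    pub fn sketches(&self) -> &[Sketch] {
        &self.signatures
    }

    /// Iterator form, which also hides the underlying container type.
    pub fn iter_sketches(&self) -> impl Iterator<Item = &Sketch> {
        self.signatures.iter()
    }
}
```

Returning `&[Sketch]` or an iterator ties the borrow to the signature, so callers that really need owned data call `.to_vec()` or `.clone()` themselves and the cost becomes visible at the call site.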
`prefetch` flattens abundances by default when searching (this is intended). However, it would be good to re-inflate abundances before saving matched and unmatched sigs, so that abundance information is preserved. Note that this means...
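Since matched and unmatched hashes both come from the query, re-inflation only needs the query's original abundances. A minimal sketch, assuming abundances are held as a hash → count map; `reinflate` is a hypothetical helper in the spirit of the rust core `sketch::minhash::inflate`, not its actual signature.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: give each flattened hash the abundance it had in
/// the original (abundance-tracking) query. Hashes absent from the query
/// are skipped rather than assigned a made-up abundance.
pub fn reinflate(flat_hashes: &[u64], query_abunds: &BTreeMap<u64, u64>) -> BTreeMap<u64, u64> {
    flat_hashes
        .iter()
        .filter_map(|h| query_abunds.get(h).map(|&a| (*h, a)))
        .collect()
}
```

The same call works for the unmatched hashes. Silently skipping unknown hashes is one design choice; returning an error would be the stricter alternative.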
From efforts in https://github.com/sourmash-bio/sourmash_plugin_branchwater/pull/197 and https://github.com/sourmash-bio/sourmash/pull/2948:
- consider allowing missing paths when loading collections, and reporting the number of missing or failed paths to the user. Force to allow failures?
- Can...
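One possible shape for "allow missing paths, but report them": a hypothetical `check_paths` function partitions the paths into present and missing, fails when missing paths are not allowed, and otherwise returns both lists so the caller can report counts. The `allow_missing` flag stands in for whatever force-style option might be chosen; it is an assumption, not an existing sourmash option.

```rust
use std::path::PathBuf;

/// Hypothetical summary of a collection load; not a sourmash type.
#[derive(Debug, Default, PartialEq)]
pub struct LoadReport {
    pub loaded: Vec<PathBuf>,
    pub missing: Vec<PathBuf>,
}

/// Split `paths` by existence. With `allow_missing = false`, any missing
/// path is an error; with `true`, loading continues and the report
/// carries the missing paths for the caller to summarize.
pub fn check_paths(paths: &[PathBuf], allow_missing: bool) -> Result<LoadReport, String> {
    let mut report = LoadReport::default();
    for p in paths {
        if p.exists() {
            report.loaded.push(p.clone());
        } else {
            report.missing.push(p.clone());
        }
    }
    if !allow_missing && !report.missing.is_empty() {
        return Err(format!(
            "{} of {} paths are missing",
            report.missing.len(),
            paths.len()
        ));
    }
    Ok(report)
}
```

This only checks existence; files that exist but fail to load could be collected into a separate `failed` list in the same way.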
`GCA_905332505.2` is part of `gtdb-rs207` (https://gtdb.ecogenomic.org/genome?gid=GCA_905332505.2), but has been suppressed (see https://www.ncbi.nlm.nih.gov/assembly/GCA_905332505.2). Genome/proteome download from NCBI fails because of the suppression. Since `wort` sketches files as they become available, I believe...
Ref. ANI estimation PR #1788. I've been using our forthcoming ANI utilities to estimate pairwise ANI between GTDB genomes. From these data, we can examine the average containment --> ANI...
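For reference, a small sketch of one way to go from a genome pair to an average-containment ANI: convert each directional containment C to ANI with the commonly used point estimate ANI ≈ C^(1/k), then average the two ANI values. It assumes genomes are represented as `HashSet<u64>` of sketch hashes; the function names are hypothetical and this is not the code from PR #1788.

```rust
use std::collections::HashSet;

/// Containment of `a` in `b`: fraction of `a`'s hashes also present in `b`.
pub fn containment(a: &HashSet<u64>, b: &HashSet<u64>) -> f64 {
    if a.is_empty() {
        return 0.0;
    }
    a.intersection(b).count() as f64 / a.len() as f64
}

/// Point estimate: if each base mutates independently, a k-mer survives
/// with probability ANI^k, so ANI ≈ C^(1/k).
pub fn containment_to_ani(c: f64, ksize: u32) -> f64 {
    c.powf(1.0 / ksize as f64)
}

/// Average of the two directional containment ANIs (hypothetical helper).
pub fn avg_containment_ani(a: &HashSet<u64>, b: &HashSet<u64>, ksize: u32) -> f64 {
    let ani_ab = containment_to_ani(containment(a, b), ksize);
    let ani_ba = containment_to_ani(containment(b, a), ksize);
    (ani_ab + ani_ba) / 2.0
}
```

Averaging the two containments first and converting afterwards can give a slightly different number, so it is worth stating which order is used when comparing against other tools.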
With the updated rust core, we need to go through and rationalize (and unify) the rust and python code so we don't, for example, create sigs/zips that break rust assumptions. From...