C. Titus Brown

983 comments by C. Titus Brown

> We can add an extra column to the manifest for "preferred"/"overwrite" name, and use that (if present) when outputting in `gather`/`search`/any other place reporting names. Manifests are easy to...
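A minimal sketch of the "use the preferred name if present" logic, assuming a manifest-like CSV. The column set here is trimmed for illustration (real sourmash manifests carry more columns), and `preferred_name` is the proposed, not-yet-existing override column. An empty or missing override falls back to `name`:

```python
import csv
import io

# Hypothetical manifest excerpt: `name` is the existing column; `preferred_name`
# is the proposed optional override column (empty means "no override").
MANIFEST_CSV = """\
md5,ksize,moltype,name,preferred_name
abc123,31,DNA,GCA_000001.1 some old name,Escherichia coli K-12
def456,31,DNA,GCA_000002.1 another genome,
"""


def display_name(row):
    """Return preferred_name if the column exists and is non-empty, else name."""
    preferred = (row.get("preferred_name") or "").strip()
    return preferred if preferred else row["name"]


def load_display_names(fp):
    """Map each manifest row's md5 to the name we'd report in gather/search."""
    reader = csv.DictReader(fp)
    return {row["md5"]: display_name(row) for row in reader}


names = load_display_names(io.StringIO(MANIFEST_CSV))
for md5, name in names.items():
    print(md5, name)
```

Because the lookup uses `row.get(...)`, older manifests without the new column keep working unchanged.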

if we're going to upgrade manifest contents to include `deprecated/removed` and `preferred_name`, we could also:

* think about deprecating `filename` in manifests, 'cause it's dumb
* add missing fields like...
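To make the `deprecated/removed` idea concrete, here's a rough sketch of skipping flagged rows when reading a manifest. The `deprecated` column name and its truthy values (`1`/`true`/`yes`) are assumptions for illustration only; nothing like this exists in current sourmash manifests:

```python
import csv
import io

# Hypothetical upgraded manifest: `deprecated` is a proposed column,
# where a truthy value means the signature should no longer be reported.
MANIFEST_CSV = """\
md5,name,deprecated
aaa111,genome A,0
bbb222,genome B (withdrawn),1
ccc333,genome C,
"""

TRUE_VALUES = {"1", "true", "yes"}


def is_deprecated(row):
    """Treat a missing or empty `deprecated` field as 'not deprecated'."""
    return (row.get("deprecated") or "").strip().lower() in TRUE_VALUES


def active_rows(fp):
    """Yield manifest rows that are not flagged as deprecated."""
    for row in csv.DictReader(fp):
        if not is_deprecated(row):
            yield row


active = [row["md5"] for row in active_rows(io.StringIO(MANIFEST_CSV))]
print(active)  # ['aaa111', 'ccc333']
```

Filtering at manifest-read time would mean deprecated entries never reach `gather`/`search` output, without having to rewrite the underlying signature files.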

thanks @ezherman ;). I clearly need to do a sweep of the issues again, since we've updated the docs quite a bit!

per https://github.com/sourmash-bio/sourmash/issues/3504, this code: https://github.com/sourmash-bio/2025-sourmash-eukaryotic-databases/blob/main/Snakefile#L122

```python
rule lineages_csv:
    input:
        "collections/{NAME}.links.csv",
    output:
        "databases/{NAME}.lineages.csv",
    shell: """
        scripts/taxid-to-lineages.taxonkit.py {input} -o {output}
    """
```

was used to generate the lineages CSV file.

thanks for posting this, @mr-eyes :)

* another pro is that we could also use a rolling hash function, which could potentially be much more efficient!
* we do (sort...
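The efficiency win from a rolling hash is that the hash of the next k-mer is derived from the previous one in O(1), instead of rehashing all k bases. Here's a minimal Rabin-Karp style polynomial rolling hash as an illustration of the technique only. It is NOT what sourmash uses (sourmash hashes with MurmurHash3), the base/modulus are arbitrary choices, and it ignores canonical (reverse-complement) k-mers; a real implementation would probably want a better-mixed DNA rolling hash such as ntHash:

```python
# Rabin-Karp style polynomial rolling hash over DNA k-mers (illustration only).
BASE = 1_000_003          # arbitrary multiplier
MOD = (1 << 61) - 1       # large Mersenne prime modulus
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def direct_hash(kmer):
    """Hash a k-mer from scratch: O(k) work."""
    h = 0
    for ch in kmer:
        h = (h * BASE + ENCODE[ch]) % MOD
    return h


def rolling_hashes(seq, k):
    """Return hashes of every k-mer in seq, updating in O(1) per step."""
    if len(seq) < k:
        return []
    high = pow(BASE, k - 1, MOD)  # weight of the outgoing (leftmost) base
    h = direct_hash(seq[:k])
    out = [h]
    for i in range(k, len(seq)):
        # drop the leftmost base, shift everything up, then add the new base
        h = (h - ENCODE[seq[i - k]] * high) % MOD
        h = (h * BASE + ENCODE[seq[i]]) % MOD
        out.append(h)
    return out
```

Each rolled value matches `direct_hash` on the same window; only the cost per k-mer differs.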

>* If Murmur is replaced by a rolling-based hashing, as per [[MRG] Rolling hash, attempt N dib-lab/khmer#1792](https://github.com/dib-lab/khmer/pull/1792) the performance would still not be that efficient compared to **xxHash**, so what's...

hah, khmer use(s/d) two-bit representations! I don't like the k-mer size limitation of 32 tho.

FYI this paper: [The K-mer File Format: a standardized and compact disk representation of sets of k-mers](https://academic.oup.com/bioinformatics/article/38/18/4423/6651834)

That code looks good to me! Note that the output is a text file with just the relevant hashes in it (and not their counts); if you need a different...
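The code under review isn't reproduced here, but to illustrate the "hashes only vs. hashes with counts" distinction: a small sketch assuming the hashes are held in a plain dict mapping hash to count (a hypothetical in-memory stand-in, not a sourmash API):

```python
import csv
import io


def write_hashes(hash_counts, fp):
    """Write one hash per line, dropping abundance information."""
    for h in sorted(hash_counts):
        fp.write(f"{h}\n")


def write_hashes_with_counts(hash_counts, fp):
    """Write a hash,count CSV, keeping abundances."""
    writer = csv.writer(fp, lineterminator="\n")
    writer.writerow(["hash", "count"])
    for h in sorted(hash_counts):
        writer.writerow([h, hash_counts[h]])


example = {30: 2, 10: 5}  # hypothetical hash -> count data
buf = io.StringIO()
write_hashes(example, buf)
print(buf.getvalue())
```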

Let me see if this response helps; ask away if not :) When signatures are calculated with `sketch dna -p abund`, the signature stores both a hash (representing a...
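A toy model of what an abundance-tracking FracMinHash sketch holds: each kept hash maps to how many times its k-mer was seen. This is a sketch under several stated assumptions: BLAKE2b (truncated to 64 bits) stands in for sourmash's MurmurHash3, canonical k-mers are chosen by lexicographic minimum of the k-mer and its reverse complement, and "keep hashes below 2**64 / scaled" approximates the real threshold:

```python
import hashlib
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Pick the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def hash64(kmer):
    """Stand-in 64-bit hash; sourmash itself uses MurmurHash3, not BLAKE2."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def sketch_with_abund(seq, k, scaled):
    """Return {hash: abundance} for k-mer hashes below 2**64 // scaled."""
    max_hash = 2**64 // scaled
    counts = Counter()
    for i in range(len(seq) - k + 1):
        h = hash64(canonical(seq[i:i + k]))
        if h < max_hash:
            counts[h] += 1
    return dict(counts)


sketch = sketch_with_abund("ACGTACGTACGT", k=4, scaled=1)
print(sorted(sketch.values()))
```

With `scaled=1` every k-mer is kept; larger `scaled` values keep a fixed fraction of the hash space, and the abundances ride along with whichever hashes survive.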