C. Titus Brown issues



C. Titus Brown

paper: Simplitigs as an efficient and scalable representation of de Bruijn graphs

https://www.biorxiv.org/content/10.1101/2020.01.12.903443v2.full

> Motivation: De Bruijn graphs play an essential role in computational biology, facilitating rapid alignment-free comparison of genomic datasets as well as reconstruction of underlying genomic sequences. Subsequently, an important...

papers

can we refactor to use gfabase for graph storage?

https://twitter.com/dnamlin/status/1352545370702659587?s=21

`build` doesn't properly depend on input files

If you adjust the config file to include new `input_sequences`, `build` doesn't rebuild the catlas.
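One way to make the dependency explicit, sketched here under the assumption that the build step can record which inputs it used: store the sorted list of input files next to the catlas, and treat any change to that list, or any input newer than the output, as a reason to rebuild. `needs_rebuild` and `record_inputs` are hypothetical helpers, not part of spacegraphcats; in a workflow manager the equivalent fix is to list the configured inputs as inputs of the build rule.

```python
import json
import os


def needs_rebuild(input_files, output_file, manifest_file):
    """Return True if output_file is missing or stale (hypothetical helper)."""
    if not os.path.exists(output_file) or not os.path.exists(manifest_file):
        return True
    with open(manifest_file) as fp:
        recorded = json.load(fp)
    # The set of configured inputs changed, e.g. new input_sequences were added.
    if sorted(input_files) != recorded:
        return True
    # Otherwise fall back to a make-style timestamp comparison.
    out_mtime = os.path.getmtime(output_file)
    return any(os.path.getmtime(f) > out_mtime for f in input_files)


def record_inputs(input_files, manifest_file):
    """Remember which inputs the current output was built from."""
    with open(manifest_file, "w") as fp:
        json.dump(sorted(input_files), fp)
```

`record_inputs` would be called right after a successful build, so the next `needs_rebuild` call sees the input list that produced the current catlas.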

v2.0

Enable protein queries

Per our last R01 submission:

> We will implement amino acid and Dayhoff-6 queries with spacegraphcats by building secondary indices on top of the cDBG unitigs, which will allow us to...
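A minimal sketch of what such a secondary index could look like, assuming unitigs are available as a dict of unitig ID → DNA sequence: translate each unitig in all six reading frames, optionally collapse the protein to the Dayhoff-6 alphabet, and map each protein k-mer back to the unitig IDs it came from. `build_protein_index` and the other names are hypothetical, not spacegraphcats API. The six Dayhoff groups are the standard ones ({C}, {A,G,P,S,T}, {D,E,N,Q}, {H,K,R}, {I,L,M,V}, {F,W,Y}); the letters `a`–`f` used for them here are arbitrary labels.

```python
from collections import defaultdict

# Standard genetic code, indexed as 16*i + 4*j + k over the bases "TCAG".
BASES = "TCAG"
CODE = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
        "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: CODE[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Dayhoff-6: collapse amino acids into six groups (labels are arbitrary).
DAYHOFF = {}
for label, group in zip("abcdef", ["C", "AGPST", "DENQ", "HKR", "ILMV", "FWY"]):
    for aa in group:
        DAYHOFF[aa] = label

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    # Codons containing non-ACGT characters become 'X'.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    for strand in (seq, revcomp(seq)):
        for offset in range(3):
            yield translate(strand[offset:])


def build_protein_index(unitigs, ksize, dayhoff=False):
    """Map protein (or Dayhoff) k-mers to the set of unitig IDs containing them."""
    index = defaultdict(set)
    for unitig_id, seq in unitigs.items():
        for prot in six_frames(seq.upper()):
            if dayhoff:
                prot = "".join(DAYHOFF.get(aa, aa) for aa in prot)
            for i in range(len(prot) - ksize + 1):
                kmer = prot[i:i + ksize]
                # Skip k-mers spanning stop codons or ambiguous codons.
                if "*" in kmer or "X" in kmer:
                    continue
                index[kmer].add(unitig_id)
    return index
```

A protein query would then be broken into k-mers (Dayhoff-encoded if needed) and looked up in the index to find the unitigs, and from there the catlas neighborhoods, to extract.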

index_cdbg_by_kmer needs optimization

`spacegraphcats.cdbg.index_cdbg_by_kmer` is both slow and memory-intensive, sigh. Speed may require more Cythonization or something similar. In terms of memory use, one problem right now is that we use Python sets...
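One commonly used alternative to Python sets, shown as a minimal sketch rather than a drop-in replacement: store 64-bit hashes of canonical k-mers in a sorted `array('Q')` (8 bytes per entry) alongside a parallel array of cDBG IDs, and answer lookups with binary search. `CompactKmerIndex` and `kmer_hash` are hypothetical names, and the hash (`blake2b` truncated to 8 bytes) is an arbitrary choice for illustration, not what spacegraphcats uses.

```python
import bisect
import hashlib
from array import array

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_hash(kmer):
    """64-bit hash of the canonical (lexicographically smaller) k-mer."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    canonical = min(kmer, rc)
    digest = hashlib.blake2b(canonical.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class CompactKmerIndex:
    """Sorted hash array + parallel cDBG ID array instead of Python sets."""

    def __init__(self, unitigs, ksize):
        # NOTE: this temporary list still costs memory during construction;
        # a real version would build in chunks or with numpy.
        pairs = []
        for cdbg_id, seq in unitigs.items():
            for i in range(len(seq) - ksize + 1):
                pairs.append((kmer_hash(seq[i:i + ksize]), cdbg_id))
        pairs.sort()
        self.hashes = array("Q", (h for h, _ in pairs))
        self.ids = array("Q", (c for _, c in pairs))

    def lookup(self, kmer):
        """Return the set of cDBG IDs containing kmer (either orientation)."""
        h = kmer_hash(kmer)
        i = bisect.bisect_left(self.hashes, h)
        result = set()
        while i < len(self.hashes) and self.hashes[i] == h:
            result.add(self.ids[i])
            i += 1
        return result
```

Once built, the index holds 16 bytes per k-mer occurrence in two flat arrays, versus a full Python object per set entry; the trade-off is O(log n) lookups and an immutable index.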

include estimator of number of unique k-mers in cDBG?

One of the main challenges right now is estimating memory requirements for the cDBG/k-mer indexing stage. I think we could figure out how much memory to provide to sgc if we...
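A cardinality sketch such as HyperLogLog is one standard way to estimate the number of unique k-mers in constant memory; the sketch below is a minimal pure-Python version, offered as an illustration and not necessarily the estimator this issue had in mind. `HyperLogLog` and `count_unique_kmers` are hypothetical names; the precision `p=12` (4096 registers, roughly 1.6% standard error) is an assumed default.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128

    def add(self, item):
        x = int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")
        idx = x >> (64 - self.p)              # top p bits pick a register
        w = x & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        rank = (64 - self.p) - w.bit_length() + 1  # position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        z = sum(2.0 ** -r for r in self.registers)
        e = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros:
            # Linear counting is more accurate for small cardinalities.
            e = self.m * math.log(self.m / zeros)
        return e


COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_unique_kmers(seqs, ksize, p=12):
    """Estimate the number of distinct canonical k-mers across seqs."""
    hll = HyperLogLog(p)
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - ksize + 1):
            kmer = seq[i:i + ksize]
            hll.add(min(kmer, kmer.translate(COMPLEMENT)[::-1]))
    return hll.estimate()
```

The estimate could then be multiplied by an empirically measured per-k-mer memory cost for the indexing stage to decide how much memory to request; that factor would have to be measured and is not given here.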

Refactor extract_contigs and/or merge with extract_contigs_cdbg

`spacegraphcats/search/extract_contigs` iterates over all contigs rather than using `search_utils.get_contigs_by_cdbg_sqlite` to fetch only the ones it needs, which is wasteful. `extract_contigs_cdbg` may already do this correctly. TODO: dig into this, figure out the right solution, refactor, win.
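To illustrate the difference between scanning every contig and fetching only the requested ones, here is a minimal sketch against a hypothetical SQLite table `sequences(id, sequence)`; the schema and `get_contigs_by_ids` are assumptions for illustration and are not the actual signature or schema behind `search_utils.get_contigs_by_cdbg_sqlite`.

```python
import sqlite3


def get_contigs_by_ids(db, cdbg_ids, chunk_size=500):
    """Yield (id, sequence) rows only for the requested cDBG IDs.

    Uses the primary-key index instead of iterating over every contig.
    IDs are queried in chunks to stay under SQLite's bound-parameter limit.
    """
    ids = sorted(set(cdbg_ids))
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        yield from db.execute(
            f"SELECT id, sequence FROM sequences WHERE id IN ({placeholders})",
            chunk)


def demo_db(n=2000):
    """Build an in-memory database of n toy contigs (illustration only)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sequences (id INTEGER PRIMARY KEY, sequence TEXT)")
    db.executemany("INSERT INTO sequences VALUES (?, ?)",
                   ((i, "ACGT" * (i % 10 + 1)) for i in range(n)))
    return db
```

Row order from `IN (...)` queries is not guaranteed, so callers that need a stable order should sort the results.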
