C. Titus Brown
https://www.biorxiv.org/content/10.1101/2020.01.12.903443v2.full >Motivation De Bruijn graphs play an essential role in computational biology, facilitating rapid alignment-free comparison of genomic datasets as well as reconstruction of underlying genomic sequences. Subsequently, an important...
https://twitter.com/dnamlin/status/1352545370702659587?s=21
if you adjust the config file to include new `input_sequences`, `build` doesn't rebuild the catlas.
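One way to make a changed input list detectable is to store a fingerprint of `input_sequences` next to the catlas and compare it on the next `build`. This is a minimal sketch, not how spacegraphcats currently decides whether to rebuild. The stamp file and the helper names `config_fingerprint`, `needs_rebuild` and `record_build` are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def config_fingerprint(config: dict) -> str:
    """Hash the sorted input list so that merely reordering files is not a change."""
    inputs = sorted(config.get("input_sequences", []))
    payload = json.dumps(inputs).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def needs_rebuild(config: dict, stamp_path: Path) -> bool:
    """True if no fingerprint was recorded or it differs from the current config."""
    if not stamp_path.exists():
        return True
    return stamp_path.read_text().strip() != config_fingerprint(config)


def record_build(config: dict, stamp_path: Path) -> None:
    """Write the fingerprint after a successful build (hypothetical stamp file)."""
    stamp_path.write_text(config_fingerprint(config) + "\n")
```

If the stamp file is only rewritten when its contents change, a workflow rule that lists it as an input will see a newer mtime exactly when the inputs changed.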
per our last R01 submission, >We will implement amino acid and Dayhoff-6 queries with spacegraphcats by building secondary indices on top of the cDBG unitigs, which will allow us to...
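A secondary Dayhoff-6 index would first need to recode translated sequence into the six-letter Dayhoff alphabet before k-merizing it. The sketch below uses the commonly used Dayhoff grouping (C / AGPST / DENQ / HKR / ILMV / FWY). It covers only the recoding step, not how the index would be attached to the cDBG unitigs. The function names are illustrative.

```python
# Dayhoff six-letter alphabet: amino acids grouped by physicochemical similarity.
DAYHOFF_GROUPS = {
    "a": "C",
    "b": "AGPST",
    "c": "DENQ",
    "d": "HKR",
    "e": "ILMV",
    "f": "FWY",
}
# Invert to a residue -> Dayhoff letter lookup table.
AA_TO_DAYHOFF = {aa: code for code, aas in DAYHOFF_GROUPS.items() for aa in aas}


def to_dayhoff(protein: str) -> str:
    """Recode a protein sequence; raises KeyError on stop codons or unknown residues."""
    return "".join(AA_TO_DAYHOFF[aa] for aa in protein.upper())


def kmers(seq: str, k: int):
    """Yield all overlapping k-mers of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i : i + k]
```

For example, `to_dayhoff("MCK")` gives `"ead"`. Handling of `*` and ambiguous residues such as `X` is left open here.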
`spacegraphcats.cdbg.index_cdbg_by_kmer` is both slow and memory intensive, sigh. speed may require more Cythonization or something. in terms of memory usage, one problem right now is that we use Python sets...
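One common alternative to Python sets or dicts for a large k-mer → cDBG ID map is a pair of sorted, typed arrays searched with binary search. Each entry then costs a raw 8-byte hash plus an 8-byte ID rather than boxed Python ints plus hash-table overhead. This is a sketch of that technique, not what `index_cdbg_by_kmer` does. It assumes k-mers are already hashed to unsigned 64-bit integers and that each k-mer belongs to exactly one cDBG node. The class name is hypothetical.

```python
from array import array
from bisect import bisect_left


class SortedKmerIndex:
    """Map 64-bit k-mer hashes to cDBG node IDs with two parallel typed arrays."""

    def __init__(self, pairs):
        # Sort (hash, cdbg_id) pairs by hash so lookups can binary-search.
        # Note: sorted() still builds a temporary list; a streaming or external
        # sort would be needed to avoid that peak during construction.
        ordered = sorted(pairs)
        self.hashes = array("Q", (h for h, _ in ordered))  # unsigned 64-bit
        self.ids = array("q", (i for _, i in ordered))     # signed 64-bit

    def get(self, kmer_hash, default=None):
        """Return the cDBG ID for kmer_hash, or default if it is absent."""
        pos = bisect_left(self.hashes, kmer_hash)
        if pos < len(self.hashes) and self.hashes[pos] == kmer_hash:
            return self.ids[pos]
        return default

    def __len__(self):
        return len(self.hashes)
```

Lookups become O(log n) instead of O(1), which is the usual trade for the smaller footprint.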
one of the main challenges right now is estimating memory requirements for the cDBG/k-mer indexing stage. I think we could figure out how much memory to provide to sgc if we...
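As a rough illustration only, and not necessarily the approach this note goes on to describe, a flat k-mer → cDBG ID table needs about (number of distinct k-mers) × (bytes per entry) of memory. The per-entry sizes and `overhead_factor` below are assumptions. The distinct k-mer count would have to come from somewhere else, for example a cardinality estimate. The function name is hypothetical.

```python
def estimate_index_bytes(n_kmers: int, bytes_per_hash: int = 8,
                         bytes_per_id: int = 4,
                         overhead_factor: float = 1.0) -> int:
    """Back-of-the-envelope memory for a flat k-mer -> cDBG ID table.

    overhead_factor (assumed) accounts for container slack, e.g. ~1.5 for a
    hash table kept half-empty; 1.0 models tightly packed sorted arrays.
    """
    return int(n_kmers * (bytes_per_hash + bytes_per_id) * overhead_factor)
```

For 10⁹ distinct k-mers this gives 12 × 10⁹ bytes (about 11.2 GiB) packed, or 18 × 10⁹ bytes with a 1.5× overhead. Python-object containers would sit far above either number.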
`spacegraphcats/search/extract_contigs` iterates over all contigs, rather than using `search_utils.get_contigs_by_cdbg_sqlite`. So that's bad. `extract_contigs_cdbg` may do the thing correctly. TODO: Dig into this, figure out the right solution, refactor, win.
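The difference is between scanning every contig and selecting only the wanted cDBG IDs by primary key. The sketch below assumes a hypothetical table `sequences(id INTEGER PRIMARY KEY, sequence TEXT)`. This is not necessarily the schema behind `search_utils.get_contigs_by_cdbg_sqlite`. IDs are queried in batches so each statement stays under SQLite's bound-parameter limit, which defaults to 999 in older builds.

```python
import sqlite3
from typing import Iterable, Iterator, Tuple


def contigs_by_id(db: sqlite3.Connection, cdbg_ids: Iterable[int],
                  batch_size: int = 500) -> Iterator[Tuple[int, str]]:
    """Yield (id, sequence) rows for the requested IDs only, in ascending ID order."""
    ids = sorted(set(cdbg_ids))  # de-duplicate; sorted batches keep output ordered
    for start in range(0, len(ids), batch_size):
        batch = ids[start : start + batch_size]
        placeholders = ",".join("?" * len(batch))
        query = (f"SELECT id, sequence FROM sequences "
                 f"WHERE id IN ({placeholders}) ORDER BY id")
        yield from db.execute(query, batch)
```

Because `id` is the primary key, each batch is answered with index lookups rather than a pass over the whole table.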
```
Version: 2.0b9
conf/hu-s1.json build 295 min / 69078.9 MB
conf/hu-s1.json bcalm_catlas_sort 35 min / 12607.9 MB
conf/hu-s1.json bcalm_catlas_prepare_input 10 min / 12192.0 MB
conf/hu-s1.json search 33 min / 23491.8...
```
`extract_reads` is still pretty slow. it'd be interesting to actually organize reads by cDBG ID, or something. another idea would be to involve dominators somehow, since they are (now) relatively...
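One reading of "organize reads by cDBG ID" is to precompute an inverted index from each cDBG node to the positions of the reads that touch it. Extraction for a query neighborhood then becomes a lookup plus sorted seeks instead of a pass over every read. This is a sketch under assumptions: the positions are treated as byte offsets into the read file, and some earlier step is assumed to produce `(offset, cdbg_ids)` records. Both function names are hypothetical. The dominator idea is not covered.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_reads_by_cdbg(
        read_hits: Iterable[Tuple[int, Iterable[int]]]) -> Dict[int, List[int]]:
    """Invert (read_offset, cdbg_ids) records into cdbg_id -> sorted read offsets."""
    index: Dict[int, Set[int]] = defaultdict(set)
    for offset, cdbg_ids in read_hits:
        for cdbg_id in cdbg_ids:
            index[cdbg_id].add(offset)
    return {cdbg_id: sorted(offsets) for cdbg_id, offsets in index.items()}


def reads_for_nodes(index: Dict[int, List[int]],
                    cdbg_ids: Iterable[int]) -> List[int]:
    """Offsets of reads touching any query node, sorted so the file is read forward."""
    wanted: Set[int] = set()
    for cdbg_id in cdbg_ids:
        wanted.update(index.get(cdbg_id, ()))
    return sorted(wanted)
```

Returning sorted offsets means the caller can seek through the read file once, front to back, rather than jumping around.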
this step in `bcalm_to_gxt2` could be split out into its own script, which would make it possible for snakemake to run it in parallel. might also consider renaming `bcalm_to_gxt2` 🤔