
Tip: some jq code to get list of "good" contigs

Open kdm9 opened this issue 3 years ago • 0 comments

Hello,

This is mostly a PSA, as the following took me way too long to work out myself. Perhaps the authors could add this to the docs somewhere appropriate.

To filter a set of contigs based on the GC content and coverage (a la the blobplot), one can use the following jq command:

jq -r '.dict_of_blobs[] | select((.covs.bam0 > 10) and (.gc > 0.4)) | .name' \
    < path/to/something.blobDB.json \
    > goodcontigs.txt

Here, I use a coverage threshold of 10 in the first bam and a minimum GC of 0.4. Obviously, adjust these thresholds to suit your blobplot. Additional bams can be supported by adding something like (.covs.bam1 > 23) and inside the select() call. The resulting goodcontigs.txt is a plain-text list of contig names, one per line, compatible with blobtools seqfilter.
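For anyone who wants to try the two-bam variant before pointing it at a real database, here is a minimal sketch. The file toy.blobDB.json and its contig names and values are made up for illustration, and it contains only the fields the filter reads (.name, .gc, .covs), so it is not a complete blobDB. The bam1 threshold of 23 is just the example number from above.

```shell
# Toy stand-in for a blobDB JSON (hypothetical contigs and values;
# only the fields read by the filter are present).
cat > toy.blobDB.json <<'EOF'
{
  "dict_of_blobs": {
    "contig_1": {"name": "contig_1", "gc": 0.45, "covs": {"bam0": 35.2, "bam1": 40.1}},
    "contig_2": {"name": "contig_2", "gc": 0.52, "covs": {"bam0": 50.0, "bam1": 3.0}},
    "contig_3": {"name": "contig_3", "gc": 0.31, "covs": {"bam0": 60.0, "bam1": 55.0}}
  }
}
EOF

# Keep contigs that pass the coverage threshold in both bams and the GC minimum.
jq -r '.dict_of_blobs[]
       | select((.covs.bam0 > 10) and (.covs.bam1 > 23) and (.gc > 0.4))
       | .name' \
    < toy.blobDB.json \
    > goodcontigs.txt
```

With this toy input, only contig_1 ends up in goodcontigs.txt: contig_2 fails the bam1 coverage threshold and contig_3 fails the GC minimum.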

Thanks for a great tool, K

kdm9, Jun 29 '22 08:06