
Metatranscriptomics breadth of coverage

Open mweberr opened this issue 3 years ago • 1 comments

Hi, I am using diamond blastx to map metatranscriptomic reads to the proteome of a single species (Salmonella enterica). Currently, I see many reads that cover only a tiny part of a target protein, because they all align to the same 50 AA stretch (~5% of the total protein length).

With one query sequence and one target sequence, this is easy to detect by calculating the target sequence coverage (alignment length / target sequence length). However, when you deal with 100,000 reads, it is necessary to merge all start-stop intervals for each protein to compute that protein's breadth of coverage.
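To make the interval-joining idea concrete, here is a minimal sketch in Python. It is not an established tool or a DIAMOND feature, just one way the calculation could look. It assumes the alignments were written as tab-separated output with the columns `qseqid sseqid sstart send slen` (for example via `--outfmt 6 qseqid sseqid sstart send slen`), that coordinates are 1-based and inclusive, and that adjacent intervals should be merged. The function names and the example rows are hypothetical.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or directly adjacent closed intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def breadth_of_coverage(rows):
    """Return {sseqid: covered fraction} from (sseqid, sstart, send, slen) rows."""
    intervals = defaultdict(list)
    lengths = {}
    for sseqid, sstart, send, slen in rows:
        # Order the coordinates defensively in case start > end.
        lo, hi = sorted((int(sstart), int(send)))
        intervals[sseqid].append((lo, hi))
        lengths[sseqid] = int(slen)
    result = {}
    for sseqid, ivs in intervals.items():
        covered = sum(end - start + 1 for start, end in merge_intervals(ivs))
        result[sseqid] = covered / lengths[sseqid]
    return result


def read_diamond_tsv(path):
    """Yield (sseqid, sstart, send, slen) from the assumed 5-column layout."""
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue
            _qseqid, sseqid, sstart, send, slen = line.rstrip("\n").split("\t")[:5]
            yield sseqid, sstart, send, slen


# Hypothetical alignments: many reads piling up on the same region of protA.
example_rows = [
    ("protA", 1, 50, 1000),
    ("protA", 10, 50, 1000),
    ("protA", 40, 60, 1000),
    ("protB", 1, 100, 200),
    ("protB", 150, 200, 200),
]
for protein, fraction in sorted(breadth_of_coverage(example_rows).items()):
    print(f"{protein}\t{fraction:.3f}")
```

With a real result file, `breadth_of_coverage(read_diamond_tsv("hits.tsv"))` would give one fraction per protein (the file name is a placeholder). Proteins without any hit do not appear in the result, so they would need to be added with a coverage of 0 from the proteome FASTA.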

What is the best way (best practice) to compute protein breadth of coverage after diamond blastx read mapping?

Best, Michael

mweberr, Oct 18 '22 08:10

I'm sure there are tools out there that can do this, but I'm not the right person to ask. Maybe a website like https://www.biostars.org/ can help you.

bbuchfink, Oct 28 '22 13:10