Metatranscriptomics breadth of coverage
Hi, I am using diamond blastx to map metatranscriptomic reads to the proteome of a single species (Salmonella enterica). I see many reads that cover only a tiny part of a target protein, because they all align to the same ~50 amino acids (about 5% of the total protein length).
For a single query aligned to a single target, this is easy to detect by calculating the target coverage (alignment length / target sequence length). With 100,000 reads, however, all start-stop intervals on each protein have to be merged before the breadth of coverage (the fraction of the protein covered by at least one read) can be computed.
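A minimal sketch of the interval-merging step in Python, assuming the subject start/end coordinates of every alignment to one protein have already been collected as 1-based inclusive (start, end) pairs. The names `merge_intervals` and `breadth_of_coverage` are hypothetical helpers, not part of DIAMOND.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    # Normalise so start <= end, then sweep the intervals in sorted order.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def breadth_of_coverage(intervals, protein_length):
    """Fraction of residues covered by at least one alignment."""
    covered = sum(end - start + 1 for start, end in merge_intervals(intervals))
    return covered / protein_length
```

For the situation described above, 1,000 reads that all align to residues 101-150 of a 1,000-aa protein give `breadth_of_coverage([(101, 150)] * 1000, 1000)` = 0.05, i.e. about 5% coverage despite the high read count.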
What is the best practice for computing the breadth of coverage of each protein after mapping reads with diamond blastx?
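One possible end-to-end sketch, assuming diamond was run with a custom tabular format such as `--outfmt 6 qseqid sseqid sstart send slen`, so that each line carries the read ID, protein ID, subject start/end and protein length. Instead of merging intervals, this version uses a per-residue difference array, which yields both breadth of coverage and mean depth per protein in one pass; memory grows with total protein length, which should be manageable for a single bacterial proteome. The function name is hypothetical, and the column order is an assumption that must match the command actually used.

```python
def coverage_from_diamond_tsv(lines):
    """Per-protein (breadth, mean depth) from DIAMOND tabular lines.

    Assumed column order: qseqid sseqid sstart send slen
    (e.g. diamond blastx ... --outfmt 6 qseqid sseqid sstart send slen).
    """
    lengths = {}  # sseqid -> protein length (slen)
    diff = {}     # sseqid -> difference array indexed by residue (1-based)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        _qseqid, sseqid, sstart, send, slen = fields[:5]
        start, end = sorted((int(sstart), int(send)))
        if sseqid not in diff:
            lengths[sseqid] = int(slen)
            diff[sseqid] = [0] * (int(slen) + 2)
        diff[sseqid][start] += 1    # depth rises at the first aligned residue
        diff[sseqid][end + 1] -= 1  # and drops after the last aligned residue
    results = {}
    for sseqid, d in diff.items():
        n = lengths[sseqid]
        depth = covered = total = 0
        for pos in range(1, n + 1):
            depth += d[pos]  # running sum = number of hits covering this residue
            total += depth
            if depth > 0:
                covered += 1
        results[sseqid] = (covered / n, total / n)
    return results


# Hypothetical usage with a DIAMOND output file named "matches.tsv":
# with open("matches.tsv") as fh:
#     for protein, (breadth, depth) in sorted(coverage_from_diamond_tsv(fh).items()):
#         print(f"{protein}\t{breadth:.3f}\t{depth:.2f}")
```

DIAMOND can report several targets per read (controlled by `--max-target-seqs` / `-k`), so every reported hit contributes coverage in this sketch; whether to keep only the best hit per read is a separate decision.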
Best, Michael
I'm sure there are tools out there that can do this, but I'm not the right person to ask; a site like https://www.biostars.org/ may be able to help you.