vg
feature requests for vg filter and vg call
It's my understanding that linear alignment of paired-end reads typically follows this workflow:
(1) pre-process the reads in the FASTQ files
(2) align the reads to a linear reference, generating a BAM
(3) sort the aligned reads in the BAM file
(4) remove PCR duplicates
(5) filter for properly paired reads that pass a certain MQ (mapping quality) score
(6) call variants
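To make steps (4) and (5) concrete, here is a minimal sketch of that filter applied to SAM text records. It assumes duplicates have already been marked (flag bit 0x400) by an upstream tool; the function names and the default cutoff of 30 are illustrative choices, not anything defined by vg or samtools.

```python
# Flag bits as defined in the SAM specification.
FLAG_PROPER_PAIR = 0x2   # both mates aligned as a proper pair
FLAG_UNMAPPED = 0x4      # read itself is unmapped
FLAG_DUPLICATE = 0x400   # PCR or optical duplicate


def keep_record(sam_line, min_mapq=30):
    """Return True if one SAM alignment line passes the filter.

    min_mapq=30 is an arbitrary example threshold.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: MAPQ
    if flag & (FLAG_UNMAPPED | FLAG_DUPLICATE):
        return False
    if not flag & FLAG_PROPER_PAIR:
        return False
    return mapq >= min_mapq


def filter_sam(lines, min_mapq=30):
    """Yield header lines unchanged plus alignment lines that pass keep_record."""
    for line in lines:
        if line.startswith("@") or keep_record(line, min_mapq):
            yield line
```

On a linear BAM the same selection is usually done with `samtools view -f 2 -F 1028 -q 30`, where 1028 is 0x4 plus 0x400.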
Would it be possible for the vg development team to add these two features:
(1) a feature that allows the user to remove potential PCR duplicates before running `vg call` (equivalent to `samtools rmdup`)
(2) a feature that allows the user to generate a VCF file (from `vg call`) using only properly paired reads that pass a certain MQ score.
Hi @jeizenga, is this the proper way to make a feature request?
Yes, thanks! Sorry to be not-so-responsive, I've been tied up prepping a manuscript. I'm hoping to get it out the door pretty soon, at which point I'll have some more bandwidth to help out here.
Great, and no worries! I really appreciate the help, @jeizenga. Good luck with the manuscript.
Hi @jeizenga, would there happen to be any updates regarding these requests?
Hi, sorry to have been so non-responsive about this. I've looked into this a bit, and it seems like the implementation will be a bit more involved than I expected. However, I've brought the idea up within our group and there's general agreement that this would be a good addition to our tooling.
If you want to work on developing a pipeline in the interim, one option might be:
- Use `vg surject` to produce a BAM
- Deduplicate the reads with Picard
- Convert the BAM into a FASTQ
- Re-map the de-duplicated reads
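The steps above might be sketched roughly as follows. This is only a sketch under some assumptions: the file names and the `dedup_pipeline` helper are hypothetical, the `picard` wrapper command may need to be `java -jar picard.jar` on some installs, and the input GAM is assumed to be interleaved paired-end. The final re-mapping step is left out because it depends on which vg mapper produced the original GAM. The function only builds the command strings, so they can be inspected before anything is run.

```python
import shlex


def dedup_pipeline(xg, gam, prefix):
    """Build shell commands for surject -> sort -> deduplicate -> FASTQ.

    All output file names are derived from `prefix` (an illustrative choice).
    """
    q = shlex.quote
    bam = q(prefix + ".bam")
    sorted_bam = q(prefix + ".sorted.bam")
    dedup_bam = q(prefix + ".dedup.bam")
    metrics = q(prefix + ".metrics.txt")
    r1 = q(prefix + "_1.fq")
    r2 = q(prefix + "_2.fq")
    return [
        # 1. Project the graph alignments onto the linear reference paths as BAM.
        f"vg surject -x {q(xg)} -b -i {q(gam)} > {bam}",
        # 2. Coordinate-sort the BAM before duplicate marking.
        f"samtools sort -o {sorted_bam} {bam}",
        # 3. Mark duplicates and drop them from the output.
        f"picard MarkDuplicates I={sorted_bam} O={dedup_bam} "
        f"M={metrics} REMOVE_DUPLICATES=true",
        # 4. Group mates together, then write paired FASTQ files for re-mapping.
        f"samtools collate -u -O {dedup_bam} | "
        f"samtools fastq -1 {r1} -2 {r2} -0 /dev/null -s /dev/null -n",
    ]


if __name__ == "__main__":
    for command in dedup_pipeline("graph.xg", "aln.gam", "sample"):
        print(command)
```

The resulting FASTQ pair would then go back through the same vg mapping command that was used for the original GAM.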
Alternatively, instead of remapping the reads, you could extract a list of the de-duplicated read names and use `vg filter -N` to subset the GAM file down to the de-duplicated reads. However, I expect that this could be pretty memory inefficient.
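For the name-list route, a minimal sketch of the extraction step could look like this. It assumes the surjected BAM has been run through a duplicate marker and dumped as SAM text (for example with `samtools view`), and that a read counts as a duplicate if any of its records carries the 0x400 flag; `deduplicated_names` and `write_names` are hypothetical helpers. Holding every name in a set is also part of why this approach can get memory hungry.

```python
FLAG_DUPLICATE = 0x400  # SAM flag bit: PCR or optical duplicate


def deduplicated_names(sam_lines):
    """Return sorted names of reads with no record flagged as a duplicate."""
    kept, dropped = set(), set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        name, flag = line.split("\t", 2)[:2]
        if int(flag) & FLAG_DUPLICATE:
            dropped.add(name)
        else:
            kept.add(name)
    return sorted(kept - dropped)


def write_names(names, path):
    """Write one read name per line, the layout assumed for the names file."""
    with open(path, "w") as handle:
        for name in names:
            handle.write(name + "\n")
```

The resulting file would then be passed as `vg filter -N names.txt aln.gam > dedup.gam`. As far as I know, `-N` treats each line as a name prefix, so a name like `read1` could also match `read10`; it is worth checking the output read count against the list.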