Expanding dereplicated reads
Is your feature request related to a problem? Please specify.
Not necessarily a problem, more of a question: is there currently any way to use the count information from de-replicated reads when calculating depths via mpileup? For context, I am working on a serverless pipeline for de novo assembly of a reduced-representation dataset, with the steps running in AWS Lambda functions. To keep file sizes and runtimes as small as possible, I do all of the assembly steps on dereplicated reads, i.e. identical reads collapsed into a single record, with the read count saved in the read header as ;size=X. Currently I re-expand these to create an intermediate file to send through mpileup before running bcftools call. However, I'm wondering whether there is a way within samtools to use this count information directly when calculating depths.
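To make the re-expansion step concrete, here is a minimal sketch of the kind of thing it involves. It assumes the intermediate file is SAM text and that the read name (QNAME) carries the `;size=N` annotation; `parse_size` and `expand_sam_lines` are hypothetical helper names for illustration, not part of samtools or bcftools.

```python
import re

# Matches a USEARCH/VSEARCH-style abundance annotation in a read name.
SIZE_RE = re.compile(r";size=(\d+)")


def parse_size(qname: str) -> int:
    """Return the read count encoded in the name, or 1 if none is present."""
    m = SIZE_RE.search(qname)
    return int(m.group(1)) if m else 1


def expand_sam_lines(lines):
    """Yield SAM lines with each alignment repeated `size` times.

    Header lines (starting with '@') pass through unchanged. Each copy gets
    a numeric suffix so that read names stay unique (an assumption; the
    naming scheme is arbitrary).
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            yield line
            continue
        fields = line.split("\t")
        qname = fields[0]
        count = parse_size(qname)
        base = SIZE_RE.sub("", qname).rstrip(";")
        for i in range(1, count + 1):
            yield "\t".join([f"{base}_{i}"] + fields[1:])
```

The expanded output grows linearly with the read counts, which is exactly the overhead I would like to avoid.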
Describe the solution you would like.
It would be ideal if samtools could parse the count information directly, for example via a user-defined tag, or by parsing the USEARCH/VSEARCH-style size= annotations from the read headers. If this is not currently possible, I would be happy to work on it and submit a pull request, but I would appreciate some pointers on the best way to proceed.
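As a rough illustration of the behaviour I have in mind (not how samtools or bcftools currently work), each alignment would add its count to the depth instead of 1. The sketch below is deliberately simplified: it assumes ungapped alignments described as (read name, 1-based start, length) tuples, and `weighted_depth` is a hypothetical name.

```python
import re
from collections import Counter


def weighted_depth(alignments):
    """Per-position depth where each read contributes its ;size=N count.

    `alignments` is an iterable of (qname, pos, length) tuples with a
    1-based start position; gaps and clipping are ignored in this sketch.
    """
    depth = Counter()
    for qname, pos, length in alignments:
        m = re.search(r";size=(\d+)", qname)
        weight = int(m.group(1)) if m else 1  # reads without a count weigh 1
        for p in range(pos, pos + length):
            depth[p] += weight
    return depth
```

For example, a read named `r1;size=3` covering positions 10-13 and a read named `r2;size=2` covering positions 12-15 would give a depth of 3 at position 10 and 5 at position 12.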
Thanks!
Sorry, I just noticed that the VCF output had been moved to bcftools mpileup, so I should have posted this over there.
Mm, I don't understand the terminology you are using (for example, what do you mean by "de-replication", "re-expansion", the reference to USEARCH, etc.), and I don't know whether you are referring to a BAM file header or to a VCF file header. Can you please provide more details?