Maintain Barcode in Output
When running DeepVariant on single-cell PacBio long-read data, is it possible to have the output include the barcode where the SNP was found?
Hi @gneedle1,
Currently it is not possible to have the barcodes output in the VCF. Thank you for the request.
Would it be valid to subset the original BAM by barcode and run each subset individually?
Hi @gneedle1
It depends on the type of experiment. If all the barcodes come from the same sample and you are trying to get at some other property (e.g. cell type or preparation), then it's a question of sequencing coverage. If the reads of a single barcode give you enough coverage to make good-quality calls (roughly 15x-20x at minimum, depending on your tolerance for errors), then subsetting by barcode could be reasonable. With less coverage, the loss of accuracy from the reduced coverage will likely be much larger than whatever effect you are trying to detect.
If the barcodes separate different samples (i.e. those with different germline DNA), then the correct thing is to separate by barcode.
I would need a little more information about the nature of the samples and what you are looking for to give you a more direct opinion.
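As a rough illustration of the coverage check described above, here is a minimal sketch in plain Python. It assumes you have already tallied the total aligned bases per barcode over the region you want to call. The function name, the input dictionary, and `region_length` are hypothetical names for illustration and are not part of DeepVariant. The 15x default comes from the lower end of the range mentioned above.

```python
def barcodes_with_enough_coverage(aligned_bases, region_length, min_coverage=15.0):
    """Return {barcode: mean_coverage} for barcodes at or above min_coverage.

    aligned_bases: dict mapping barcode -> total aligned bases in the region
                   (hypothetical input, tallied beforehand)
    region_length: size in bases of the region being called
    """
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    passing = {}
    for barcode, bases in aligned_bases.items():
        # Mean coverage = aligned bases divided by the size of the region.
        mean_cov = bases / region_length
        if mean_cov >= min_coverage:
            passing[barcode] = mean_cov
    return passing
```

Mean coverage is only a coarse proxy. For cDNA or RNA data, coverage varies strongly between transcripts, so a per-gene or per-site check may be more informative than one number per barcode.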
The experimental setup is roughly:
- Hairy cell leukemia cells were isolated from the blood of patients.
- Single-cell cDNAs were synthesized and barcoded on the 10x Genomics platform (the cDNAs of each cell are barcoded individually).
- Direct RNA seq will be run for the single-cell cDNAs from one patient on the flow cell of an ONT sequencer.
- The goal is to call mutations in the cDNA of individual cells and then build a single-cell phylogeny based on those mutations.
I see, so in this case each barcode corresponds to a specific cell. For your purpose, you can split by barcode and call each subset separately. Whether the calls are good is really just a question of whether each cell has sufficient coverage.
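For anyone who wants to try the split, here is a minimal sketch in plain Python that groups SAM text records (for example, the output of `samtools view -h`) by cell barcode. It assumes the barcode is stored in the `CB:Z:` optional tag, which is the SAM-spec cell identifier tag used by 10x pipelines. Your long-read pipeline may use a different tag, so treat `tag="CB"` as an assumption to check against your data. The function name is hypothetical.

```python
from collections import defaultdict


def split_sam_by_barcode(sam_lines, tag="CB"):
    """Group SAM alignment lines by the value of a cell-barcode tag.

    Returns (header_lines, {barcode: [alignment_lines]}).
    Reads that lack the tag are collected under the key None.
    """
    prefix = tag + ":Z:"
    header = []
    groups = defaultdict(list)
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header lines are kept so each subset can reuse them.
            header.append(line)
            continue
        fields = line.split("\t")
        barcode = None
        # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
        for field in fields[11:]:
            if field.startswith(prefix):
                barcode = field[len(prefix):]
                break
        groups[barcode].append(line)
    return header, dict(groups)
```

Each group, written out together with the shared header and converted back to BAM, can then be run through the variant caller on its own. With many thousands of cells this produces a lot of small files, so it may be worth filtering to barcodes with enough coverage first.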
Thanks @AndrewCarroll for the follow-up answer. I will now close this issue.