dropEst
SPLiT-seq V3 round1 barcode collapse
Hello,
Does the current barcode search take into account the V3 protocol update which results in 2 distinct round 1 barcodes (position 86) in the same well? This would result in a given cell generating reads which have the same round 2 + round 3 barcode but 2 distinct round 1 barcodes.
In the v3 of the SPLiT-seq protocol, the barcoded RT polydT and random hexamer primers added in the same well do not have the same 8bp sequence.
Protocol: https://sites.google.com/uw.edu/splitseq/protocol?authuser=0
Sequence file: https://drive.google.com/file/d/1l3MkxYVWOPTV5KPpq2uXMLzKOKwVeZyw/view?usp=sharing
<protocol>split_seq</protocol>
<MultipleBarcodeSearch>
<barcode_starts>10 48 86</barcode_starts>
<barcode_lengths>8 8 8</barcode_lengths>
<umi_start>0</umi_start>
<umi_length>10</umi_length>
</MultipleBarcodeSearch>
There appears to be one attempt to handle this case here: https://github.com/paulranum11/SPLiT-Seq_demultiplexing/blob/master/Collapse_RanHex_Odt.sh
Just curious whether this scenario is already handled by the dropEst SPLiT-seq processing shown above, or whether additional corrections are needed.
Thank you!
Hello,
Thank you for the info! I will look at it in more detail to give a precise answer, but for now it seems that you need to update the whitelist of barcodes. Or, as a very quick fix, don't use a barcode whitelist at all (just comment out the "Estimation/Merge/barcodes_file" line in the config, if you have one).
Thank you, Viktor!
I have tried switching between these 2 flags without commenting out the barcodes file line in config:
-m, --merge-barcodes : merge linked cell tags
-M, --merge-barcodes-precise : use precise merge strategy (can be slow), recommended to use when the list of real barcodes is not available
Both flags resulted in an identical number of barcodes for test data sets used. Is commenting out the lines in the config also required?
Both of these options use the list of real barcodes if the line is not commented out, and in that case they produce very similar results. So please try commenting out the line and then use -M, as it generally works better.
Hi Viktor,
I have the same concern here. In the current SPLiT-seq protocol, the round 1 barcoded RT polydT and random hexamer primers are added to the same well. As a result, the same cell will carry two different round 1 barcodes.
If we comment out the barcodes file in the config, I guess the dropEst pipeline will try to sort cells by the unique barcodes it detects. If this is true, then because each cell has two distinct round 1 barcodes, reads that actually belong to the same cell will be separated into two cells. Could you help clarify this?
Thanks!
Minjie
If two reads with different barcodes can belong to the same cell, dropEst will not deal with it properly and will split them into two cells. The easiest workaround would be to merge them afterwards, when the count matrix is obtained.
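As a rough illustration of that post-hoc merge, here is a minimal Python sketch. It is not part of dropEst. It assumes that each cell barcode in the count matrix is the concatenation of the three 8 bp barcodes in the order given by `barcode_starts`, so that the round 1 barcode (position 86) is the last 8 characters. The function name `collapse_round1`, the `hex_to_dt` mapping and the example barcodes are all hypothetical; the real mapping of random hexamer to polydT barcodes per well has to be built from the protocol's sequence file.

```python
from collections import defaultdict


def collapse_round1(counts, hex_to_dt, bc_len=8):
    """Merge cells that differ only by the round 1 primer type.

    counts    : {cell_barcode: {gene: count}} (hypothetical in-memory form
                of the count matrix)
    hex_to_dt : {random-hexamer round 1 barcode: polydT round 1 barcode
                of the same well} (assumed to come from the sequence file)
    bc_len    : length of the round 1 barcode, assumed to be the last
                bc_len characters of the cell barcode
    """
    merged = defaultdict(lambda: defaultdict(int))
    for cell_barcode, gene_counts in counts.items():
        prefix, round1 = cell_barcode[:-bc_len], cell_barcode[-bc_len:]
        # Rewrite random-hexamer barcodes to the polydT barcode of the well;
        # barcodes missing from the mapping are left unchanged.
        canonical = prefix + hex_to_dt.get(round1, round1)
        for gene, n in gene_counts.items():
            merged[canonical][gene] += n
    return {cb: dict(genes) for cb, genes in merged.items()}


# Made-up barcodes, for illustration only.
hex_to_dt = {"TTTTTTTT": "GGGGGGGG"}
counts = {
    "AAAAAAAACCCCCCCCGGGGGGGG": {"GeneA": 3, "GeneB": 1},  # polydT reads
    "AAAAAAAACCCCCCCCTTTTTTTT": {"GeneA": 2},               # random hexamer reads
    "ACGTACGTCCCCCCCCGGGGGGGG": {"GeneB": 4},               # a different cell
}
merged = collapse_round1(counts, hex_to_dt)
```

In this example the first two entries share their first 16 characters and sit in the same round 1 well, so they collapse into one cell with their counts summed, while the third cell is left untouched.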
That's a good workaround.