dropSeqPipe

SureCell / ddSeq support

Open AskPascal opened this issue 5 years ago • 7 comments

Hi,

I just got some data generated with SureCell libraries on a ddSeq instrument (i.e. the protocol by Illumina and Bio-Rad). I would like to test your pipeline for the analysis, but I'm not sure whether it can be used and, if so, how to fill in config.yaml. The barcodes are in Read 1; however, they are not at a fixed position, and the cell barcode is split into three parts by spacer sequences:

[Figure: single-cell-rna-algorithm-tech-note-1070-2016-015]

Below is a small excerpt from the Read 1 FASTQ file of one of my samples.

Is it possible to process this data with dropSeqPipe?

Cheers


@D00457:259:HKWJNBCX2:1:1105:1128:2079 1:N:0:CCTAAGAC
CTCGGCGTTAGCCATCGCATTGCGGATTGTACCTCTGAGCTGAATCGCCTACGTCCCCGGAGACCNNT
+
<DDD0<CFHHHIIIIIIIIIIIIIIIHIHIGHHHIHHHGHFHHHIHHHIIIIHIIIIIEHHHIII##<
@D00457:259:HKWJNBCX2:1:1105:1168:2089 1:N:0:CCTAAGAC
AATGGAGTAGCCATCGCATTGCACCTTCTACCTCTGAGCTGAAGAAATAACGCCTACGAAGACTTNNT
+
<<<D01<<D1ECH?F0=CEE?<1DG@<1CGEH@HHHHIIHGEGCGEHFHIHGHHHHHIEHHHHEF##<
@D00457:259:HKWJNBCX2:1:1105:1122:2104 1:N:0:CCTAAGAC
ACCCAATAGCCATCGCATTGCCCGTAATACCTCTGAGCTGAATAAGCTACGAAACTGTGGACTTTNNT
+
0<DDDIHHIIEEHHGHIIEHIFDGHHHIIIHIIIH?GHHIIH1<FH1FGHIGHIIHIFHIHE@FH##<
@D00457:259:HKWJNBCX2:1:1105:1102:2126 1:N:0:CCTAAGAC
TTCGTAGAGGTAGCCATCGCATTGCTGAGACTACCTCTGAGCTGAACTCAATACGCTTCGAGCGANNT
+
0<<DBDHHHFCFHEGHIHIHIIIIHHIHGEHIHHIHIHIHI?1<1GHHIHIIIIIGIIGHHGHIH##<
@D00457:259:HKWJNBCX2:1:1105:1158:2127 1:N:0:CCTAAGAC
ACATAGATAGCCATCGCATTGCTAATAGTACCTCTGAGCTGAAGCGAATACGTCCCCCCTGACTTNNT
+
@@B@0<CEGHIIHHI=GEEHCGHEHHEEHHIHFHCHEHCHIHIIHIHIIHHHHI0EHHIII?@1<##<
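To make that layout concrete, here is a minimal bash sketch that splits one Read 1 sequence into its parts. The function name parse_surecell_r1 is hypothetical, and the layout it assumes (a phase block of any length, 6 bp BC1, linker TAGCCATCGCATTGC, 6 bp BC2, linker TACCTCTGAGCTGAA, 6 bp BC3, ACG, 8 bp UMI, GAC, poly-T tail) is read off the example reads above and the sed solution later in this thread, not taken from an official specification.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split one SureCell Read 1 sequence into its parts.
# Prints phase-block length, BC1, BC2, BC3 and UMI (tab-separated),
# or returns 1 if the linkers and ACG/GAC spacers are not found.
parse_surecell_r1() {
    local seq=$1
    local re='^([ACGTN]*)([ACGTN]{6})TAGCCATCGCATTGC([ACGTN]{6})TACCTCTGAGCTGAA([ACGTN]{6})ACG([ACGTN]{8})GAC'
    if [[ $seq =~ $re ]]; then
        printf '%s\t%s\t%s\t%s\t%s\n' "${#BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
            "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}" "${BASH_REMATCH[5]}"
    else
        return 1
    fi
}
```

For the first read above this reports a 2-base phase block followed by CGGCGT, GGATTG, TCGCCT and the UMI TCCCCGGA; for the third read the phase block is empty, which is exactly the shift that fixed base positions cannot capture.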

AskPascal, Aug 09 '18 12:08

Hello @pascal-git

As of now, it is definitely not compatible. The split barcode pattern is not the main issue: you could pass in the position of each barcode block and it should work, although I haven't tried it.

The main issue is the variable shift of the bases in Read 1.

Right now the barcodes are taken from fixed base positions in R1, so a shifted read can't be handled. One way around this would be to first "deshift" R1 and then run dropSeqPipe.
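A minimal awk sketch of such a "deshift" step, assuming linker 1 is TAGCCATCGCATTGC and BC1 is the 6 bp right before it (as in the reads above): it trims the phase block off the start of both the sequence and the quality line, so BC1 always begins at base 1. The function name deshift_surecell_r1 is hypothetical and this is not something dropSeqPipe does itself. Reads where linker 1 is not found are left unchanged so that R1 and R2 stay paired.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove the variable-length phase block from SureCell
# Read 1 so that BC1 always starts at base 1. Sequence and quality lines are
# trimmed by the same number of bases. Reads without linker 1 are unchanged.
deshift_surecell_r1() {
    awk '
        NR % 4 == 2 {                            # sequence line
            i = index($0, "TAGCCATCGCATTGC")     # start of linker 1 (1-based)
            phase = (i > 6) ? i - 7 : 0          # bases before BC1
        }
        NR % 4 == 2 || NR % 4 == 0 {             # sequence and quality lines
            $0 = substr($0, phase + 1)
        }
        { print }
    ' "$1"
}
```

After this step the blocks sit at fixed 1-based positions (BC1 1-6, BC2 22-27, BC3 43-48, UMI 52-59), so they can be described positionally.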

I don't have time to try it out now, but it might be a good idea to change the way I find the barcode and UMI and use an approach similar to umi-tools, which I recommend you try out.

This construct seems overly complicated, though. Do you know what its advantages are over 10x, for example?

Hoohm, Aug 09 '18 14:08

Hi @Hoohm

No, I am also still wondering what advantage this approach has over the 10x one. Not having all barcodes at the same position might hedge against systematic sequencing biases, maybe? Or it's just an intellectual property thing...

Thanks for clarifying what the problem would be in getting the data into dropSeqPipe and how it could be solved. umi-tools certainly looks interesting. However, yesterday I found another tool, umis, which even has example code for SureCell / ddSeq, and in my preliminary tests it looks promising. I might use it in combination with dropSeqPipe or on its own...

AskPascal, Aug 10 '18 07:08

Hey @pascal-git, I've come up with a small script that should be able to handle funky barcode structures.

You can check it out here

I'm working on a new version of dropSeqPipe (see the develop branch) that uses cutadapt instead of Trimmomatic. The main reason is to check for adapters in both R1 and R2, instead of only trimming R2 as it does now.

To do this, I'm also changing a lot in the filtering: I trim R1 and R2 separately and re-pair them afterwards. This cuts down running time and gives more insight into potential problems with the protocol.

Since this first part no longer depends on Drop-seq tools, I'm capturing barcodes differently, which should make it easier to support fancier barcode structures.

So keep checking; your protocol might be compatible in a month or so.

Hoohm, Aug 10 '18 15:08

Hi @Hoohm

This is really interesting. I'll keep my eyes open for the new version then!

AskPascal, Aug 13 '18 08:08

As you can see, this is not implemented at all yet.

Sadly, I haven't found the time to work on it, since this is not a technology we use.

I hope someone can help out with integrating a universal cell barcode structure module.

Hoohm, Nov 28 '18 13:11

I've written a sed solution to extract the cell barcodes and UMI from R1. It rewrites each Read 1 as an 18 bp cell barcode followed by an 8 bp UMI.

Read1s=("Sample_S1_L001_R1_001.fastq" "Sample_S1_L002_R1_001.fastq")
Read2s=("Sample_S1_L001_R2_001.fastq" "Sample_S1_L002_R2_001.fastq")

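# remove SureCell linkers and phase blocks from Read 1 (Read 2 is left as is)
for File in "${Read1s[@]}"; do
    # sequence line: keep BC1, BC2, BC3 (6 bp each) and the 8 bp UMI, dropping the
    # phase block, both linkers, the ACG/GAC spacers and the poly-T tail;
    # quality line (two lines further on): keep four blocks of the same sizes,
    # counted back from the end of the line
    sed -E '
        /.*(.{6})TAGCCATCGCATTGC(.{6})TACCTCTGAGCTGAA(.{6})ACG(.{8})GAC/ {
        s/.*(.{6})TAGCCATCGCATTGC(.{6})TACCTCTGAGCTGAA(.{6})ACG(.{8})GAC.*/\1\2\3\4/g
        n
        n
        s/.*(.{6}).{15}(.{6}).{15}(.{6}).{3}(.{8}).{3}/\1\2\3\4/g
        }' "$File" > .temp
    mv .temp "$File"
done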
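One caveat with the sed version: the quality substitution is not tied to where the linker was found. Its leading .* is greedy, so the four blocks are counted back from the end of the quality line, which only lines up with the sequence when the read ends right after GAC. For the first example read in this thread, BC1 is bases 3-8, but the kept qualities start at base 7. Below is a minimal awk sketch that cuts the quality line at the same offsets as the sequence line, producing the same 18 bp barcode + 8 bp UMI. The function name extract_surecell_r1 is hypothetical, and the linker and spacer sequences are the ones used in the sed script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite SureCell Read 1 as BC1+BC2+BC3 (18 bp) + UMI (8 bp).
# Offsets come from the first linker found on the sequence line and are reused
# on the quality line, so every kept base keeps its own quality score.
# Reads without the expected structure are printed unchanged (R1/R2 stay paired).
extract_surecell_r1() {
    awk '
        NR % 4 == 2 {                                    # sequence line
            i = index($0, "TAGCCATCGCATTGC")             # start of linker 1 (1-based)
            ok = (i > 6 &&
                  substr($0, i + 21, 15) == "TACCTCTGAGCTGAA" &&
                  substr($0, i + 42, 3) == "ACG" &&
                  substr($0, i + 53, 3) == "GAC")
        }
        ok && (NR % 4 == 2 || NR % 4 == 0) {             # sequence and quality lines
            bc = substr($0, i - 6, 6) substr($0, i + 15, 6) substr($0, i + 36, 6)
            $0 = bc substr($0, i + 45, 8)                # 18 bp barcode + 8 bp UMI
        }
        { print }
    ' "$1"
}
```

It can replace the sed call inside the same loop, e.g. extract_surecell_r1 "$File" > .temp && mv .temp "$File". Like the sed version, reads without the expected structure are passed through rather than dropped, so Read 1 stays in step with Read 2.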

TomKellyGenetics, Mar 19 '20 02:03

Thank you @TomKellyGenetics !

I'm gonna add this to the documentation :)

Hoohm, Mar 21 '20 15:03