dropEst

Support for ddSEQ?

Open driscolc opened this issue 7 years ago • 1 comments

Hello,

We have some ddSEQ data and are interested in applying an open-source tool to generate counts from our reads. I'm assuming we would need a list of barcodes, but in theory, could dropEst work for this type of droplet-based data? The library preps are organized as shown at the following link:

https://teichlab.github.io/scg_lib_structs/SureCell.html

driscolc, Jul 23 '18 18:07

Hello,

Thank you for looking at our tool. Indeed, it's possible to adapt dropEst for this protocol, but it requires writing some code (changing the config wouldn't be enough). We're going to write a guide on how to extend dropEst for new protocols, but right now we don't have one. There are two ways to extend it. If you decide to pursue either of them, please don't hesitate to contact me; I'll do my best to help. And if you do, it would be really great if you shared your results with others.

Option 1 (C++)

First, you can write a new C++ class that inherits from TagsFinderBase ("TagsSearch/TagsFinderBase.cpp"). There are several examples for existing protocols (see all classes with the suffix "TagsFinder"). All you really need to do there is define the function parse_fastq_record. After that, you need to add the new protocol type to the function get_tags_finder in "droptag.cpp". It takes only a few lines:

if (protocol_type == "ddSEQ")
    return ...

With this in place, you can set "TagsSearch/protocol" to "ddSEQ" in config.xml and work with ddSEQ as with any other supported protocol, in a very efficient and parallel manner.

Option 2 (Python)

If droptag performance is not crucial for you, you can write a simple Python script that parses your FASTQ files and creates output compatible with droptag. You can then pass that output to dropEst and get all of its benefits for your protocol. The droptag output has the following format:

  1. FASTQ file (or files) with the gene reads, each with a unique read id.
  2. Text file (possibly gzipped) with information about barcodes in the following format: "read_id cell_barcode umi barcode_quality umi_quality". The barcode_quality and umi_quality fields are optional and can be omitted.
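The two outputs listed above could be produced by a script along the following lines. This is only a minimal sketch, not code from dropEst: the names `read_fastq` and `convert`, the `read` id prefix, and especially the fixed barcode and UMI offsets are assumptions made for illustration. Real ddSEQ reads have a more complex barcode layout (see the link in the question), so the extraction step would need to be replaced. The sketch also assumes the barcode reads and gene reads come in the same order, and it writes the read id with a leading "@" in both outputs.

```python
from itertools import islice

# PLACEHOLDER layout (an assumption, not the real ddSEQ structure):
# cell barcode in the first 16 bases of the barcode read, UMI in the next 6.
BARCODE_SLICE = slice(0, 16)
UMI_SLICE = slice(16, 22)


def read_fastq(handle):
    """Yield (header, sequence, quality) for each 4-line FASTQ record."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:  # end of file (or a truncated last record)
            return
        header, seq, _, qual = record
        yield header, seq, qual


def convert(barcode_reads, gene_reads, fastq_out, tags_out, prefix="read"):
    """Write a gene-read FASTQ and a barcode text file; return the read count.

    All four arguments are open text handles. Reads are paired by position;
    zip() silently stops at the shorter input.
    """
    count = 0
    pairs = zip(read_fastq(barcode_reads), read_fastq(gene_reads))
    for count, ((_, bc_seq, bc_qual), (_, seq, qual)) in enumerate(pairs, 1):
        read_id = f"@{prefix}{count}"  # new unique id shared by both outputs
        # Output 1: gene read under the new id.
        fastq_out.write(f"{read_id}\n{seq}\n+\n{qual}\n")
        # Output 2: read_id cell_barcode umi barcode_quality umi_quality
        fields = [
            read_id,
            bc_seq[BARCODE_SLICE],
            bc_seq[UMI_SLICE],
            bc_qual[BARCODE_SLICE],
            bc_qual[UMI_SLICE],
        ]
        tags_out.write(" ".join(fields) + "\n")
    return count
```

With real data, the handles could come from `open(path)` or, for gzipped files, `gzip.open(path, "rt")` from the Python standard library.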

Example of the first file:

@1066468233L1
ACTATNAGACCTGCACCCGACACCCATCTCGTATGCCGTGTTGTNAGTGAAANAAAANAAC
+
AAAAA#EEE/AEEEEEEEEAE/EEEE/A/E<////AE///////#//E//EE#AEA/#///
@1066468233L2
CCTTANCACAAAGAGACGAATGCGCCTGCACCCGACACCCATCTCGTATGCCNTAGTGGGC
+
A/AA6#//AA/AE/EEAE/EEEAEEEEEEEEEEEAEEEEE///<EEA////E#//////E/
@1066468233L3
GTGGTNTGCTGTTGAGCTTGTAATGTGAAAAACAACTTAGAAATAAATTGTCNTTATATCT
+
66/AA#<EEE/EEEEEEEEEE///E/EEAEEEEAEAEE//AEAE/EE///E/#EEEE/EE<

Example of the second file:

@1066468233L1 GGGGGGGGGGGGGGGG NGGNNG AAAAAEEEA////E// #//##E
@1066468233L2 GGGGGGGGGGGGGGGG GGGNNG AAAAAEEEAAAAAEEE EEE##E
@1066468233L3 GAGGAGTGGCCACATC AATNNC A6/AAEEEAAAAAAEE E/E##E
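
Before passing such a file to dropEst, it may help to sanity-check it. The following is a small sketch for doing so, not part of dropEst: `ReadTags` and `parse_tag_line` are hypothetical names. It assumes that fields are separated by whitespace and that the two quality fields are either both present or both omitted.

```python
from typing import NamedTuple, Optional


class ReadTags(NamedTuple):
    """One line of the barcode file (hypothetical container, not a dropEst type)."""
    read_id: str
    cell_barcode: str
    umi: str
    barcode_quality: Optional[str] = None
    umi_quality: Optional[str] = None


def parse_tag_line(line: str) -> ReadTags:
    """Parse 'read_id cell_barcode umi [barcode_quality umi_quality]'."""
    fields = line.split()
    # Assumption: qualities are given together or not at all (3 or 5 fields).
    if len(fields) not in (3, 5):
        raise ValueError(f"expected 3 or 5 fields, got {len(fields)}: {line!r}")
    tags = ReadTags(*fields)
    if tags.barcode_quality is not None:
        # Each quality string should have one character per base.
        if len(tags.barcode_quality) != len(tags.cell_barcode):
            raise ValueError(f"barcode/quality length mismatch: {line!r}")
        if len(tags.umi_quality) != len(tags.umi):
            raise ValueError(f"UMI/quality length mismatch: {line!r}")
    return tags
```

For example, `parse_tag_line("@1066468233L3 GAGGAGTGGCCACATC AATNNC")` gives a record whose quality fields are `None`.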

VPetukhov, Jul 27 '18 15:07