dropEst

Library Tag for indrop v3

Open jodejonghe opened this issue 5 years ago • 8 comments

Hi lovely people from dropEst. I was wondering whether the feature for running dropEst on inDrop v3 with specific library tags has been tested more thoroughly and is indeed functional? In the manual you specify:

"If a file with library tags provided, option "-t" is required. This option wasn't tested properly, so it's better to avoid using it."

Highly appreciate any input!

jodejonghe avatar Dec 14 '18 11:12 jodejonghe

Hi!

No, sorry. We are still considering whether to remove it or to rewrite it completely.

VPetukhov avatar Dec 15 '18 16:12 VPetukhov

Hi, thanks for the answer. So which strategy do you employ at the moment? Demultiplexing on R4 after bcl2fastq and then processing the samples individually through dropEst?

jodejonghe avatar Dec 17 '18 11:12 jodejonghe

Hi @VPetukhov, have there been any developments on this? What do you recommend as the best way to deal with inDrop v3 fastq files?

mbassalbioinformatics avatar Jul 17 '19 14:07 mbassalbioinformatics

Hi @VPetukhov, thanks for developing dropEst! I used to process my inDrop V3 data using this pipeline https://github.com/indrops/indrops, but I am having trouble working out how to process my data with dropEst. I am planning to use the velocyto package. The pipeline I mentioned takes as input the fastq files generated by bcl2fastq (R1-R4, 16 fastq files in total). Are there any more specific directions on how to use dropEst with inDrop V3? Thanks in advance!

edroaldo avatar Oct 01 '19 19:10 edroaldo

Hi, I am not the developer, but I have been working with dropEst on an inDrop V3 dataset, and I think that for dropTag there is a config, dropEst/configs/indrop_v3_with_barcodes.xml, that could be used.

In short, use the same bcl2fastq output, run dropTag, map the tagged reads, and then run dropEst as usual (although for velocyto, in my particular case, it has been suggested that I take directionality into account to deal with nested genes).

For what it's worth, I have pasted below the script I use to process my data with dropEst for the velocyto package. One could run it for an individual library by substituting the appropriate library number for SLURM_ARRAY_TASK_ID.

#   libraries : 
#      - {library_name: "lib21_R1",         library_index: "GAGACGGA"}
#      - {library_name: "lib22_K2",         library_index: "AGAAAGCT"}
#      - {library_name: "lib23_R2",         library_index: "ACGCTCTT"}
#      - {library_name: "lib24_K1",         library_index: "CGCATTCT"}


declare -A library_tag=( ["21"]="GAGACGGA" ["22"]="AGAAAGCT" ["23"]="ACGCTCTT" ["24"]="CGCATTCT" )

module load gcc/6.2.0
module load boost/1.62.0
module load bamtools/2.4.1
module load R/3.4.1

~/dropEst/droptag -c ~/dropEst/configs/indrop_v3_with_barcodes.xml -S -s -l dlib${SLURM_ARRAY_TASK_ID}_FC04733 -n drop/dlib${SLURM_ARRAY_TASK_ID}.FC04733.L001 -p 8 -t ${library_tag[${SLURM_ARRAY_TASK_ID}]} \
      ../Data/Intensities/BaseCalls/Undetermined_S0_L001_R2_001.fastq.gz \
      ../Data/Intensities/BaseCalls/Undetermined_S0_L001_R4_001.fastq.gz \
      ../Data/Intensities/BaseCalls/Undetermined_S0_L001_R1_001.fastq.gz \
      ../Data/Intensities/BaseCalls/Undetermined_S0_L001_R3_001.fastq.gz 

Then mapping:

ls drop/dlib${SLURM_ARRAY_TASK_ID}*.fastq.gz | paste -sd "," - | gawk '{ print "STAR --genomeDir drop.reference.wo.pseudo --runThreadN 4 --readFilesIn " $0 " --outFileNamePrefix drop/dlib${SLURM_ARRAY_TASK_ID}.out/ --readFilesCommand zcat --outSAMmultNmax 1 --outSAMunmapped Within --outSAMtype BAM SortedByCoordinate"}' | bash
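
The same mapping step can also be written without gawk, which may make it easier to see where the task ID is expanded. This is only a sketch under the same path assumptions as the one-liner above; `join_fastqs` is a hypothetical helper (not part of dropEst or STAR), the default task ID of 21 is an assumption for running outside SLURM, and the STAR command is printed rather than executed.

```shell
#!/usr/bin/env bash
# join_fastqs is a hypothetical helper, not part of dropEst or STAR.
# It joins its arguments with commas, the list format STAR accepts for --readFilesIn.
join_fastqs() {
    local IFS=","
    echo "$*"
}

# Assumption: fall back to library 21 when not running inside a SLURM array job.
task_id="${SLURM_ARRAY_TASK_ID:-21}"

# Collect every tagged fastq written by droptag for this library.
reads="$(join_fastqs drop/dlib"${task_id}"*.fastq.gz)"

# Printed rather than executed; remove the leading "echo" to actually run STAR.
echo STAR --genomeDir drop.reference.wo.pseudo --runThreadN 4 \
    --readFilesIn "$reads" \
    --outFileNamePrefix "drop/dlib${task_id}.out/" \
    --readFilesCommand zcat --outSAMmultNmax 1 \
    --outSAMunmapped Within --outSAMtype BAM SortedByCoordinate
```

If no files match the glob, the pattern is passed through unexpanded, so it is worth checking the printed command before running it.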

Then run dropEst to obtain the matrices:

declare params_file=$(ls drop/dlib${SLURM_ARRAY_TASK_ID}*.params.gz | paste -sd " " -)

~/dropEst/dropest -m -V \
    -r "${params_file}" \
    -L eiEIBA \
    -g drop.reference.wo.pseudo/mm10.with.compact.transgenes.filtered.20180315.gtf \
    -o drop/dlib${SLURM_ARRAY_TASK_ID}.out/cell.counts.bcconfig.rds \
    -c ~/dropEst/configs/indrop_v3_with_barcodes.xml \
    drop/dlib${SLURM_ARRAY_TASK_ID}.out/Aligned.sortedByCoord.out.bam

chlee-tabin avatar Oct 01 '19 20:10 chlee-tabin

Thank you for your great answer!

Just to make sure I understand this correctly. When I use bcl2fastq I get 16 fastq files (L001-L004 and R1-R4). Did you merge your files from L001 to L004 into a single L001 file for each read such that you can call ~/dropEst/droptag as you mentioned above?

Thanks much! I appreciate your help!

edroaldo avatar Oct 01 '19 20:10 edroaldo

@edroaldo Honestly, I trimmed the script for clarity, but essentially I copy-pasted the droptag command three more times (not 15 times), changing L001 to L002 and so on. R1-R4 contain different information, and the one-liner above handles those four files for a single lane. One could maybe write a more creative bash script to do it all in one go, but this has worked for me so far.

(BTW, the commented-out library tag list is from the .yaml file of the indrops pipeline.)
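
One way to avoid the copy-pasting is a loop over lanes. This is a minimal sketch, not a tested pipeline. It assumes the lanes are named L001 to L004 (as in the question above) and that only the lane part of the `-n` output name changes between runs. The `-l` value is kept as in the original command. The helpers `library_tag_for` and `droptag_cmd` are hypothetical names, and the `case` statement stands in for the `declare -A` array purely for portability. Commands are printed unless `RUN=1` is set.

```shell
#!/usr/bin/env bash
# Assumption: fall back to library 21 when not running inside a SLURM array job.
task_id="${SLURM_ARRAY_TASK_ID:-21}"
fastq_dir="../Data/Intensities/BaseCalls"

# Hypothetical helper: library index for a task ID (values from the script above).
library_tag_for() {
    case "$1" in
        21) echo "GAGACGGA" ;;
        22) echo "AGAAAGCT" ;;
        23) echo "ACGCTCTT" ;;
        24) echo "CGCATTCT" ;;
        *)  echo "unknown library: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print the droptag command for one lane.
# The read order R2, R4, R1, R3 is kept exactly as in the original command.
droptag_cmd() {
    local lane="$1" tag reads="" r
    tag="$(library_tag_for "$task_id")" || return 1
    for r in R2 R4 R1 R3; do
        reads="$reads ${fastq_dir}/Undetermined_S0_${lane}_${r}_001.fastq.gz"
    done
    echo "$HOME/dropEst/droptag -c $HOME/dropEst/configs/indrop_v3_with_barcodes.xml -S -s" \
         "-l dlib${task_id}_FC04733 -n drop/dlib${task_id}.FC04733.${lane} -p 8" \
         "-t ${tag}${reads}"
}

# Assumption: four lanes, L001 to L004.
for lane in L001 L002 L003 L004; do
    cmd="$(droptag_cmd "$lane")" || exit 1
    if [ "${RUN:-0}" = "1" ]; then
        eval "$cmd"    # actually run droptag
    else
        echo "$cmd"    # dry run: only show the command
    fi
done
```

Because each lane gets its own `-n` prefix under `drop/dlib${SLURM_ARRAY_TASK_ID}`, the `ls drop/dlib${SLURM_ARRAY_TASK_ID}*` globs in the mapping and dropEst steps should still pick up the output of every lane.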

chlee-tabin avatar Oct 01 '19 20:10 chlee-tabin

Got it! Thank you so much!

edroaldo avatar Oct 02 '19 13:10 edroaldo