Question about analysing patterned reads and internal reads together

Open Dkaaaaa opened this issue 1 year ago • 3 comments

Hi, I am confused about the results and usage of the zUMIs pipeline. Here is my YAML file: pm1-1.yaml.txt

First, my input is paired-end reads that start with a specific pattern sequence. reads1 contains 1657623 reads in total, while the STAR log filtered.tagged.Log.final.out reports 846533 input reads.

Q1: I think this is because of the filtering conditions and the cDNA range set for reads1 in my YAML file. Am I right? [image]

Second, I found that the read IDs in pm1.filtered.tagged.unmapped.bam (flags include 4), pm1.filtered.tagged.Aligned.out.bam (flags include 0 and 16) and pm1.filtered.Aligned.GeneTagged.sorted.bam (flags include 0 and 16) are the same.

Q2: Why are the read IDs in pm1.filtered.tagged.unmapped.bam the same as those in pm1.filtered.tagged.Aligned.out.bam and pm1.filtered.Aligned.GeneTagged.sorted.bam?

What's more, pm1.filtered.tagged.Aligned.toTranscriptome.out.bam (flags include 0, 16, 252 and 276) is missing some reads compared with the three BAM files above. The missing reads, as they appear in pm1.filtered.Aligned.GeneTagged.sorted.bam, are attached: miss-in-toTranscriptome.bam.txt. I also checked the mapping position of the first missing read against the position of ENSG0000014267 in my reference transcriptome, and found no problem. Snipaste_2023-12-04_13-23-32

Q3: Why are these reads missing from pm1.filtered.tagged.Aligned.toTranscriptome.out.bam?

Finally, I have separated my raw reads (PE150) into paired patterned_reads and paired internal_reads. You should know that my data is similar to Smart-seq3, but it is based on 3' polyA capture of the mRNA. The pm1-1.yaml above took the patterned_reads as input, with reads1 set for BC and UMI only and reads2 set for cDNA. Now I want to analyse my internal reads together with them; my new YAML file is pm1-2.yaml.txt. I set the paired internal reads as file3 and file4, with cDNA range 1-150. When I run with this YAML file, I get the errors below.

Q4: How should I set things up to analyse my patterned_reads and internal reads together? [image] [image]

With the new YAML file (pm1-2.yaml.txt), the STAR log filtered.tagged.Log.final.out still reports 846533 input reads, so it seems that file3 and file4 were not included in the analysis. The number of uniquely mapped reads is also lower than in the run without the internal reads. [image]

I am very puzzled by all of the above and look forward to your reply. Thanks a lot! Dka

Dkaaaaa avatar Dec 04 '23 06:12 Dkaaaaa
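The flag values quoted in the post above (0, 4, 16 and so on) are SAM bit flags, so each value is a sum of individual bits. A minimal Python sketch for decoding them is shown here; the bit meanings follow the SAM format specification, while the function name `decode_flag` and the short labels are illustrative choices, not part of zUMIs or STAR.

```python
from typing import List

# SAM flag bits as defined in the SAM format specification.
# The short labels are illustrative wording, not official names.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper pair"),
    (0x4, "unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary alignment"),
    (0x200, "fails quality checks"),
    (0x400, "duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag: int) -> List[str]:
    """Return the labels of all bits set in a SAM flag (hypothetical helper)."""
    return [label for bit, label in SAM_FLAG_BITS if flag & bit]


if __name__ == "__main__":
    # Decode the simple flag values mentioned in the thread.
    for flag in (0, 4, 16):
        print(flag, decode_flag(flag))
```

With this decoding, flag 0 means a mapped read on the forward strand with no other bits set, flag 4 means the read is unmapped, and flag 16 means the read is mapped to the reverse strand.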

Hi,

as mentioned in your other issue, the use of the particular 11 bp pattern "ATTGCGCAATG" is reserved for the processing of Smart-seq3 data. Our pipeline is hardcoded in this case, and I am unfortunately unable to provide support for custom protocols that you might be trying to process. Sorry about this,

Christoph

cziegenhain avatar Dec 04 '23 07:12 cziegenhain

I am still puzzled by your answer. Below is the Smart-seq3 YAML. [image]

What is the function of file3 and file4 in this pipeline? What if I do not separate my data into patterned reads and internal reads, and instead just set up file1 and file2 like this:

    file1:
      name: /home/ccy/1-scrna-data-2023-11-14/rawdata/star-test-1/patterns_and_internal_1.fq.gz
      base_definition:
        - BC(12-17,33-40,56-63)
        - UMI(64-69)
      find_pattern: ATTGCGCAATG
    file2:
      name: /home/ccy/1-scrna-data-2023-11-14/rawdata/star-test-1/patterns_and_internal_2.fq.gz
      base_definition:
        - cDNA(1-150)

What happens to the reads in file1 that do not start with the pattern? Will they be used for mapping, or will they be dropped?

Dkaaaaa avatar Dec 04 '23 07:12 Dkaaaaa
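To make the distinction in the question above concrete, here is a minimal Python sketch of splitting read pairs by whether read 1 starts with the pattern. It only illustrates the split itself; how zUMIs treats the reads without the pattern is exactly the open question in this thread, and the sketch makes no claim about that. The function name `split_by_pattern`, the tuple layout and the example sequences are all hypothetical.

```python
from typing import Iterable, List, Tuple

PATTERN = "ATTGCGCAATG"  # the 11 bp pattern quoted in this thread

# Hypothetical in-memory layout: (read_id, read1_sequence, read2_sequence)
ReadPair = Tuple[str, str, str]


def split_by_pattern(
    pairs: Iterable[ReadPair], pattern: str = PATTERN
) -> Tuple[List[ReadPair], List[ReadPair]]:
    """Split read pairs into (patterned, internal) by the start of read 1."""
    patterned: List[ReadPair] = []
    internal: List[ReadPair] = []
    for pair in pairs:
        _, read1_seq, _ = pair
        if read1_seq.startswith(pattern):
            patterned.append(pair)
        else:
            internal.append(pair)
    return patterned, internal


if __name__ == "__main__":
    # Made-up example sequences, for illustration only.
    example = [
        ("read_a", "ATTGCGCAATGACGTACGT", "TTTTGGGG"),
        ("read_b", "GGGGACGTACGTACGTACG", "CCCCAAAA"),
    ]
    patterned, internal = split_by_pattern(example)
    print(len(patterned), "patterned;", len(internal), "internal")
```

Keeping both mates of a pair in the same group, as done here, preserves the pairing between the read 1 and read 2 files.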

@Dkaaaaa zUMIs filters out some low-quality reads based on their barcodes and UMIs before passing the rest to STAR, and I think that's why the number of input reads is lower than the number of reads in the reads1 file.

bioinfotec avatar Jan 09 '24 17:01 bioinfotec
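The kind of filtering described in the comment above can be sketched in Python as follows. This is an illustration under stated assumptions, not zUMIs' actual code: the rule (drop a read when its barcode or UMI has more than a set number of bases below a Phred cutoff), the parameter names `min_phred` and `max_low_bases`, their default values and the Phred+33 encoding are all assumptions; the real cutoffs are whatever is set in the YAML file, and the exact rule may differ.

```python
def count_low_quality(quality: str, min_phred: int, offset: int = 33) -> int:
    """Count bases whose Phred score is below min_phred (Phred+33 assumed)."""
    return sum(1 for char in quality if ord(char) - offset < min_phred)


def passes_filter(
    bc_quality: str,
    umi_quality: str,
    min_phred: int = 20,      # hypothetical cutoff
    max_low_bases: int = 1,   # hypothetical tolerance
) -> bool:
    """Keep a read only if both barcode and UMI have at most
    max_low_bases bases below min_phred (assumed rule)."""
    return (
        count_low_quality(bc_quality, min_phred) <= max_low_bases
        and count_low_quality(umi_quality, min_phred) <= max_low_bases
    )


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 in Phred+33.
    print(passes_filter("IIIIII", "IIIIII"))
    print(passes_filter("I##III", "IIIIII"))
```

Any read removed at this stage never reaches STAR, which would be consistent with the STAR log reporting fewer input reads than the raw reads1 file contains.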