
Re-open issue: invalid fastq files produced using --un #8

Open · elenichri opened this issue 5 years ago · 4 comments

Hello, I am re-opening this issue. I am mapping paired-end reads with bowtie2 using the `--un-conc` option, so I get two output FASTQ files, one for each mate. I then use the STAR aligner to map these FASTQ files to the human genome. STAR starts running, but then fails with this error:

`ReadAlignChunk_processChunks.cpp:115:processChunks EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or >`
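STAR's message means that at least one line it expected to be a read header does not start with `@`. A minimal sketch of a structural check is below. It assumes uncompressed FASTQ with exactly four lines per record (no wrapped sequence or quality lines), and the function name `check_fastq_records` is hypothetical, not part of any existing tool.

```python
import sys
from typing import Iterable, List, Tuple


def check_fastq_records(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for malformed FASTQ records.

    Assumes uncompressed FASTQ with exactly four lines per record
    (no wrapped sequence or quality lines).
    """
    problems: List[Tuple[int, str]] = []
    record: List[str] = []
    start = 1  # 1-based line number of the current record's header
    for lineno, raw in enumerate(lines, start=1):
        record.append(raw.rstrip("\r\n"))
        if len(record) < 4:
            continue
        header, seq, sep, qual = record
        if not header.startswith("@"):
            problems.append((start, "header does not start with '@'"))
        if not sep.startswith("+"):
            problems.append((start + 2, "separator does not start with '+'"))
        if len(seq) != len(qual):
            problems.append((start + 3, "quality length differs from sequence length"))
        record = []
        start = lineno + 1
    if record:
        problems.append((start, "truncated record at end of file"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_fastq.py xxx.trimmed.2.fastq
    with open(sys.argv[1]) as handle:
        for lineno, message in check_fastq_records(handle)[:20]:
            print(f"line {lineno}: {message}")
```

Because this check groups lines strictly in fours, a single missing or extra line shifts every later record, so the first reported problem is usually the most informative one.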

I ran the fastQValidator program (https://genome.sph.umich.edu/wiki/FastQValidator) to check whether the FASTQ files that bowtie2 returns are valid:

`./fastQValidator --file xxx.trimmed.2.fastq`

Here is the output:

ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
...
Finished processing xxx.trimmed.2.fastq with 90418286 lines containing 22604486 sequences.
There were a total of 12073 errors.
Returning: 1 : FASTQ_INVALID
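The validator points at line 10414. Assuming its line numbers are 1-based and the file uses strict four-line records, a little arithmetic shows which record and which line of that record this is, and what `J` would mean if it were read as a quality character. The helper names below are hypothetical.

```python
from typing import Tuple

ROLES = ("header", "sequence", "separator", "quality")
# IUPAC nucleotide codes; 'J' is not one of them.
IUPAC_BASES = set("ACGTURYSWKMBDHVN")


def fastq_line_role(line_number: int) -> Tuple[int, str]:
    """Map a 1-based file line number to (1-based record index, line role),
    assuming strict four-line FASTQ records."""
    record_index = (line_number - 1) // 4 + 1
    role = ROLES[(line_number - 1) % 4]
    return record_index, role


def phred33(char: str) -> int:
    """Decode one Phred+33 quality character to its quality score."""
    return ord(char) - 33


if __name__ == "__main__":
    print(fastq_line_role(10414))            # (2604, 'sequence')
    print("J" in IUPAC_BASES, phred33("J"))  # False 41
```

Under these assumptions, line 10414 is the sequence line of record 2604 (lines 10413 to 10416), and `J` is not a nucleotide code but is the Phred+33 character for Q41. One possible reading, not confirmed anywhere in this thread, is that quality characters ended up on a sequence line because a record was shifted or interleaved; printing lines 10413 to 10416 of the file would show whether that is the case.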

So it seems that bowtie2 generates invalid FASTQ files in my case. Do you have any idea how I can fix this? My inputs (`$var2` and `$var3`) are trimmed FASTQ files, and I would prefer not to fall back to the untrimmed ones. I run bowtie2 with 8 cores on 12 samples. My command is:

`bowtie2 --dovetail --no-discordant -I 20 -p 8 -x <my reference sequence> --un-conc "$var1" -1 "$var2" -2 "$var3" -S "$var4"`

where each `$varN` is read from a parameters file.
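With `--un-conc`, the two output files have to list the mates in the same order for STAR to pair them correctly. A minimal sketch for checking that the two files stay in sync is below. It assumes strict four-line records and that mate names differ at most by a trailing `/1` or `/2`; the helper names and the file name `xxx.trimmed.1.fastq` (an assumed counterpart to the reported `xxx.trimmed.2.fastq`) are hypothetical.

```python
import sys
from itertools import zip_longest
from typing import Iterable, Iterator, Optional, Tuple


def read_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield a normalised read name from each record header.

    Assumes strict four-line records. Drops a leading '@', anything after the
    first whitespace, and a trailing '/1' or '/2' mate suffix.
    """
    for index, line in enumerate(lines):
        if index % 4 != 0:
            continue  # only every fourth line is a header
        header = line.rstrip("\r\n")
        if header.startswith("@"):
            header = header[1:]
        parts = header.split(maxsplit=1)
        name = parts[0] if parts else ""
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def first_mate_mismatch(
    mate1: Iterable[str], mate2: Iterable[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (record_index, name1, name2) for the first out-of-sync pair,
    or None if both files list the same reads in the same order."""
    pairs = zip_longest(read_names(mate1), read_names(mate2))
    for record_index, (name1, name2) in enumerate(pairs, start=1):
        if name1 != name2:
            return record_index, name1, name2
    return None


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_mates.py xxx.trimmed.1.fastq xxx.trimmed.2.fastq
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        print(first_mate_mismatch(f1, f2))
```

A `None` result means both files name the same reads in the same order; otherwise the returned index gives the first record where the files diverge, with `None` standing in for a file that ran out early.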

Thank you very much in advance! Eleni

elenichri · Nov 27 '18 08:11

Original issue: https://github.com/BenLangmead/bowtie/issues/8

mschilli87 · Nov 27 '18 10:11

How often does this happen? On every run, or sporadically? I am asking because I am trying to figure out whether this is a multithreading-related issue or whether the wrapper script is simply not processing the "trimmed" input correctly.
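The multithreading hypothesis can be illustrated generically: if several threads each write a record as several separate writes to a shared output without holding a lock for the whole record, lines from different records can interleave, and a quality line can land where a sequence line is expected. The sketch below uses plain Python `threading` and a hypothetical `RecordWriter` class to show the usual remedy of holding one lock per record. It is only an illustration of the idea, not bowtie2's implementation (which is C++).

```python
import threading
from typing import List


class RecordWriter:
    """Hypothetical shared output collecting FASTQ records from many threads.

    Each record is written as four separate appends, but all four happen while
    one lock is held, so lines from different records cannot interleave.
    Generic illustration only; this is not bowtie2's (C++) implementation.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.lines: List[str] = []

    def write_record(self, name: str, seq: str, qual: str) -> None:
        with self._lock:  # hold the lock for the whole record, not per line
            for line in (f"@{name}", seq, "+", qual):
                self.lines.append(line)


def worker(writer: RecordWriter, thread_id: int, count: int) -> None:
    for i in range(count):
        writer.write_record(f"t{thread_id}_r{i}", "ACGT", "JJJJ")


def run(num_threads: int = 8, per_thread: int = 1000) -> List[str]:
    writer = RecordWriter()
    threads = [
        threading.Thread(target=worker, args=(writer, t, per_thread))
        for t in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return writer.lines


if __name__ == "__main__":
    lines = run()
    intact = all(
        lines[i].startswith("@") and lines[i + 1] == "ACGT" and lines[i + 2] == "+"
        for i in range(0, len(lines), 4)
    )
    print(len(lines) // 4, intact)  # 8000 True
```

Without the `with self._lock:` line, the four appends from different threads could interleave under contention, which is the kind of corruption the multithreading hypothesis would predict; whether bowtie2 actually behaves this way is exactly what this thread is trying to determine.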

ch4rr0 · Nov 29 '18 15:11

Dear ch4rr0, thank you for your reply. It happens for all the FASTQ files of one dataset with 12 samples: all 12 FASTQ files are invalid. I run my bowtie2 command multithreaded (12 threads), but I don't think that is the issue, because the exact same multithreaded command works perfectly for another dataset. I am certain that the 'trimmed.fastq' input files are correct, because I have also mapped them with STAR without any problem.

elenichri · Dec 01 '18 03:12

I am looking into this one. I will update the thread if and when I am able to recreate the issue.

ch4rr0 · Jun 01 '19 01:06