
Error: some error while reading fastq file, please contact authors

Open vjanousk opened this issue 3 years ago • 0 comments

Hi, I am running KMC on FASTQ reads from the SRA archive and hit this issue. Some reads in the FASTQ file cause KMC to stop with the error above. Going read by read, I found that the problem is the third line of the record, which repeats the content of the first line after the + sign; removing that content and keeping only the + sign apparently solves the issue. Also, as you can see, some reads have "merged_XX_XX" at the end of the first line, which is the result of filtering with fastp and merging overlapping paired-end reads; this is how fastp writes its output. Most of the reads in the file are fine, and only a handful cause the issue. Is this because KMC compares the content of the first and third lines and accepts a read only if the two are equal or the third line contains nothing but the + sign?

Example of two reads that throw the error:

@SRR071593.3.3862.1 61H8HAAXX100305:5:1:1327:17846 length=101 merged_94_16
TTAGCAGGTAAATTGTTCACTAAAAACATTATGACTGGGTCCTTCACAGGAGTGCAGTTTACTTATTTTGCTGCAATCTAGGACAAATTAAGGGTTTTTTAAGTTAAATT
+SRR071593.3.3862.1 61H8HAAXX100305:5:1:1327:17846 length=101
CCCCCCCCACCCCCCCCCCCCCCCCCCBBCCCCDCCBCCCCCCCCCCCBA2B?BC@CCCCCCCCCCCC@@CCACCCCCCCC;ACC?CCAC+CCA=?>><B?@CCACC@@>
@SRR071593.3.11806.1 61H8HAAXX100305:5:1:1895:16220 length=101 merged_94_47
AGTGCCTTAGTTTTACATGGTTTTTTTATACAGAGACATTACATGTTTTTCCTTTCTGTTGTCTTCTTTTGTGTCAGTGTCTGTAACACAGCGATTTCCCCCTGGGATTAAAAAAGTATTCCAACTCTAATTTGTGCAAAT
+SRR071593.3.11806.1 61H8HAAXX100305:5:1:1895:16220 length=101
CCCCCCCCCCCCCCCCCCCCCCCCCCBBBBCCCACBCCCCCCCCCCCCCBBCCCCCCCCCCCCCCCCCCB=CCCCCCCCBCCCCCCCCC@CCCCCCB>?B??)CCBCCCCCCCCC?BCC@?CCCC@A7<<8<+<@3ACCCC

Example of a read that is fine:

@SRR071593.3.11800.1 61H8HAAXX100305:5:1:1895:11926 length=101
ATTTTGGTCCATTTCCCCTTTTCCCCATTTATTCAATATATTTTTGGTACAAGATACTGCAGTCTTTGGTCTCATCATTCACACCATCTAGGCT
+SRR071593.3.11800.1 61H8HAAXX100305:5:1:1895:11926 length=101
CCCCBBCCCCCDCCCCCCCCCCCCCC@CCCCCCCCCCCCCCCBCBBCBCCCCCCCCCBCCCCBCCCCACACACCCCCCCCCCCCCCCCCCCBAB
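
The workaround mentioned above (keeping only the + sign on the third line of each record) could be scripted roughly as follows. This is a minimal sketch, assuming strictly four-line FASTQ records with no wrapped sequence or quality lines. The function name `strip_plus_line_comments` is hypothetical and is not part of KMC or fastp, and whether the repeated header is the actual cause of the error inside KMC is not confirmed.

```python
import io


def strip_plus_line_comments(src, dst):
    """Copy FASTQ text from src to dst, reducing the third line of
    every four-line record to a bare '+'.

    Assumption: each record is exactly four lines (header, sequence,
    separator, quality). Wrapped multi-line records are not handled.
    """
    for i, line in enumerate(src):
        if i % 4 == 2:
            # The separator line must start with '+'; anything else
            # means the four-line assumption does not hold here.
            if not line.startswith("+"):
                raise ValueError(
                    f"line {i + 1}: expected '+' separator, got {line!r}"
                )
            dst.write("+\n")
        else:
            dst.write(line)


if __name__ == "__main__":
    # Tiny made-up record, only to show the call; not real data.
    demo_in = io.StringIO("@read1 extra_tag\nACGT\n+read1\nIIII\n")
    demo_out = io.StringIO()
    strip_plus_line_comments(demo_in, demo_out)
    print(demo_out.getvalue(), end="")
```

For a real file, the same function can be called with two file objects opened in text mode, one for reading and one for writing. Because the position in the record is tracked by line index, a quality line that happens to start with + or @ is left untouched.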

Thanks. Vaclav

vjanousk, Oct 07 '20 16:10