Porechop
error running in porechop
Hello,
I am trying to use Porechop for adapter trimming on Nanopore reads, but I keep getting the following error:
Error: input_reads.fastq could not be parsed - is it formatted correctly?
I double checked the file. It is fine.
I am using the following command:
porechop -i input_reads.fastq -o output_reads.fastq
Am I missing anything?
TIA
I got the same error and was able to track it down using NanoPlot, which gives a more detailed error message:
NanoPlot --fastq input.fastq.gz --loglength --outdir log_scaled
Traceback (most recent call last):
  File "/home/lina/.local/bin/NanoPlot", line 11, in <module>
    sys.exit(main())
  File "/home/lina/.local/lib/python2.7/site-packages/nanoplot/NanoPlot.py", line 46, in main
    datadf, lengthprefix, logBool, readlengthsPointer = getInput(args)
  File "/home/lina/.local/lib/python2.7/site-packages/nanoplot/NanoPlot.py", line 148, in getInput
    datadf = pd.concat([nanoget.processFastqPlain(inp) for inp in args.fastq], ignore_index=True)
  File "/home/lina/.local/lib/python2.7/site-packages/nanoget/nanoget.py", line 243, in processFastqPlain
    for record in SeqIO.parse(inputfastq, "fastq"):
  File "/home/lina/.local/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 611, in parse
    for r in i:
  File "/home/lina/.local/lib/python2.7/site-packages/Bio/SeqIO/QualityIO.py", line 1033, in FastqPhredIterator
    for title_line, seq_string, quality_string in FastqGeneralIterator(handle):
  File "/home/lina/.local/lib/python2.7/site-packages/Bio/SeqIO/QualityIO.py", line 954, in FastqGeneralIterator
    % (title_line, seq_len, len(quality_string)))
ValueError: Lengths of sequence and quality values differs for ef273085-907d-49c2-a718-a0ee3a1b71eb runid=184b3ffc1e177f8e044bf254b791cc506e6483ae sampleid=input_sample read=690 ch=402 start_time=2018-04-07T01:46:09Z (4556 and 10912).
I am not sure whether you have the same underlying problem, but in my case the sequence and quality strings of one read had different lengths (4556 and 10912).
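If you want to check for this kind of mismatch without installing NanoPlot or Biopython, something like the following rough sketch might help. It assumes strict four-line FASTQ records (no wrapped sequences), and the function name find_length_mismatches is made up for illustration, not part of any tool mentioned here.

```python
import io


def find_length_mismatches(handle):
    """Yield (title, seq_len, qual_len) for records whose lengths differ.

    Assumption: every record is exactly four lines
    (title, sequence, '+' separator, quality).
    """
    while True:
        title = handle.readline()
        if not title:
            break  # end of file
        if not title.strip():
            continue  # skip stray blank lines between records
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, content ignored
        qual = handle.readline().rstrip("\r\n")
        if len(seq) != len(qual):
            # Drop the leading '@' from the title line for reporting.
            yield title.rstrip("\r\n")[1:], len(seq), len(qual)


# Tiny in-memory demo: read2 has 6 bases but only 3 quality values.
demo = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGTAC\n+\nIII\n"
)
for title, seq_len, qual_len in find_length_mismatches(demo):
    print(title, seq_len, qual_len)
```

For a real file, pass an open handle instead of the demo, e.g. open(path) for plain text or gzip.open(path, "rt") for a .gz file.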
EDIT:
here is the link to NanoPlot: https://github.com/wdecoster/NanoPlot
OK, I just ran into this error again, but this time my data seems to be formatted well enough for NanoPlot to run.
@rrwick can you think of other formatting issues that could cause this error message?
Looking into the code, NanoPlot uses Biopython's modules to read FASTQ, whereas Porechop implements its own FASTQ reader.
I still don't know what was wrong with my FASTQ data, but I was able to use the following code to "sanitize" it so that Porechop can read it:
# Inspired by: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc282
import sys
from Bio.SeqIO.QualityIO import FastqGeneralIterator

input_file = sys.argv[1]
output_file = sys.argv[2]

with open(input_file) as in_handle:
    with open(output_file, "w") as out_handle:
        # Rewrite every record as four lines with a bare "+" separator.
        for title, seq, qual in FastqGeneralIterator(in_handle):
            out_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))
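To find out what the rewrite actually changed, one option is to compare the original and sanitized files line by line and look at the first difference. This is only a sketch: first_difference is a hypothetical helper, and it will not tell you which difference Porechop's reader objects to, only where the two files diverge.

```python
import io
from itertools import zip_longest


def first_difference(original, sanitized):
    """Return (line_number, original_line, sanitized_line) for the first
    line that differs, or None if both inputs are identical.

    A missing line (one input is shorter) is reported as None.
    """
    pairs = zip_longest(original, sanitized)
    for number, (old, new) in enumerate(pairs, start=1):
        if old != new:
            return number, old, new
    return None


# In-memory demo: the original repeats the read name on the '+' line.
original = io.StringIO("@read1\nACGT\n+read1\nIIII\n")
sanitized = io.StringIO("@read1\nACGT\n+\nIIII\n")
print(first_difference(original, sanitized))
```

With real files, opening both with open(path, newline="") keeps the original line endings visible, so a difference such as Windows-style line endings would show up in the reported lines.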