TrimGalore
val_X files vs. trimmed files: different output from the same command?
Hi,
I want to trim EM-seq FASTQ files. I used the same command, first for a single pair and then for a batch. The command for the first pair was:
trim_galore --2colour 20 --illumina -o trim --paired V00001_R1.fastq.gz V00001_R2.fastq.gz
and the output was: V00001_R1_val_1.fq.gz V00001_R2_val_2.fq.gz
The summary stated trimming mode - paired end:
SUMMARISING RUN PARAMETERS
Input filename: V00001_R1.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.10
Cutadapt version: 1.18
Number of cores used for trimming: 1
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; user defined)
2-colour high quality G-trimming enabled, with quality cutoff: --nextseq-trim=20
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Output file will be GZIP compressed
Then, for a second pair I used the code:
trim_galore --2colour 20 --illumina --output_dir=trim -j 4 --paired V00021_R1.fastq.gz V00021_R2.fastq.gz
The output files were: V00021_R1_trimmed.fq.gz V00021_R2_trimmed.fq.gz
And the summary:
SUMMARISING RUN PARAMETERS
Input filename: V00021_R1.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.10
Cutadapt version: 1.18
Python version: could not detect
Number of cores used for trimming: 4
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; user defined)
2-colour high quality G-trimming enabled, with quality cutoff: --nextseq-trim=20
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Output file will be GZIP compressed
Why did the first pair get the val_* suffix while the second only got trimmed?
Is there something about the command that I missed, or is it an effect of using multithreaded mode?
Thanks.
If you still have files called *trimmed.fq.gz around in paired-end mode, it is likely that the run hasn't completely finished. Once the validation process is complete, both intermediate trimmed.fq.gz files will be deleted.
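As a rough way to apply this check to an output directory, here is a minimal shell sketch. The function name paired_run_status is hypothetical (it is not part of Trim Galore), and the file naming is assumed to follow the pattern seen in this thread (V00001_R1_val_1.fq.gz, V00021_R1_trimmed.fq.gz). It only inspects file names, so it cannot tell a run that is still in progress from one that was interrupted.

```shell
# Report whether a paired-end Trim Galore run looks finished for one sample.
# Usage: paired_run_status OUTPUT_DIR R1_BASENAME R2_BASENAME
# Basenames are the input names without .fastq.gz, e.g. V00021_R1.
# The body runs in a subshell ( ... ) so its variables stay local.
paired_run_status() (
    dir=$1
    r1=$2
    r2=$3
    if [ -e "$dir/${r1}_trimmed.fq.gz" ] || [ -e "$dir/${r2}_trimmed.fq.gz" ]; then
        # Intermediate files are still present: validation has not completed.
        echo "incomplete"
    elif [ -e "$dir/${r1}_val_1.fq.gz" ] && [ -e "$dir/${r2}_val_2.fq.gz" ]; then
        # Both validated files exist and the intermediates are gone.
        echo "complete"
    else
        echo "missing"
    fi
)
```

For the second pair above, this would be called as: paired_run_status trim V00021_R1 V00021_R2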
As a side note, if this trimming is for methylation alignments, I would recommend the trimming setting described here: http://felixkrueger.github.io/Bismark/bismark/library_types/#em-seq-neb
Hi @FelixKrueger
Related questions specific to EM-Seq:
- I assume one has to explicitly use trim_galore first on the R1/R2 files and then pass the trimmed R1/R2 files to bismark
- Based on your comment above, should I explicitly call out --clip_R1 10 --clip_R2 10 --three_prime_clip_R1 10 --three_prime_clip_R2 10 when using trim_galore or should I not - the legend below the table at https://felixkrueger.github.io/Bismark/bismark/library_types/ suggests "Default settings (nothing in particular is required, just use Trim Galore or Bismark default parameters)"
- If OK with you, would you know what would be the equivalent command with bbduk.sh - given that bbduk is java based, I would expect this step will be much faster
Thanks.
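For reference, the command being asked about can be written out as a dry run over a batch. This is only a sketch: the function name emseq_trim_cmd is hypothetical, the mate file is assumed to be named by swapping _R1.fastq.gz for _R2.fastq.gz, and the clip values are simply those quoted in the question, not a confirmed recommendation (the linked Bismark library types page is the place to check). It prints the commands rather than running them.

```shell
# Print (do not run) a paired-end Trim Galore command for one R1 file.
# Assumption: the mate is named by replacing _R1.fastq.gz with _R2.fastq.gz.
# The clip values below are the ones quoted in the question above.
emseq_trim_cmd() (
    r1=$1
    outdir=${2:-trim}
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
    printf '%s %s %s\n' \
        "trim_galore --2colour 20 --illumina -o $outdir --paired" \
        "--clip_R1 10 --clip_R2 10 --three_prime_clip_R1 10 --three_prime_clip_R2 10" \
        "$r1 $r2"
)

# Dry run over a batch: review the printed commands before executing them.
for r1 in *_R1.fastq.gz; do
    [ -e "$r1" ] || continue   # skip when the glob matches nothing
    emseq_trim_cmd "$r1" trim
done
```

Printing first makes it easy to confirm that every R1 file is paired with the intended R2 file before any trimming starts.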
You don't necessarily have to use Trim Galore, but yes, some trimming is recommended. The nf-core/methylseq pipeline has an EM-seq switch which should work equally well.
I think this still uses Trim Galore under the hood.