methylseq
Latest template merge completely broke dev branch (on AWS + Fusion)
Description of the bug
I am trying to launch a methylseq run using the latest dev branch, where some (but not all) samples require merging of technical replicates before processing. If I understand correctly, the latest template changes were merged into dev earlier this month, but something seems to have gone awry:
Within seconds of launching the run, I observe the following errors:
- The samples are not getting merged, despite technical replicates having identical IDs (which no longer get truncated by 1 element, which is good!)
- Trim Galore fails straight away as the system tries to create the same symbolic link several times (details below)
- As one of the first processes, bismark2summary is run, and obviously fails...
Obviously, the ln -s command attempts to use the very same filename 6 times over, which doesn't work. But something also screwed up the entire workflow logic, i.e. not starting with merging, and instead running post-run QC right at the start.
Here is an example samplesheet:
sample,fastq_1,fastq_2,genome
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994983_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994984_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994985_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994986_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994987_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994988_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7431885_NB18_32F_TNTtoKSR_553_rep1,s3://filebucket/SRR24757836_GSM7431885_NB18_32F_TNTtoKSR_553_rep1_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,s3://filebucket/SRR24757836_GSM7431885_NB18_32F_TNTtoKSR_553_rep1_Homo_sapiens_Bisulfite-Seq_R2.fastq.gz,
GSM7431885_NB18_32F_TNTtoKSR_553_rep1,s3://filebucket/SRR24757837_GSM7431885_NB18_32F_TNTtoKSR_553_rep1_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,s3://filebucket/SRR24757837_GSM7431885_NB18_32F_TNTtoKSR_553_rep1_Homo_sapiens_Bisulfite-Seq_R2.fastq.gz,
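For anyone trying to reproduce this, a quick way to see which sample IDs appear more than once (and would therefore need merging) is to count rows per ID. This is only a sketch: the samplesheet below is a shortened, hypothetical stand-in for the one above, and the file name and variable names are my own.

```shell
# Hypothetical, shortened samplesheet written to a temp dir for illustration.
workdir=$(mktemp -d)
cat > "$workdir/samplesheet.csv" <<'EOF'
sample,fastq_1,fastq_2,genome
GSM7506206,SRR24994983_R1.fastq.gz,,
GSM7506206,SRR24994984_R1.fastq.gz,,
GSM7431885,SRR24757836_R1.fastq.gz,SRR24757836_R2.fastq.gz,
EOF

# Skip the header, keep the sample column, and count rows per sample ID.
# Any ID with a count above 1 has technical replicates to merge.
replicate_counts=$(tail -n +2 "$workdir/samplesheet.csv" \
    | cut -d, -f1 | sort | uniq -c | awk '{print $2 "=" $1}')
echo "$replicate_counts"
```

With the full samplesheet above, the two IDs would show 6 and 2 rows respectively.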
Command used and terminal output
This is the command it attempts to run:
[ ! -f GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz ] && ln -s SRR24994983_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994984_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994985_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994986_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994987_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994988_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz
trim_galore \
--fastqc \
--cores 8 \
--gzip \
GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz
Terminal output of Trim Galore process:
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
11:11AM INF shutdown filesystem start
11:11AM INF shutdown filesystem done
Relevant files
No response
System information
I am running this on Seqera platform on AWS, using Fusion. Nextflow v23.10.1 build 5891. nf-core/methylseq version: dev
Have you tried simplifying the names to just GSM7431885 and GSM7506206?
The sample name doesn't have to match the original input, you can name it something more descriptive than an ID as well.
Simplifying the name has no effect (other than a different file name...):
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
But this command can never work:
ln -s SRR24994983_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz
SRR24994984_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz
SRR24994985_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz
SRR24994986_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz
SRR24994987_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz
SRR24994988_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz
GSM7506206.fastq.gz
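To illustrate what I would expect instead: a single symlink can only point at one file, so several replicates have to be concatenated before trimming (concatenated gzip files are themselves a valid gzip stream). The following is a minimal sketch of that logic, not the pipeline's actual code; the file names, reads, and variable names are placeholders I made up.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two tiny stand-in FASTQ files (placeholder names and reads, not the real data).
printf '@read1\nACGT\n+\nIIII\n' | gzip > rep1_R1.fastq.gz
printf '@read2\nTGCA\n+\nIIII\n' | gzip > rep2_R1.fastq.gz

# Load the replicate files as positional parameters.
set -- rep1_R1.fastq.gz rep2_R1.fastq.gz

if [ "$#" -gt 1 ]; then
    # Several inputs: concatenate them into one merged file.
    cat "$@" > merged.fastq.gz
else
    # Exactly one input: a symlink under the new name is enough.
    ln -s "$1" merged.fastq.gz
fi

# Count the lines in the merged file (4 lines per read).
merged_lines=$(gzip -dc merged.fastq.gz | wc -l | tr -d ' ')
echo "$merged_lines"
```

The point is only that the link branch should never receive more than one source file; how the merge is wired into the workflow is a separate question.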
using Fusion
I have a feeling this might be it, with the soft links for whatever weird reason.
Two new experiments:
- Can you run the methylseq test profile in the environment?
- Can you run the rnaseq test profile in the environment? (It has trimgalore)
- If the above two work, what about a rnaseq test full?
Also, have any previous versions been confirmed to work? I ask because the trimgalore module hasn't been updated in 11 months.
It also fails with 2.6.0:
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
and the process doesn't start at all with 2.5.0 (as it expected filenames to contain at least one _ underscore back then):
Execution completed unsuccessfully!
The full error message was:
fromIndex = -1