
Latest template merge completely broke dev branch (on AWS + Fusion)

Open FelixKrueger opened this issue 1 year ago • 4 comments

Description of the bug

I am trying to launch a methylseq run using the latest dev branch, where some (but not all) samples require merging of technical replicates before launching. If I understand correctly, the latest template changes were merged into dev earlier this month, but something seems to have gone awry:

Within seconds of launching the run, I observe the following errors:

  1. The samples are not getting merged, despite technical replicates having identical IDs (which no longer get truncated by 1 element, which is good!)
  2. Trim Galore fails straight away as the system tries to create the same symbolic link several times (details below)
  3. As one of the first processes, bismark2summary is run, and obviously fails... [Screenshot 2024-03-27 at 11 15 21]

Obviously, the ln -s command is handed six source files and a single link name, so it tries to create the very same link six times over, which doesn't work. But something has also broken the overall workflow logic: the run does not start with merging, and instead launches post-run QC right at the start.
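To illustrate why a command of that shape cannot merge anything, here is a minimal shell sketch. It assumes a POSIX-like `ln`, `gzip`, and `mktemp`; the `run*_R1.fastq.gz` and `sample.fastq.gz` names are hypothetical stand-ins, and the `cat` step is only one plausible way to merge technical replicates, not necessarily what the pipeline does internally.

```shell
#!/usr/bin/env bash
# Sketch only: file names are made up for illustration.
set -u

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three tiny stand-ins for the per-run FASTQ files of one sample.
for run in run1 run2 run3; do
    printf '@%s\nACGT\n+\nIIII\n' "$run" | gzip > "${run}_R1.fastq.gz"
done

# Same shape as the failing command: several sources, one file-name target.
# With more than two operands, ln expects the last one to be a directory,
# so this exits non-zero (the exact message depends on the ln implementation).
ln -s run1_R1.fastq.gz run2_R1.fastq.gz run3_R1.fastq.gz sample.fastq.gz 2>/dev/null
ln_status=$?
echo "ln exit status: ${ln_status}"

# Some ln implementations create the first link before failing; remove it
# so the redirect below cannot write through a symlink into run1.
rm -f sample.fastq.gz

# Merging replicates needs concatenation; back-to-back gzip members
# still form a valid gzip stream.
cat run1_R1.fastq.gz run2_R1.fastq.gz run3_R1.fastq.gz > sample.fastq.gz
merged_reads=$(gzip -dc sample.fastq.gz | grep -c '^@run')
echo "reads in merged file: ${merged_reads}"
```

A single-source `ln -s run1_R1.fastq.gz sample.fastq.gz` would succeed, which is consistent with the staging step having been written for exactly one input file per sample.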

Here is an example samplesheet:

sample,fastq_1,fastq_2,genome
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994983_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994984_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994985_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994986_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994987_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994988_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7431885_NB18_32F_TNTtoKSR_553_rep1,s3://filebucket/SRR24757836_GSM7431885_NB18_32F_TNTtoKSR_553_rep1_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,s3://filebucket/SRR24757836_GSM7431885_NB18_32F_TNTtoKSR_553_rep1_Homo_sapiens_Bisulfite-Seq_R2.fastq.gz,
GSM7431885_NB18_32F_TNTtoKSR_553_rep1,s3://filebucket/SRR24757837_GSM7431885_NB18_32F_TNTtoKSR_553_rep1_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,s3://filebucket/SRR24757837_GSM7431885_NB18_32F_TNTtoKSR_553_rep1_Homo_sapiens_Bisulfite-Seq_R2.fastq.gz,
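As a quick sanity check on a samplesheet like this one, the rows per sample ID can be counted to see which samples should end up being merged. This is a rough sketch using standard `awk` and `sort`; the temporary file and the shortened sample and file names are hypothetical, and it assumes a simple CSV without quoted commas.

```shell
#!/usr/bin/env bash
# Sketch only: a cut-down samplesheet with made-up file names.
set -eu

samplesheet=$(mktemp)
cat > "$samplesheet" <<'EOF'
sample,fastq_1,fastq_2,genome
GSM7506206,s3://filebucket/runA_R1.fastq.gz,,
GSM7506206,s3://filebucket/runB_R1.fastq.gz,,
GSM7506206,s3://filebucket/runC_R1.fastq.gz,,
GSM7431885,s3://filebucket/runD_R1.fastq.gz,s3://filebucket/runD_R2.fastq.gz,
GSM7431885,s3://filebucket/runE_R1.fastq.gz,s3://filebucket/runE_R2.fastq.gz,
EOF

# Skip the header, then count rows and record the layout per sample ID.
counts=$(awk -F, 'NR > 1 && $1 != "" {
    rows[$1]++
    layout[$1] = ($3 == "" ? "single-end" : "paired-end")
} END {
    for (s in rows) print s, rows[s], layout[s]
}' "$samplesheet" | sort)

echo "$counts"
rm -f "$samplesheet"
```

Any sample whose count is above 1 has technical replicates, so a merge step would be expected to run for it before trimming.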

Command used and terminal output

This is the command it attempts to run:

Command

[ ! -f  GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz ] && ln -s SRR24994983_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994984_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994985_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994986_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994987_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994988_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz
trim_galore \
    --fastqc \
    --cores 8 \
    --gzip \
    GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz


Terminal output of Trim Galore process:
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
11:11AM INF shutdown filesystem start
11:11AM INF shutdown filesystem done

Relevant files

No response

System information

[Screenshot 2024-03-27 at 11 06 51]

I am running this on Seqera platform on AWS, using Fusion. Nextflow v23.10.1 build 5891. nf-core/methylseq version: dev

FelixKrueger, Mar 27 '24 11:03

Have you tried simplifying the names to just GSM7431885 and GSM7506206?

The sample name doesn't have to match the original input, you can name it something more descriptive than an ID as well.

edmundmiller, Mar 27 '24 14:03

Simplifying the name has no effect (other than a different file name...): [Screenshot 2024-03-28 at 11 02 08]

ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists

But this command can never work:

ln -s SRR24994983_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz
SRR24994984_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz
SRR24994985_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz
SRR24994986_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz
SRR24994987_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz
SRR24994988_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz
GSM7506206.fastq.gz

FelixKrueger, Mar 28 '24 11:03

"using Fusion"

I have a feeling this might be it, with the soft links for whatever weird reason.

Three new experiments:

  1. Can you run the methylseq test profile in the environment?
  2. Can you run the rnaseq test profile in the environment? (It has trimgalore)
  3. If the above two work, what about the rnaseq test_full profile?

Also, have any previous versions been confirmed to work? I ask because the trimgalore module hasn't been updated in 11 months.

edmundmiller, Mar 28 '24 14:03

It also fails with 2.6.0:

ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists

and the process doesn't start at all with 2.5.0 (as it expected filenames to contain at least one _ underscore back then):

Execution completed unsuccessfully!

The full error message was:

fromIndex = -1

FelixKrueger, Mar 28 '24 15:03