Wrong parameterization causing error

Open taylordm opened this issue 2 years ago • 1 comments

Hi, all. I'm on a Red Hat machine, using the podman profile.

Trying out the atacseq pipeline.

(editing my original)

I saw the error when I ran this again:

(start paste)

nextflow run nf-core/atacseq --input design.csv --genome GRCm38 -profile podman

(skipping many repetitive lines)

executor > local (35)
[d5/536b56] process > CHECK_DESIGN (design.csv) [100%] 1 of 1 ✔
[e3/75b8c9] process > MAKE_TSS_BED (genes.bed) [100%] 1 of 1 ✔
[a2/bf29da] process > MAKE_GENOME_FILTER (genome.fa) [100%] 1 of 1 ✔
[c4/3a3ba6] process > FASTQC (A_R3_T1) [100%] 6 of 6 ✔
[9b/53d54d] process > TRIMGALORE (A_R3_T1) [100%] 6 of 6 ✔
[12/672054] process > BWA_MEM (A_R3_T1) [100%] 6 of 6 ✔
[83/1a4a2b] process > SORT_BAM (A_R3_T1) [100%] 6 of 6 ✔
[a2/3948d1] process > MERGED_LIB_BAM (A_R2) [100%] 6 of 6, failed: 6 ✘
[- ] process > MERGED_LIB_BAM_FILTER -
[- ] process > MERGED_LIB_BAM_REMOVE_ORPHAN -
[- ] process > MERGED_LIB_PRESEQ -
[- ] process > MERGED_LIB_PICARD_METRICS -
[- ] process > MERGED_LIB_BIGWIG -
[- ] process > MERGED_LIB_PLOTPROFILE -
[- ] process > MERGED_LIB_PLOTFINGERPRINT -
[- ] process > MERGED_LIB_MACS2 -
[- ] process > MERGED_LIB_MACS2_ANNOTATE -
[- ] process > MERGED_LIB_MACS2_QC -
[- ] process > MERGED_LIB_CONSENSUS -
[- ] process > MERGED_LIB_CONSENSUS_ANNOTATE -
[- ] process > MERGED_LIB_CONSENSUS_COUNTS -
[- ] process > MERGED_LIB_CONSENSUS_DESEQ2 -
[- ] process > MERGED_LIB_ATAQV -
[- ] process > MERGED_LIB_ATAQV_MKARV -
[- ] process > MERGED_REP_BAM -
[- ] process > MERGED_REP_BIGWIG -
[- ] process > MERGED_REP_MACS2 -
[- ] process > MERGED_REP_MACS2_ANNOTATE -
[- ] process > MERGED_REP_MACS2_QC -
[- ] process > MERGED_REP_CONSENSUS -
[- ] process > MERGED_REP_CONSENSUS_ANNOTATE -
[- ] process > MERGED_REP_CONSENSUS_COUNTS -
[- ] process > MERGED_REP_CONSENSUS_DESEQ2 -
[- ] process > IGV -
[17/871f5e] process > get_software_versions [100%] 1 of 1 ✔
[- ] process > MULTIQC -
[f8/6399ce] process > output_documentation [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/atacseq] Pipeline completed with errors-
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Error executing process > 'MERGED_LIB_BAM (B_R3)'

Caused by: Process MERGED_LIB_BAM (B_R3) terminated with an error exit status (1)

Command executed:

picard -Xmx36g MarkDuplicates
INPUT=B_R3_T1.Lb.sorted.bam
OUTPUT=B_R3.mLb.mkD.sorted.bam
ASSUME_SORTED=true
REMOVE_DUPLICATES=false
METRICS_FILE=B_R3.mLb.mkD.MarkDuplicates.metrics.txt
VALIDATION_STRINGENCY=LENIENT
TMP_DIR=tmp

samtools index B_R3.mLb.mkD.sorted.bam
samtools idxstats B_R3.mLb.mkD.sorted.bam > B_R3.mLb.mkD.sorted.bam.idxstats
samtools flagstat B_R3.mLb.mkD.sorted.bam > B_R3.mLb.mkD.sorted.bam.flagstat
samtools stats B_R3.mLb.mkD.sorted.bam > B_R3.mLb.mkD.sorted.bam.stats

Command exit status: 1

Command output: (empty)

Command error:
/opt/conda/envs/nf-core-atacseq-1.2.2/bin/picard: line 5: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8): No such file or directory
INFO 2022-08-13 03:04:33 MarkDuplicates

********** NOTE: Picard's command line syntax is changing.


********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


********** The command line looks like this in the new syntax:


********** MarkDuplicates -INPUT B_R3_T1.Lb.sorted.bam -OUTPUT B_R3.mLb.mkD.sorted.bam -ASSUME_SORTED true -REMOVE_DUPLICATES false -METRICS_FILE B_R3.mLb.mkD.MarkDuplicates.metrics.txt -VALIDATION_STRINGENCY LENIENT -TMP_DIR tmp


03:04:34.124 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/conda/envs/nf-core-atacseq-1.2.2/share/picard-2.23.1-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Sat Aug 13 03:04:34 UTC 2022] MarkDuplicates INPUT=[B_R3_T1.Lb.sorted.bam] OUTPUT=B_R3.mLb.mkD.sorted.bam METRICS_FILE=B_R3.mLb.mkD.MarkDuplicates.metrics.txt REMOVE_DUPLICATES=false ASSUME_SORTED=true TMP_DIR=[tmp] VALIDATION_STRINGENCY=LENIENT MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sat Aug 13 03:04:34 UTC 2022] Executing as root@b31f1c4e6bab on Linux 4.18.0-372.16.1.el8_6.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.8-internal+0-adhoc..src; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.23.1
[Sat Aug 13 03:04:34 UTC 2022] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2147483648
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Cannot write file: B_R3.mLb.mkD.sorted.bam. File does not exist and parent directory is not writable..
    at htsjdk.samtools.util.IOUtil.assertFileIsWritable(IOUtil.java:562)
    at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:250)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

Work dir: /data/200901_0686_BHFFWNDRX2/analysis/work/b6/b0a43de9b54aa171408f1711b135ce

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run


(end paste)

When I go into the work directory, I see file names:

B_R3_T1.Lb.sorted.bam B_R3_T1.Lb.sorted.bam.bai tmp

Running bash .command.run shows that the program is looking for a file name that is not in that directory, one that should have been generated by MarkDuplicates:

(start paste)

bash .command.run
/opt/conda/envs/nf-core-atacseq-1.2.2/bin/picard: line 5: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8): No such file or directory
INFO 2022-08-13 20:25:02 MarkDuplicates

********** NOTE: Picard's command line syntax is changing.


********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


********** The command line looks like this in the new syntax:


********** MarkDuplicates -INPUT B_R3_T1.Lb.sorted.bam -OUTPUT B_R3.mLb.mkD.sorted.bam -ASSUME_SORTED true -REMOVE_DUPLICATES false -METRICS_FILE B_R3.mLb.mkD.MarkDuplicates.metrics.txt -VALIDATION_STRINGENCY LENIENT -TMP_DIR tmp


20:25:02.758 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/conda/envs/nf-core-atacseq-1.2.2/share/picard-2.23.1-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Sat Aug 13 20:25:02 UTC 2022] MarkDuplicates INPUT=[B_R3_T1.Lb.sorted.bam] OUTPUT=B_R3.mLb.mkD.sorted.bam METRICS_FILE=B_R3.mLb.mkD.MarkDuplicates.metrics.txt REMOVE_DUPLICATES=false ASSUME_SORTED=true TMP_DIR=[tmp] VALIDATION_STRINGENCY=LENIENT MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sat Aug 13 20:25:02 UTC 2022] Executing as root@205ed8db6e92 on Linux 4.18.0-372.16.1.el8_6.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.8-internal+0-adhoc..src; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.23.1
[Sat Aug 13 20:25:02 UTC 2022] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2147483648
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Cannot write file: /data/A00901_0686_BHFFWNDRX2/analysis/work/b6/b0a43de9b54aa171408f1711b135ce/B_R3.mLb.mkD.sorted.bam. File does not exist and parent directory is not writable..
    at htsjdk.samtools.util.IOUtil.assertFileIsWritable(IOUtil.java:562)
    at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:250)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

(end paste)

You can see that the file it's looking for, for example B_R3.mLb.mkD.sorted.bam, is not in my work directory. I think MarkDuplicates is not being run?

Any idea what I have to do here?

Thank you in advance!

taylordm avatar Aug 12 '22 18:08 taylordm

So one of our Unix people talked to the Red Hat folks and, after a lot of testing, they strongly suspect the issue is with the version of Java that ships in the atacseq pipeline container. That version is Java 11 (the log above reports OpenJDK 11.0.8). There are multiple bugs in older Java versions related to Files.isWritable and NAS devices. Because we are using multi-protocol file shares here, he can't do anything fancy with permissions to get around it. We can generate a proof of this if useful.
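For reference, here is a minimal sketch of what such a proof could look like. It compares what the JVM reports through `Files.isWritable` with the result of actually creating a scratch file in the same directory. The class name `WritableProbe` and its two helper methods are hypothetical; this is not code from the pipeline, Picard, or htsjdk, and it only assumes that the failing check ultimately relies on `Files.isWritable`, which we have not confirmed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic; not part of nf-core/atacseq, Picard, or htsjdk.
final class WritableProbe {

    // What the JVM reports for the directory, without modifying it.
    static boolean reportedWritable(Path dir) {
        return Files.isWritable(dir);
    }

    // Ground truth: try to create and then delete a scratch file in the directory.
    static boolean canActuallyWrite(Path dir) {
        try {
            Path scratch = Files.createTempFile(dir, "probe", ".tmp");
            Files.delete(scratch);
            return true;
        } catch (IOException | SecurityException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Directory to test; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".").toAbsolutePath();
        boolean reported = reportedWritable(dir);
        boolean actual = canActuallyWrite(dir);
        System.out.println("Directory:        " + dir);
        System.out.println("Files.isWritable: " + reported);
        System.out.println("Real write test:  " + actual);
        if (!reported && actual) {
            System.out.println("Mismatch: reported as not writable, but writing works.");
        }
    }
}
```

If the container ships a full JDK, this could be run from the failing work directory inside the container with `java WritableProbe.java` (single-file launch is available from Java 11). A mismatch between the two results would support the theory; matching results would point back at ordinary permissions.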

taylordm avatar Aug 18 '22 16:08 taylordm

Please use backticks to format code and pasted output next time; it makes it much easier to read 😃 (just edited for you). Like this:

Normal text
```
multi-line
Code block
```

More normal text, `in-line code` and done.

ewels avatar Oct 29 '22 09:10 ewels

@taylordm - have you tried running the dev version of the pipeline? It's been rewritten in DSL2 and uses different software packaging. Hopefully this will be resolved in the next release (coming soon, right @JoseEspinosa?)

ewels avatar Oct 29 '22 09:10 ewels

Yes, should be released before the end of this month. Actually, it will be nice to get feedback on any possible bug before the release, so if you try the dev branch and find any, please let us know @taylordm 😄

JoseEspinosa avatar Nov 03 '22 09:11 JoseEspinosa

Hi @taylordm ! Thanks for reporting! We are about to release a much updated version of the pipeline that has been completely refactored to be written in Nextflow DSL2. When this is released, it would be great if you can let us know if the problem still persists. I will close this issue for now.

For faster, real-time help for these sorts of things please join the #atacseq channel on the nf-core Slack workspace. You can join via the link below: https://nf-co.re/join

drpatelh avatar Nov 18 '22 12:11 drpatelh