atacseq
Wrong parameterization causing error
Hi all. I'm on a Red Hat machine, using the podman profile.
Trying out the atacseq pipeline.
(editing my original)
I saw the error when I ran this again:
(start paste)
nextflow run nf-core/atacseq --input design.csv --genome GRCm38 -profile podman
(skipping many repetitive lines)
executor > local (35)
[d5/536b56] process > CHECK_DESIGN (design.csv) [100%] 1 of 1 ✔
[e3/75b8c9] process > MAKE_TSS_BED (genes.bed) [100%] 1 of 1 ✔
[a2/bf29da] process > MAKE_GENOME_FILTER (genome.fa) [100%] 1 of 1 ✔
[c4/3a3ba6] process > FASTQC (A_R3_T1) [100%] 6 of 6 ✔
[9b/53d54d] process > TRIMGALORE (A_R3_T1) [100%] 6 of 6 ✔
[12/672054] process > BWA_MEM (A_R3_T1) [100%] 6 of 6 ✔
[83/1a4a2b] process > SORT_BAM (A_R3_T1) [100%] 6 of 6 ✔
[a2/3948d1] process > MERGED_LIB_BAM (A_R2) [100%] 6 of 6, failed: 6 ✘
[- ] process > MERGED_LIB_BAM_FILTER -
[- ] process > MERGED_LIB_BAM_REMOVE_ORPHAN -
[- ] process > MERGED_LIB_PRESEQ -
[- ] process > MERGED_LIB_PICARD_METRICS -
[- ] process > MERGED_LIB_BIGWIG -
[- ] process > MERGED_LIB_PLOTPROFILE -
[- ] process > MERGED_LIB_PLOTFINGERPRINT -
[- ] process > MERGED_LIB_MACS2 -
[- ] process > MERGED_LIB_MACS2_ANNOTATE -
[- ] process > MERGED_LIB_MACS2_QC -
[- ] process > MERGED_LIB_CONSENSUS -
[- ] process > MERGED_LIB_CONSENSUS_ANNOTATE -
[- ] process > MERGED_LIB_CONSENSUS_COUNTS -
[- ] process > MERGED_LIB_CONSENSUS_DESEQ2 -
[- ] process > MERGED_LIB_ATAQV -
[- ] process > MERGED_LIB_ATAQV_MKARV -
[- ] process > MERGED_REP_BAM -
[- ] process > MERGED_REP_BIGWIG -
[- ] process > MERGED_REP_MACS2 -
[- ] process > MERGED_REP_MACS2_ANNOTATE -
[- ] process > MERGED_REP_MACS2_QC -
[- ] process > MERGED_REP_CONSENSUS -
[- ] process > MERGED_REP_CONSENSUS_ANNOTATE -
[- ] process > MERGED_REP_CONSENSUS_COUNTS -
[- ] process > MERGED_REP_CONSENSUS_DESEQ2 -
[- ] process > IGV -
[17/871f5e] process > get_software_versions [100%] 1 of 1 ✔
[- ] process > MULTIQC -
[f8/6399ce] process > output_documentation [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/atacseq] Pipeline completed with errors-
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Error executing process > 'MERGED_LIB_BAM (B_R3)'
Caused by:
Process `MERGED_LIB_BAM (B_R3)` terminated with an error exit status (1)
Command executed:
picard -Xmx36g MarkDuplicates
INPUT=B_R3_T1.Lb.sorted.bam
OUTPUT=B_R3.mLb.mkD.sorted.bam
ASSUME_SORTED=true
REMOVE_DUPLICATES=false
METRICS_FILE=B_R3.mLb.mkD.MarkDuplicates.metrics.txt
VALIDATION_STRINGENCY=LENIENT
TMP_DIR=tmp
samtools index B_R3.mLb.mkD.sorted.bam
samtools idxstats B_R3.mLb.mkD.sorted.bam > B_R3.mLb.mkD.sorted.bam.idxstats
samtools flagstat B_R3.mLb.mkD.sorted.bam > B_R3.mLb.mkD.sorted.bam.flagstat
samtools stats B_R3.mLb.mkD.sorted.bam > B_R3.mLb.mkD.sorted.bam.stats
Command exit status: 1
Command output: (empty)
Command error:
/opt/conda/envs/nf-core-atacseq-1.2.2/bin/picard: line 5: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8): No such file or directory
INFO 2022-08-13 03:04:33 MarkDuplicates
********** NOTE: Picard's command line syntax is changing.
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
********** The command line looks like this in the new syntax:
********** MarkDuplicates -INPUT B_R3_T1.Lb.sorted.bam -OUTPUT B_R3.mLb.mkD.sorted.bam -ASSUME_SORTED true -REMOVE_DUPLICATES false -METRICS_FILE B_R3.mLb.mkD.MarkDuplicates.metrics.txt -VALIDATION_STRINGENCY LENIENT -TMP_DIR tmp
03:04:34.124 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/conda/envs/nf-core-atacseq-1.2.2/share/picard-2.23.1-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so [Sat Aug 13 03:04:34 UTC 2022] MarkDuplicates INPUT=[B_R3_T1.Lb.sorted.bam] OUTPUT=B_R3.mLb.mkD.sorted.bam METRICS_FILE=B_R3.mLb.mkD.MarkDuplicates.metrics.txt REMOVE_DUPLICATES=false ASSUME_SORTED=true TMP_DIR=[tmp] VALIDATION_STRINGENCY=LENIENT MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false [Sat Aug 13 03:04:34 UTC 2022] Executing as root@b31f1c4e6bab on Linux 4.18.0-372.16.1.el8_6.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.8-internal+0-adhoc..src; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.23.1 [Sat Aug 13 03:04:34 UTC 2022] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.00 minutes. Runtime.totalMemory()=2147483648 To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp Exception in thread "main" htsjdk.samtools.SAMException: Cannot write file: B_R3.mLb.mkD.sorted.bam. File does not exist and parent directory is not writable.. 
at htsjdk.samtools.util.IOUtil.assertFileIsWritable(IOUtil.java:562)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:250)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Work dir: /data/200901_0686_BHFFWNDRX2/analysis/work/b6/b0a43de9b54aa171408f1711b135ce
Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
(end paste)
When I go into the work directory, I see file names:
B_R3_T1.Lb.sorted.bam B_R3_T1.Lb.sorted.bam.bai tmp
Running `bash .command.run` shows that the program is looking for a different file name in that directory, one that should have been generated by MarkDuplicates:
(start paste)
bash .command.run
/opt/conda/envs/nf-core-atacseq-1.2.2/bin/picard: line 5: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8): No such file or directory
INFO 2022-08-13 20:25:02 MarkDuplicates
********** NOTE: Picard's command line syntax is changing.
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
********** The command line looks like this in the new syntax:
********** MarkDuplicates -INPUT B_R3_T1.Lb.sorted.bam -OUTPUT B_R3.mLb.mkD.sorted.bam -ASSUME_SORTED true -REMOVE_DUPLICATES false -METRICS_FILE B_R3.mLb.mkD.MarkDuplicates.metrics.txt -VALIDATION_STRINGENCY LENIENT -TMP_DIR tmp
20:25:02.758 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/conda/envs/nf-core-atacseq-1.2.2/share/picard-2.23.1-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so [Sat Aug 13 20:25:02 UTC 2022] MarkDuplicates INPUT=[B_R3_T1.Lb.sorted.bam] OUTPUT=B_R3.mLb.mkD.sorted.bam METRICS_FILE=B_R3.mLb.mkD.MarkDuplicates.metrics.txt REMOVE_DUPLICATES=false ASSUME_SORTED=true TMP_DIR=[tmp] VALIDATION_STRINGENCY=LENIENT MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false [Sat Aug 13 20:25:02 UTC 2022] Executing as root@205ed8db6e92 on Linux 4.18.0-372.16.1.el8_6.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.8-internal+0-adhoc..src; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.23.1 [Sat Aug 13 20:25:02 UTC 2022] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.00 minutes. Runtime.totalMemory()=2147483648 To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp Exception in thread "main" htsjdk.samtools.SAMException: Cannot write file: /data/A00901_0686_BHFFWNDRX2/analysis/work/b6/b0a43de9b54aa171408f1711b135ce/B_R3.mLb.mkD.sorted.bam. File does not exist and parent directory is not writable.. 
at htsjdk.samtools.util.IOUtil.assertFileIsWritable(IOUtil.java:562)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:250)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
(end paste)
You can see that the file it's looking for, B_R3.mLb.mkD.sorted.bam for example, is not in my work directory. I think MarkDuplicates is not being run?
Any idea what I have to do here?
Thank you in advance!
So one of our Unix people talked to the Red Hat folks and, after a lot of testing, they strongly suspect the issue is with the version of Java shipped in the atacseq pipeline container. That version is Java 11 (the log above reports OpenJDK 11.0.8). There are multiple bugs in older Java versions related to `Files.isWritable` and NAS devices. Because we are using multi-protocol file shares here, he can't do anything fancy with permissions to get around it. We can put together a demonstration of this if useful.
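A rough way to look for the kind of mismatch described above is to compare a permission pre-check with a real write attempt in the affected directory. The sketch below is shell rather than Java, so it only approximates what the Java check does; the function name `probe_dir` and the probe file name are hypothetical, and it would need to run in the same environment as the failing task (for example inside the container) to be comparable.

```shell
# Compare a permission pre-check with a real write attempt in a directory.
# probe_dir is a hypothetical helper, not part of the pipeline.
probe_dir() {
    dir="$1"
    precheck="no"
    actual="no"
    probe="$dir/.write_probe.$$"

    # Pre-check: ask the OS whether the directory looks writable.
    if [ -w "$dir" ]; then
        precheck="yes"
    fi

    # Real attempt: try to create a small file, then remove it again.
    if ( : > "$probe" ) 2>/dev/null; then
        actual="yes"
        rm -f "$probe"
    fi

    echo "precheck=$precheck actual=$actual"
}
```

Calling `probe_dir /path/to/work/dir` prints one line. A result of `precheck=no actual=yes` would be consistent with a false negative from the pre-check on that file share, though it would not by itself prove the Java behaviour.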
Please use backticks to format code and pasted output next time; it makes things much easier to read 😃 (I've just edited your post for you). Like this:
Normal text
```
multi-line
Code block
```
More normal text, `in-line code` and done.
@taylordm - have you tried running the `dev` version of the pipeline? It's been rewritten in DSL2 and uses different software packaging. Hopefully this will be resolved in the next release (coming soon, right @JoseEspinosa?)
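For reference, Nextflow selects a branch or tag with the `-r` option. The sketch below reuses the arguments from the original command and only prints the resulting command line. The `dev` branch may expect a different samplesheet format or extra parameters, so treat the argument list as an assumption and check that branch's documentation first.

```shell
# Build the command for the dev branch; -r picks the pipeline revision.
# The remaining arguments are copied from the original post and may need
# adjusting for the DSL2 rewrite.
revision="dev"
cmd="nextflow run nf-core/atacseq -r $revision --input design.csv --genome GRCm38 -profile podman"

# Print the command so it can be reviewed before launching it by hand.
echo "$cmd"
```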
Yes, it should be released before the end of this month. Actually, it would be nice to get feedback on any possible bugs before the release, so if you try the dev branch and find any, please let us know @taylordm 😄
Hi @taylordm ! Thanks for reporting! We are about to release a much updated version of the pipeline that has been completely refactored to be written in Nextflow DSL2. When this is released, it would be great if you can let us know if the problem still persists. I will close this issue for now.
For faster, real-time help for these sorts of things please join the #atacseq channel on the nf-core Slack workspace. You can join via the link below: https://nf-co.re/join