Issues running BBSPLIT process on DNAnexus
nf-core/rnaseq bug report
Description of the bug
Running nf-core/rnaseq with -profile docker,test fails with permission errors when untarring the Salmon index. Temporary fix by @drpatelh in 0b4b140.
Subsequent error at BBMAP_BBSPLIT:
dxpy.utils.resolver.ResolutionError: Unable to resolve "*" to a data object or folder name in '/scratch/76/2fe85b07b7deac10ec9dacdb9afbad/bbsplit'
Steps to reproduce
Steps to reproduce the behaviour:
- Command line:
nf-core/rnaseq -profile docker,test -r tar
- See error: Unable to resolve "*" to a data object or folder name
Expected behaviour
Log files
NEXTFLOW.log bbmap_bbsplit.log
Have you provided the following extra information/files:
- [x] The command used to run the pipeline
- [x] The .nextflow.log file
System
DNAnexus AWS
Nextflow Installation
- Version: 21.08.0-edge
- DNAnexus Nextflow app version 1.0.0-beta.5
Container engine
- Engine: Docker
- version:
Additional context
Thanks @drejom ! So the untarring issues are resolved with 0b4b140 but this will take quite a bit of re-factoring to pass this option to all modules that are untarring. At the moment, the pipeline assumes we aren't giving any optional parameters to tar which will need to be changed. Also, the --no-same-owner has to be passed after the archive path and not before so we will need to update the modules maybe to take options.args and options.args2. Some of this may be resolved by the native NF implementation we will be using very soon so I am tempted to wait for that. In the meantime, if you want to use a released, stable version of the rnaseq pipeline the only workaround is to pass it an untarred index because we need to understand where those permission issues are coming from properly too. Note, this only seems to happen with the tarred Salmon index when running -profile test.
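For anyone wanting to see the flag in isolation, here is a minimal sketch of extracting an archive with --no-same-owner placed after the archive path, as described above. The directory and file names are hypothetical stand-ins and not the pipeline's real index; the only claim relied on is that tar, when run as root (common inside containers), tries to restore the archived ownership unless told otherwise.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo paths; none of these names come from the pipeline.
workdir="$(mktemp -d)"
mkdir -p "$workdir/src/salmon" "$workdir/out"
echo "placeholder" > "$workdir/src/salmon/info.json"

# Build a small gzipped archive standing in for the Salmon index tarball.
tar -czf "$workdir/salmon.tar.gz" -C "$workdir/src" salmon

# Extract it without restoring the archived owner/group.
# The flag is given after the archive path, matching the comment above.
tar -C "$workdir/out" -xzf "$workdir/salmon.tar.gz" --no-same-owner

# Extracted files are owned by whoever ran tar.
ls -l "$workdir/out/salmon"
```

This does not explain where the permission errors come from on DNAnexus; it only shows the invocation shape that the modules would need to support.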
With respect to the second error, using --skip_bbsplit bypasses this error and the pipeline completes successfully. It appears that there is an issue somewhere linking the bbsplit/ index folder as an input into the work directory and will need to be looked into. In any case, BBSplit isn't run by default in the pipeline anyway so this may not actually be an issue on real data.
For reference, this is what .command.sh looks like:
#!/bin/bash -ue
bbsplit.sh \
-Xmx6g \
path=bbsplit \
threads=2 \
in=RAP1_UNINDUCED_REP1_trimmed.fq.gz \
basename=RAP1_UNINDUCED_REP1_%.fastq.gz \
refstats=RAP1_UNINDUCED_REP1.stats.txt \
build=1 ambiguous2=all maxindel=150000
cat <<-END_VERSIONS > versions.yml
BBMAP_BBSPLIT:
bbmap: $(bbversion.sh 2>&1)
END_VERSIONS
and this is what the folder listing looks like:
-rw-r--r-- 1 patelh patelh 0 Oct 4 15:13 .command.begin
-rw-r--r-- 1 patelh patelh 3308 Oct 4 15:13 .command.err
-rw-r--r-- 1 patelh patelh 3308 Oct 4 15:13 .command.log
-rw-r--r-- 1 patelh patelh 0 Oct 4 15:13 .command.out
-rw-r--r-- 1 patelh patelh 9981 Oct 4 15:13 .command.run
-rw-r--r-- 1 patelh patelh 350 Oct 4 15:13 .command.sh
-rw-r--r-- 1 patelh patelh 228 Oct 4 15:13 .command.trace
-rw-r--r-- 1 patelh patelh 1 Oct 4 15:13 .exitcode
-rw-r--r-- 1 patelh patelh 191 Oct 4 15:13 RAP1_UNINDUCED_REP1.stats.txt
-rw-r--r-- 1 patelh patelh 28 Oct 4 15:13 RAP1_UNINDUCED_REP1_human.fastq.gz
-rw-r--r-- 1 patelh patelh 2811703 Oct 4 15:13 RAP1_UNINDUCED_REP1_primary.fastq.gz
-rw-r--r-- 1 patelh patelh 28 Oct 4 15:13 RAP1_UNINDUCED_REP1_sarscov2.fastq.gz
lrwxrwxrwx 1 patelh patelh 100 Oct 4 15:13 RAP1_UNINDUCED_REP1_trimmed.fq.gz -> /home/patelh/nf-core/rnaseq/work/2d/b6b07209deb17d230aa03f26766cdd/RAP1_UNINDUCED_REP1_trimmed.fq.gz
lrwxrwxrwx 1 patelh patelh 74 Oct 4 15:13 bbsplit -> /home/patelh/nf-core/rnaseq/work/dd/4b38e5bcf211e7b41670bb51d14ad3/bbsplit
-rw-r--r-- 1 patelh patelh 32 Oct 4 15:13 versions.yml
The issue could be this message, which may be related to a problem when staging the input data:
dxpy.utils.resolver.ResolutionError: Unable to resolve "*" to a data object or folder name in '/scratch/76/2fe85b07b7deac10ec9dacdb9afbad/bbsplit'
Yup, how can we go about troubleshooting this @pditommaso ? It seems to be an issue on the DNAnexus side because the pipeline is happily chugging away on other platforms / set-ups incl. AWS.
I can try to run it. What should the NF launch command be to replicate the problem?
The command below should reproduce the first error where we had weird permissions issue using untar (fixed in 0b4b140):
nextflow run nf-core/rnaseq -profile test,docker
The second error where the file isn't staged properly should be reproducible by using the -r tar branch:
nextflow run nf-core/rnaseq -profile test,docker -r tar
Indeed, thanks. I'm trying, but DNAnexus suddenly kills it with no apparent reason 😕
Hmmm...ok. Is there somewhere else we can push this issue or get the DNAnexus folks involved to help?
Hold on, now with nextflow run nf-core/rnaseq -profile test,docker -r tar
I'm getting this error:
Cannot a find a file system provider for scheme: s3
-- Check script '/opt/nextflow/assets/nf-core/rnaseq/./workflows/../subworkflows/local/input_check.nf' at line: 33 or see 'nextflow-211008-124922.log' file for more details
Why is it trying to pull data from S3 with the test profile? Is that expected?
Yes, I moved it to s3 in the latest release because downloading over https was causing too many incomplete downloads. Latest version of the test samplesheet is here
Umm, doesn't this require the user to provide AWS creds?
Nope, it is a public bucket. I am using it for Github Actions CI without any issues. Although the region may be a problem 🤔 because it is hosted in eu-west-1
running now ..
Think I've found the problem. patching.
Man, you are too quick! What was the problem?
Think an extra * in the dx download command
False alarm, the * is not the problem. Looking more closely, the error message is:
Exception in thread "main" java.lang.AssertionError: No reference specified, and none exists. Please regenerate the index.
at align2.BBSplitter.mergeReferences(BBSplitter.java:347)
at align2.BBSplitter.processArgs(BBSplitter.java:176)
at align2.BBSplitter.main(BBSplitter.java:42)
Could it be that some expected input is missing?
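One way to test the "missing input" hypothesis is to check, inside the task work directory, whether the staged bbsplit entry actually resolves to a directory with files in it. The sketch below is a hypothetical helper (check_staged_dir is not part of Nextflow, the pipeline, or dx), and the demo directories are stand-ins built in a temp folder rather than a real BBSplit index.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_staged_dir DIR: succeed only if DIR resolves (through any symlink)
# to a directory that contains at least one regular file.
check_staged_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing or dangling: $dir" >&2
        return 1
    fi
    # -L follows symlinks, so a staged link like `bbsplit -> .../bbsplit` is searched too.
    if [ -z "$(find -L "$dir" -type f | head -n 1)" ]; then
        echo "staged but empty: $dir" >&2
        return 2
    fi
    echo "ok: $dir"
}

# Stand-in layout: one populated upstream folder, one empty one,
# each linked into a fake task directory the way inputs are staged.
demo="$(mktemp -d)"
mkdir -p "$demo/upstream/bbsplit/ref" "$demo/empty/bbsplit" "$demo/task_ok" "$demo/task_bad"
echo "stub" > "$demo/upstream/bbsplit/ref/stub.txt"
ln -s "$demo/upstream/bbsplit" "$demo/task_ok/bbsplit"
ln -s "$demo/empty/bbsplit" "$demo/task_bad/bbsplit"

check_staged_dir "$demo/task_ok/bbsplit"
check_staged_dir "$demo/task_bad/bbsplit" || echo "exit code: $?"
```

If the real staged folder turned out to be empty or dangling on DNAnexus, that would be consistent with the "No reference specified, and none exists" assertion, though this check alone would not show why the staging failed.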
@GHAStVHenry I have now closed the other issues running this pipeline on DNAnexus! Getting things together for a release at the moment so would be great if we can identify what the issue is here and try to fix it too. I suspect it will take a little digging like the previous issues 😅
Will do some more digging!
Been a while since this issue has been updated. Please feel free to re-open if the issue persists. Thanks!