wgbs-pipeline icon indicating copy to clipboard operation
wgbs-pipeline copied to clipboard

File latency issue?

Open molecules opened this issue 9 months ago • 0 comments

Describe the bug The pipeline creates a file that it doesn't think exists. In the example below, one case of this would be /redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/inputs/667055062/indexes.tar.gz, which does exist when I check after the fact.

This is running on a compute cluster. I've repeated this multiple times, so it's not just a one-off error.

My first guess it that this is due to file system latency. If this were Snakemake, I'd know that I just needed to increase the value passed to --latency-wait, but I'm not sure how to set something like that for this pipeline.

OS/Platform

  • OS/Platform: Ubuntu 20.04.6 LTS
  • Pipeline version: 1.1.7
  • Caper version: 2.3.2

Caper configuration file

backend=slurm

# SLURM partition. DEFINE ONLY IF REQUIRED BY YOUR CLUSTER'S POLICY.
# You must define it for Stanford Sherlock.
slurm-partition=serial

# SLURM account. DEFINE ONLY IF REQUIRED BY YOUR CLUSTER'S POLICY.
# You must define it for Stanford SCG.
slurm-account=

# Local directory for localized files and Cromwell's intermediate files.
# If not defined then Caper will make .caper_tmp/ on CWD or `local-out-dir`.
# /tmp is not recommended since Caper store localized data files here.
local-loc-dir=

cromwell=/redacted/caper/2.3.2/jars/cromwell-82.jar
womtool=/redacted/caper/2.3.2/jars/womtool-82.jar

Input JSON file

{
  "wgbs.benchmark_mode": true,
  "wgbs.extra_reference": "/redacted2/encode-wgbs/wgbs-pipeline/tests/data/conversion_control.fa.gz",
  "wgbs.fastqs": [
    [
      [
        "/redacted2/encode-wgbs/wgbs-pipeline/tests/data/sample5_data_1_200000.fastq.gz",
        "/redacted2/encode-wgbs/wgbs-pipeline/tests/data/sample5_data_2_200000.fastq.gz"
      ]
    ]
  ],
  "wgbs.reference": "/redacted2/encode-wgbs/wgbs-pipeline/tests/data/sacCer3.fa.gz",
  "wgbs.sample_names": [
    "sample5"
  ],
  "wgbs.underconversion_sequence_name": "NC_001416.1"
}

Error log

2023-10-04 21:10:58,310|caper.cli|INFO| Cromwell stdout: /redacted/wgbs_test/cromwell.out.1
2023-10-04 21:10:58,315|caper.caper_base|INFO| Creating a timestamped temporary directory. /redacted/wgbs_test/.caper_tmp/wgbs-pipeline/20231004_211058_313834
2023-10-04 21:10:58,315|caper.caper_runner|INFO| Localizing files on work_dir. /redacted/wgbs_test/.caper_tmp/wgbs-pipeline/20231004_211058_313834
2023-10-04 21:10:58,686|caper.caper_workflow_opts|INFO| Singularity image found in WDL metadata. wdl=/redacted2/encode-wgbs/1.1.8/wgbs-pipeline.wdl, s=docker://encodedcc/wgbs-pipeline:1.1.7
2023-10-04 21:10:58,706|caper.cromwell|INFO| Validating WDL/inputs/imports with Womtool...
2023-10-04 21:11:04,509|caper.nb_subproc_thread|INFO| Subprocess finished successfully.
2023-10-04 21:11:04,510|caper.cromwell|INFO| Passed Womtool validation.            
2023-10-04 21:11:04,510|caper.caper_runner|INFO| launching run: wdl=/redacted2/encode-wgbs/1.1.8/wgbs-pipeline.wdl, inputs=/redacted/wgbs_test/test_wgbs.json, backend_conf=/redacted/wgbs_test/.caper_tmp/wgbs-pipeline/20231004_211058_313834/backend.co
2023-10-04 21:11:15,543|caper.cromwell_workflow_monitor|INFO| Workflow: id=91e49ae4-9226-4824-af45-301fc1a815e8, status=Submitted
2023-10-04 21:11:15,605|caper.cromwell_workflow_monitor|INFO| Workflow: id=91e49ae4-9226-4824-af45-301fc1a815e8, status=Running
2023-10-04 21:11:23,864|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.make_conf:-1, retry=0, status=CallCached
2023-10-04 21:11:26,814|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.make_metadata_csv:-1, retry=0, status=CallCached
2023-10-04 21:11:29,834|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.index_reference:-1, retry=0, status=CallCached
2023-10-04 21:11:35,809|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.prepare:-1, retry=0, status=CallCached
2023-10-04 21:11:44,809|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.map:0, retry=0, status=Started, job_id=2081286
2023-10-04 21:11:44,837|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.map:0, retry=0, status=Running
2023-10-04 21:11:51,943|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.map:0, retry=0, status=Done
2023-10-04 21:11:59,788|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.map:0, retry=1, status=Started, job_id=2081287
2023-10-04 21:11:59,796|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.map:0, retry=1, status=Running
2023-10-04 21:12:04,068|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.map:0, retry=1, status=Done
2023-10-04 21:12:05,042|caper.cromwell_workflow_monitor|INFO| Workflow: id=91e49ae4-9226-4824-af45-301fc1a815e8, status=Failed
2023-10-04 21:12:15,586|caper.cromwell_metadata|INFO| Wrote metadata file. /redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/metadata.json
2023-10-04 21:12:15,587|caper.cromwell|INFO| Workflow failed. Auto-troubleshooting...
2023-10-04 21:12:15,589|caper.nb_subproc_thread|ERROR| Cromwell failed. returncode=1
2023-10-04 21:12:15,589|caper.cli|ERROR| Check stdout in /redacted/wgbs_test/cromwell.out.1
* Started troubleshooting workflow: id=91e49ae4-9226-4824-af45-301fc1a815e8, status=Failed
* Found failures JSON object.                                                      
[                                                                                  
    {                                                                              
        "causedBy": [                                                              
            {                                                                      
                "message": "Job wgbs.map:0:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []                                                     
            }                                                                      
        ],                                                                         
        "message": "Workflow failed"                                               
    }                                                                              
]                                                                                  
* Recursively finding failures in calls (tasks)...                                 
                                                                                   
==== NAME=wgbs.map, STATUS=RetryableFailure, PARENT=                               
SHARD_IDX=0, RC=1, JOB_ID=2081286                                                  
START=2023-10-04T21:11:41.051Z, END=2023-10-04T21:11:54.791Z                       
STDOUT=/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/execution/stdout
STDERR=/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/execution/stderr
STDERR_CONTENTS=                                                                   
tar: /redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/inputs/667055062/indexes.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now                                         
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/execution/mapping/**/*.bam': No such file or directory
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/execution/mapping/**/*.csi': No such file or directory
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/execution/mapping/**/*.bam.md5': No such file or directory
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/execution/mapping/**/*.json': No such file or directory
                                                                                   
                                                                                   
==== NAME=wgbs.map, STATUS=Failed, PARENT=                                         
SHARD_IDX=0, RC=1, JOB_ID=2081287                                                  
START=2023-10-04T21:11:55.035Z, END=2023-10-04T21:12:04.072Z                       
STDOUT=/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/execution/stdout
STDERR=/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/execution/stderr
STDERR_CONTENTS=                                                                   
tar: /redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/inputs/667055062/indexes.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now                                         
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/execution/mapping/**/*.bam': No such file or directory
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/execution/mapping/**/*.csi': No such file or directory
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/execution/mapping/**/*.bam.md5': No such file or directory
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/execution/mapping/**/*.json': No such file or directory
                                                                                   

molecules avatar Oct 04 '23 21:10 molecules