
"Bus error" -- issue with snakemake or conda?

Open · levlitichev opened this issue 1 year ago · 1 comment

Hi Charlie,

Sorry to be back so soon; I've hit another snag. I think it may be related to which conda environment Snakemake is trying to use.

Not a reprex, so apologies for all the text:

(sunbeam4.0.0) [litichev@node157 sunbeam]$ sunbeam run all_qc --profile lsf --configfile sunbeam_config.yml
Running: snakemake --snakefile /home/litichev/sunbeam_v4/workflow/Snakefile --conda-prefix /home/litichev/sunbeam_v4/.snakemake all_qc --profile lsf --configfile sunbeam_config.yml
Using profile lsf for setting default command line arguments.
Collecting host/contaminant genomes... done.
Building DAG of jobs...
Using shell: /bin/bash
Provided cluster nodes: 500
Job stats:
job                         count
------------------------  -------
adapter_removal_unpaired        3
all_qc                          1
fastqc                          3
fastqc_report                   1
find_low_complexity             3
qc_final                        3
remove_low_complexity           3
sample_intake                   3
trimmomatic_unpaired            3
total                          23

Select jobs to execute...

[Thu Aug 24 22:44:01 2023]
rule sample_intake:
    input: /project/thaisslab/2023-07_tim_metatranscriptomics/fastq/pgp3_S12_R1_001.fastq.gz
    output: /project/thaisslab/2023-07_tim_metatranscriptomics/sunbeam/sunbeam_output/qc/00_samples/pgp3_1.fastq.gz
    log: /project/thaisslab/2023-07_tim_metatranscriptomics/sunbeam/sunbeam_output/logs/sample_intake_pgp3_1.log
    jobid: 15
    reason: Missing output files: /project/thaisslab/2023-07_tim_metatranscriptomics/sunbeam/sunbeam_output/qc/00_samples/pgp3_1.fastq.gz
    wildcards: sample=pgp3, rp=1
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>

Submitted job 15 with external jobid '78781830 logs/cluster/sample_intake/sample=pgp3.rp=1/jobid15_f848f478-04d9-458c-9a20-b46573f7b903.out'.

[Thu Aug 24 22:44:01 2023]
rule sample_intake:
    input: /project/thaisslab/2023-07_tim_metatranscriptomics/fastq/pga2_S8_R1_001.fastq.gz
    output: /project/thaisslab/2023-07_tim_metatranscriptomics/sunbeam/sunbeam_output/qc/00_samples/pga2_1.fastq.gz
    log: /project/thaisslab/2023-07_tim_metatranscriptomics/sunbeam/sunbeam_output/logs/sample_intake_pga2_1.log
    jobid: 9
    reason: Missing output files: /project/thaisslab/2023-07_tim_metatranscriptomics/sunbeam/sunbeam_output/qc/00_samples/pga2_1.fastq.gz
    wildcards: sample=pga2, rp=1
    resources: mem_mb=1853, mem_mib=1768, disk_mb=1853, disk_mib=1768, tmpdir=<TBD>

Submitted job 9 with external jobid '78781831 logs/cluster/sample_intake/sample=pga2.rp=1/jobid9_cfae02f6-1da9-46a4-95d8-d8f4f7b0093c.out'.

[Thu Aug 24 22:44:01 2023]
rule sample_intake:
    input: /project/thaisslab/2023-07_tim_metatranscriptomics/fastq/714R_S14_R1_001.fastq.gz
    output: /project/thaisslab/2023-07_tim_metatranscriptomics/sunbeam/sunbeam_output/qc/00_samples/714R_1.fastq.gz
    log: /project/thaisslab/2023-07_tim_metatranscriptomics/sunbeam/sunbeam_output/logs/sample_intake_714R_1.log
    jobid: 3
    reason: Code has changed since last execution
    wildcards: sample=714R, rp=1
    resources: mem_mb=3149, mem_mib=3004, disk_mb=3149, disk_mib=3004, tmpdir=<TBD>

Submitted job 3 with external jobid '78781832 logs/cluster/sample_intake/sample=714R.rp=1/jobid3_e990c596-adfc-4ee1-8f36-9381fb040a1d.out'.
[Thu Aug 24 22:44:21 2023]
Error in rule sample_intake:
    jobid: 15
    input: /project/thaisslab/2023-07_tim_metatranscriptomics/fastq/pgp3_S12_R1_001.fastq.gz
    output: /project/thaisslab/2023-07_tim_metatranscriptomics/sunbeam/sunbeam_output/qc/00_samples/pgp3_1.fastq.gz
    log: /project/thaisslab/2023-07_tim_metatranscriptomics/sunbeam/sunbeam_output/logs/sample_intake_pgp3_1.log (check log file(s) for error details)
    cluster_jobid: 78781830 logs/cluster/sample_intake/sample=pgp3.rp=1/jobid15_f848f478-04d9-458c-9a20-b46573f7b903.out

Error executing rule sample_intake on cluster (jobid: 15, external: 78781830 logs/cluster/sample_intake/sample=pgp3.rp=1/jobid15_f848f478-04d9-458c-9a20-b46573f7b903.out, jobscript: /project/thaisslab/2023-07_tim_metatranscriptomics/sunbeam/.snakemake/tmp.g5m52_gf/snakejob.sample_intake.15.sh). For error details see the cluster log and the log files of the involved rule(s).
[Thu Aug 24 22:44:32 2023]
Finished job 3.
1 of 23 steps (4%) done
[Thu Aug 24 22:44:42 2023]
Error in rule sample_intake:
    jobid: 9
    input: /project/thaisslab/2023-07_tim_metatranscriptomics/fastq/pga2_S8_R1_001.fastq.gz
    output: /project/thaisslab/2023-07_tim_metatranscriptomics/sunbeam/sunbeam_output/qc/00_samples/pga2_1.fastq.gz
    log: /project/thaisslab/2023-07_tim_metatranscriptomics/sunbeam/sunbeam_output/logs/sample_intake_pga2_1.log (check log file(s) for error details)
    cluster_jobid: 78781831 logs/cluster/sample_intake/sample=pga2.rp=1/jobid9_cfae02f6-1da9-46a4-95d8-d8f4f7b0093c.out

Error executing rule sample_intake on cluster (jobid: 9, external: 78781831 logs/cluster/sample_intake/sample=pga2.rp=1/jobid9_cfae02f6-1da9-46a4-95d8-d8f4f7b0093c.out, jobscript: /project/thaisslab/2023-07_tim_metatranscriptomics/sunbeam/.snakemake/tmp.g5m52_gf/snakejob.sample_intake.9.sh). For error details see the cluster log and the log files of the involved rule(s).
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Sunbeam failed with error.
Warnings: (0) []
Errors: (7) [56, 60, 63, 68, 72, 75, 77]
No benchmark files found
Complete log: .snakemake/log/2023-08-24T224335.666318.snakemake.log
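
Each `Submitted job ...` line above pairs Snakemake's internal job ID with the string the profile's submit script returned: the LSF job ID followed by the path of that job's `.out` log. The stderr log inspected with `cat` right after this run has the same stem with an `.err` extension. A minimal sketch for extracting these from saved output, assuming the message format shown here; `parse_submissions` and `Submission` are hypothetical names, not part of Snakemake or sunbeam:

```python
import re
from pathlib import Path
from typing import NamedTuple


class Submission(NamedTuple):
    snakemake_jobid: int
    lsf_jobid: int
    out_log: Path
    err_log: Path


# Matches lines such as:
# Submitted job 9 with external jobid '78781831 logs/cluster/.../jobid9_....out'.
SUBMITTED = re.compile(
    r"Submitted job (?P<job>\d+) with external jobid "
    r"'(?P<lsf>\d+) (?P<out>\S+\.out)'"
)


def parse_submissions(text: str) -> list[Submission]:
    """Extract job IDs and cluster log paths from Snakemake submission messages."""
    subs = []
    for m in SUBMITTED.finditer(text):
        out_log = Path(m["out"])
        subs.append(
            Submission(
                snakemake_jobid=int(m["job"]),
                lsf_jobid=int(m["lsf"]),
                out_log=out_log,
                # Assumption: stderr is written to a sibling file ending in .err
                err_log=out_log.with_suffix(".err"),
            )
        )
    return subs
```

Feeding it the full run output gives one entry per submitted job, and each `err_log` path can then be passed straight to `cat`.
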
(sunbeam4.0.0) [litichev@node157 sunbeam]$ cat logs/cluster/sample_intake/sample=pga2.rp=1/jobid9_cfae02f6-1da9-46a4-95d8-d8f4f7b0093c.err
/project/thaisslab/2023-07_tim_metatranscriptomics/sunbeam/.snakemake/tmp.g5m52_gf/snakejob.sample_intake.9.sh: line 3: 246369 Bus error               (core dumped) /home/litichev/mambaforge/envs/sunbeam4.0.0/bin/python3.11 -m snakemake --snakefile '/home/litichev/sunbeam_v4/workflow/Snakefile' --target-jobs 'sample_intake:sample=pga2,rp=1' --allowed-rules 'sample_intake' --cores 'all' --attempt 1 --force-use-threads --resources 'mem_mb=1853' 'mem_mib=1768' 'disk_mb=1853' 'disk_mib=1768' --wait-for-files '/project/thaisslab/2023-07_tim_metatranscriptomics/sunbeam/.snakemake/tmp.g5m52_gf' '/project/thaisslab/2023-07_tim_metatranscriptomics/fastq/pga2_S8_R1_001.fastq.gz' --force --keep-target-files --keep-remote --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --rerun-triggers 'software-env' 'params' 'mtime' 'code' 'input' --skip-script-cleanup --use-conda --conda-frontend 'mamba' --conda-prefix '/home/litichev/sunbeam_v4/.snakemake' --conda-base-path '/home/litichev/mambaforge' --wrapper-prefix 'https://github.com/snakemake/snakemake-wrappers/raw/' --configfiles '/project/thaisslab/2023-07_tim_metatranscriptomics/sunbeam/sunbeam_config.yml' --printshellcmds --latency-wait 30 --scheduler 'ilp' --scheduler-solver-path '/home/litichev/mambaforge/envs/sunbeam4.0.0/bin' --default-resources 'mem_mb=max(2*input.size_mb, 1000)' 'disk_mb=max(2*input.size_mb, 1000)' 'tmpdir=system_tmpdir' --mode 2
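
Line 3 of the jobscript is the job-level `python3.11 -m snakemake` call, so that is the process bash reports as killed by the bus error (SIGBUS); shells report a signal death like this as exit status 128 plus the signal number. Since two of the three `sample_intake` jobs failed, it can help to sweep every `.err` file under `logs/cluster` at once. A minimal sketch; `find_bus_errors` and `describe_exit_status` are hypothetical helpers, and matching on the literal text `Bus error` assumes bash's wording as seen above:

```python
import signal
from pathlib import Path


def describe_exit_status(status: int) -> str:
    """Name the signal behind a shell exit status above 128, e.g. SIGBUS."""
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            pass  # not a valid signal number on this platform
    return f"exit {status}"


def find_bus_errors(log_dir: str = "logs/cluster") -> dict[Path, str]:
    """Map each .err file under log_dir that mentions a bus error to that line."""
    root = Path(log_dir)
    if not root.is_dir():
        return {}
    hits: dict[Path, str] = {}
    for err_file in sorted(root.rglob("*.err")):
        with err_file.open(errors="replace") as fh:
            for line in fh:
                # bash reports "<pid> Bus error (core dumped) <command>" when a
                # command it ran was killed by SIGBUS
                if "Bus error" in line:
                    hits[err_file] = line.strip()
                    break
    return hits


if __name__ == "__main__":
    for path, line in find_bus_errors().items():
        print(f"{path}:\n    {line[:150]}")
```

Run from the sunbeam working directory, it lists only the jobs whose stderr contains the bus error, which makes it easier to see whether every failure shares this cause.
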

I also tried regenerating my LSF profile. Here's how it currently looks:

(sunbeam4.0.0) [litichev@node157 sunbeam]$ cat ~/.config/snakemake/lsf/CookieCutter.py
class CookieCutter:
    """
    Cookie Cutter wrapper
    """

    @staticmethod
    def get_default_mem_mb() -> int:
        return int("2048")

    @staticmethod
    def get_log_dir() -> str:
        return "logs/cluster"

    @staticmethod
    def get_default_queue() -> str:
        return ""

    @staticmethod
    def get_default_project() -> str:
        return ""

    @staticmethod
    def get_lsf_unit_for_limits() -> str:
        return "MB"

    @staticmethod
    def get_unknwn_behaviour() -> str:
        return "wait"

    @staticmethod
    def get_zombi_behaviour() -> str:
        return "ignore"

    @staticmethod
    def get_latency_wait() -> float:
        return float("30")

    @staticmethod
    def get_wait_between_tries() -> float:
        return float("0.001")

    @staticmethod
    def get_max_status_checks() -> int:
        return int("1")

    @staticmethod
    def jobscript_timeout() -> int:
        return int("10")
(sunbeam4.0.0) [litichev@node157 sunbeam]$ cat ~/.config/snakemake/lsf/config.yaml
latency-wait: "30"
jobscript: "lsf_jobscript.sh"
use-conda: "True"
use-singularity: "False"
printshellcmds: "True"
restart-times: "0"
jobs: "500"
cluster: "lsf_submit.py"
cluster-status: "lsf_status.py"
cluster-cancel: "lsf_cancel.py"
max-jobs-per-second: "10"
max-status-checks-per-second: "10"
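
Each key in a profile's config.yaml stands in for the Snakemake command-line option of the same name (for example, `jobs: "500"` plays the role of `--jobs 500`, matching `Provided cluster nodes: 500` in the run output). The sketch below approximates that mapping so the effective arguments of a flat profile like this one can be eyeballed. It is an illustration, not Snakemake's own profile loader: it only handles flat `key: "value"` lines, and it assumes `"True"` means "pass the bare flag" and `"False"` means "omit it".

```python
def parse_flat_profile(text: str) -> dict[str, str]:
    """Parse flat `key: "value"` lines (a stand-in for a real YAML parser)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def profile_to_args(settings: dict[str, str]) -> list[str]:
    """Approximate the long-form command-line options implied by a profile."""
    args = []
    for key, value in settings.items():
        if value == "True":
            args.append(f"--{key}")  # assumed boolean switch, e.g. --use-conda
        elif value == "False":
            continue  # assumed switch left off
        else:
            args.extend([f"--{key}", value])
    return args
```

For example, `" ".join(profile_to_args(parse_flat_profile(open(path).read())))` prints the options in one line, which can be compared with the `Running: snakemake ...` line that sunbeam prints.
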

Thanks again for your help. Please let me know if I can provide more information.

-Lev

levlitichev, Aug 25 '23 02:08

Hi Lev,

I'm not sure what's going on here, but my best guess is that it has something to do with access to log files. In sunbeam 4 we set a LOG_FP variable in the main Snakefile, which each rule then uses as the base path for its logs. I wonder if that is somehow conflicting with get_log_dir() in your CookieCutter file.
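
One way to test this guess is to resolve both log locations against the working directory and check whether either lies inside the other. Judging from the `log:` lines in the run output, the rule logs land under `sunbeam_output/logs/`, while get_log_dir() returns the relative path `logs/cluster`, which the `cat` command resolved from the `sunbeam` working directory. A minimal sketch with a hypothetical `log_dirs_overlap` helper; the paths are copied from this thread and are assumptions to adjust for a real setup:

```python
from pathlib import PurePosixPath


def log_dirs_overlap(workdir: str, rule_log_dir: str, cluster_log_dir: str) -> bool:
    """Return True if one log directory equals, or is nested inside, the other.

    Relative paths are resolved against `workdir` (the directory Snakemake is
    run from). The comparison is purely lexical: symlinks and '..' are not
    resolved, so this is a quick sanity check rather than a proof.
    """
    base = PurePosixPath(workdir)
    # Joining an absolute path onto `base` simply yields the absolute path.
    rule_logs = base / rule_log_dir
    cluster_logs = base / cluster_log_dir
    return (
        rule_logs == cluster_logs
        or rule_logs in cluster_logs.parents
        or cluster_logs in rule_logs.parents
    )


# Paths taken from this thread (adjust for your own setup).
WORKDIR = "/project/thaisslab/2023-07_tim_metatranscriptomics/sunbeam"
RULE_LOGS = WORKDIR + "/sunbeam_output/logs"  # inferred LOG_FP location
CLUSTER_LOGS = "logs/cluster"  # CookieCutter.get_log_dir()

if __name__ == "__main__":
    print(log_dirs_overlap(WORKDIR, RULE_LOGS, CLUSTER_LOGS))
```

With these paths the check returns `False`: the two trees do not nest, so at least lexically the directories themselves do not collide.
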

Ulthran, Aug 29 '23 13:08

Hi Lev, there have been a lot of updates to Snakemake regarding cluster execution (https://github.com/snakemake/snakemake/releases/tag/v8.0.0), which sunbeam >= 4.3.7 should incorporate. You may still have to do some work to reconfigure your setup, but most of the interaction with the cluster should now be handled by this executor plugin (https://github.com/BEFH/snakemake-executor-plugin-lsf). Let me know if you want any help setting this up (although I haven't worked with this plugin in particular).
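
As a rough orientation: in Snakemake 8 the cluster backend is chosen with `--executor`, and the linked plugin is selected as `--executor lsf`, whereas the 7.x setup in this thread passes `--profile lsf` and relies on the profile's `cluster`, `cluster-status` and `cluster-cancel` scripts. A hypothetical helper that picks between the two by installed Snakemake version might look like this; whether `sunbeam run` forwards `--executor` unchanged (as it forwards `--profile`) is an assumption worth checking against the sunbeam docs.

```python
from importlib.metadata import PackageNotFoundError, version


def cluster_args(snakemake_version: str, profile: str = "lsf") -> list[str]:
    """Pick LSF-related Snakemake arguments by major version (illustrative only)."""
    major = int(snakemake_version.split(".")[0])
    if major >= 8:
        # Snakemake 8+: the backend is an executor plugin, here
        # snakemake-executor-plugin-lsf, selected by name.
        return ["--executor", "lsf"]
    # Snakemake 7.x: a cookiecutter-style profile supplies the
    # submit/status/cancel scripts (cluster, cluster-status, cluster-cancel).
    return ["--profile", profile]


if __name__ == "__main__":
    try:
        print(cluster_args(version("snakemake")))
    except PackageNotFoundError:
        print("snakemake is not installed in this environment")
```
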

Ulthran, Mar 29 '24 14:03

Hi Charlie, thanks for following up. I tried using this LSF executor on our cluster and hit an issue related to this line. I was able to work around it, but ended up reverting to an older version of Snakemake and the old LSF profile anyway.

levlitichev, Apr 08 '24 13:04

Alrighty. Unfortunately, as you found, I think accommodating all the recent changes would take a bit of work if you already have a working setup built around sunbeam. Let me know if you ever want help with updating.

Ulthran, Apr 08 '24 14:04