
Bug in Authentication_Score rule

Open LeandroRitter opened this issue 9 months ago • 1 comments

This seems to be a long-standing bug that has popped up here and there. It is quite annoying but did not cause serious problems, which is why we did not investigate it and simply restarted aMeta a few times. I believe what happens is that the output of PMDtools (the PMDscores.txt file) is computed independently of the execution of score.R. Please have a look at these two rules:

rule PMD_scores:
    input:
        bam="results/AUTHENTICATION/{sample}/{taxid}/sorted.bam",
    output:
        scores="results/AUTHENTICATION/{sample}/{taxid}/PMDscores.txt",
    message:
        "PMD_scores: COMPUTING PMD SCORES"
    log:
        "logs/PMD_SCORES/{sample}_{taxid}.log",
    threads: 1
    conda:
        "../envs/malt.yaml"
    envmodules:
        *config["envmodules"]["malt"],
    shell:
        "(samtools view -h {input.bam} || true) | pmdtools --printDS > {output.scores}"


rule Authentication_Score:
    input:
        rma6="results/MALT/{sample}.trimmed.rma6",
        maltextractlog="results/AUTHENTICATION/{sample}/{taxid}/MaltExtract_output/log.txt",
        name_list="results/AUTHENTICATION/{sample}/{taxid}/name_list.txt",
    output:
        scores="results/AUTHENTICATION/{sample}/{taxid}/authentication_scores.txt",
    message:
        "Authentication_Score: COMPUTING AUTHENTICATION SCORES"
    params:
        exe=WORKFLOW_DIR / "scripts/score.R",
    log:
        "logs/AUTHENTICATION_SCORE/{sample}_{taxid}.log",
    threads: 1
    conda:
        "../envs/malt.yaml"
    envmodules:
        *config["envmodules"]["malt"],
    shell:
        "Rscript {params.exe} {input.rma6} $(dirname {input.maltextractlog}) {input.name_list} $(dirname {input.name_list}) &> {log};"

However, score.R uses PMDscores.txt, so it is essential that the PMD_scores rule is executed before the Authentication_Score rule. Because PMDscores.txt is not declared as an input of Authentication_Score, Snakemake does not know about this dependency, and Authentication_Score may start before PMD_scores has generated the file. PMDscores.txt is then missing when score.R runs, and the unfortunate

Error in if ((dim(df)[1] != 0) & (sum(df$V4 > 3)/dim(df)[1] > 0.1)) { :
  missing value where TRUE/FALSE needed
Execution halted

error occurs. I will try to fix this ASAP in the PR I am currently working on.
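
One possible fix, as a minimal sketch rather than the final change: declare PMDscores.txt as an input of Authentication_Score so that Snakemake schedules PMD_scores first. The input name `pmd_scores` is hypothetical, and the sketch assumes score.R looks for PMDscores.txt in the directory it already receives as its last argument, so the shell command stays unchanged.

```snakemake
rule Authentication_Score:
    input:
        rma6="results/MALT/{sample}.trimmed.rma6",
        maltextractlog="results/AUTHENTICATION/{sample}/{taxid}/MaltExtract_output/log.txt",
        name_list="results/AUTHENTICATION/{sample}/{taxid}/name_list.txt",
        # Hypothetical input name, added only to make the dependency explicit:
        # Snakemake will not start this rule until PMD_scores has written the file.
        pmd_scores="results/AUTHENTICATION/{sample}/{taxid}/PMDscores.txt",
    output:
        scores="results/AUTHENTICATION/{sample}/{taxid}/authentication_scores.txt",
    message:
        "Authentication_Score: COMPUTING AUTHENTICATION SCORES"
    params:
        exe=WORKFLOW_DIR / "scripts/score.R",
    log:
        "logs/AUTHENTICATION_SCORE/{sample}_{taxid}.log",
    threads: 1
    conda:
        "../envs/malt.yaml"
    envmodules:
        *config["envmodules"]["malt"],
    shell:
        # Unchanged: pmd_scores is not passed to score.R (assumption: the script
        # finds PMDscores.txt in the directory given as the last argument).
        "Rscript {params.exe} {input.rma6} $(dirname {input.maltextractlog}) {input.name_list} $(dirname {input.name_list}) &> {log};"
```

With the extra input in place, a dry run (`snakemake -n`) should list PMD_scores before Authentication_Score for every sample and taxid pair.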

LeandroRitter • May 21 '24 07:05