pyrodigal icon indicating copy to clipboard operation
pyrodigal copied to clipboard

Non-deterministic behaviour

Open aaronmussig opened this issue 2 years ago • 1 comments

Hello,

Firstly, thanks for your work with Pyrodigal! I wasn't able to determine if Pyrodigal should be deterministic, if it is then ignore this ticket.

I have an extremely rare case that took quite some time to identify, but Pyrodigal will occasionally give a different result when running via Shell vs. Python subprocess. The strange part is that the likelihood of Pyrodigal giving a different result is higher when running via a Python subprocess though.

To replicate the issue:

Dockerfile

FROM python:3.10-slim

RUN apt-get update && apt-get install -y \
    curl \
    unzip \
    && rm -rf /var/lib/apt/lists/*

RUN python -m pip install pyrodigal==2.0.4

RUN mkdir -p /data /results /tmp/download

WORKDIR /tmp/download

RUN curl -OJX GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCA_009700405.1/download?include_annotation_type=GENOME_FASTA&filename=GCA_009700405.1.zip" -H "Accept: application/zip" && \
    unzip GCA_009700405.1.zip && \
    rm GCA_009700405.1.zip && \
    mv ncbi_dataset/data/GCA_009700405.1/GCA_009700405.1_ASM970040v1_genomic.fna /data/genome.fna && \
    rm -rf /tmp/download

WORKDIR /data

Entering container

docker build -t pyrodigal_test . && docker run -it pyrodigal_test /bin/bash

Running Pyrodigal

#!/bin/bash

for i in {1..100}
do
   python -c "import os; os.system('pyrodigal -m -i /data/genome.fna -g 11 -o /dev/null -a /results/python_$i.faa -d /dev/null -p single')";
   pyrodigal -m -i /data/genome.fna -g 11 -o /dev/null -a /results/shell_$i.faa -d /dev/null -p single;
done

Results

Over 200 trials I get the following results:

Hash Command Line (count) Python os.system (count)
a9f114 192 178
597610 8 22

The differences between the two hashes are:

7082,7090c7082,7090
< >WLMD01000046.1_10 # 9095 # 10405 # -1 # ID=101_10;partial=00;start_type=ATG;rbs_motif=AGGAG;rbs_spacer=5-10bp;gc_cont=0.487
< MSADDQLRKQQEFVLRTIEERNIRFVRLWFTDVLGFLKSVAIAPAELANAFDEGIGFDGS
< AIEGFARITESDMLAKPDPSTFSVLPWRTEAPGAARMFCDIVMPDGSASHADPRHVLRRI
< LNKAATMGYTCYTHPEIEFFLFKDRPEIGKRPTPVDQGGYFDHTPAVVGHDFRRTAITML
< EAMGISVEFSHHEGAPGQQEIDLRYADALTTADNIMTFRHVVKEVALDQGFHASFIPKPF
< TDHPGSGMHTHVSLFQGEKNAFYDAKAEYNLSKVGRSFIAGLLRHAPEITAVTNQWVNSY
< KRLHGGGEAPALVNWGHNNRGALVRVPMYKPNNENSTRVEFRSPDSACNPYLAYAVMIAA
< GLKGVEEGYELADSSDATVLPSNLNEAIIAMEKSALVRETLGEHVFEYVLRNKRAEWNDY
< SRQVTAYELDRYLPIL*
---
> >WLMD01000046.1_10 # 9095 # 10414 # -1 # ID=101_10;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.487
> YVPMSADDQLRKQQEFVLRTIEERNIRFVRLWFTDVLGFLKSVAIAPAELANAFDEGIGF
> DGSAIEGFARITESDMLAKPDPSTFSVLPWRTEAPGAARMFCDIVMPDGSASHADPRHVL
> RRILNKAATMGYTCYTHPEIEFFLFKDRPEIGKRPTPVDQGGYFDHTPAVVGHDFRRTAI
> TMLEAMGISVEFSHHEGAPGQQEIDLRYADALTTADNIMTFRHVVKEVALDQGFHASFIP
> KPFTDHPGSGMHTHVSLFQGEKNAFYDAKAEYNLSKVGRSFIAGLLRHAPEITAVTNQWV
> NSYKRLHGGGEAPALVNWGHNNRGALVRVPMYKPNNENSTRVEFRSPDSACNPYLAYAVM
> IAAGLKGVEEGYELADSSDATVLPSNLNEAIIAMEKSALVRETLGEHVFEYVLRNKRAEW
> NDYSRQVTAYELDRYLPIL*

i.e. the 597610 hash starts with Y instead of M.

aaronmussig avatar Feb 03 '23 03:02 aaronmussig

Hi @aaronmussig,

Normally Pyrodigal should indeed be deterministic. Non-determinism is often caused when some parts of the program are reading undefined memory, often out of bounds.

This may be linked to https://github.com/hyattpd/Prodigal/pull/100 as I'm seeing the error is happening next to the edge on the reverse strand. I'll check if the fixed version still causes the issue.

althonos avatar Feb 03 '23 08:02 althonos

Hi @althonos,

Re-ran this using v3.5.2 and both execution methods are returning the same result. Thanks for looking into it!

aaronmussig avatar Sep 20 '24 01:09 aaronmussig

Good to hear, thanks!

althonos avatar Sep 20 '24 10:09 althonos