Incorrect result when input file is modified during task execution

Open DarwinAwardWinner opened this issue 9 years ago • 5 comments

Consider the following dodo.py file:

#!/usr/bin/env doit -f

def cat(infile, outfile):
    with open(infile, "r") as in_, open(outfile, "w") as out:
        out.writelines(in_.readlines())

def task_generate_file2_from_file1():
    return {
        # Task reads the input file early, but takes a long time to
        # complete
        'actions': [
            (cat, ('file1.txt', 'file2.txt')),
            ["sleep", "2"],
        ],
        'file_dep': ['file1.txt'],
        'targets': ['file2.txt'],
        'clean': True,
    }

Note that the task doesn't "finish" until 2 seconds after the output file is written.

Now, source the following (must be sourced, not executed, so that job control is available):

#!/bin/sh

# Clean any previous runs
doit clean -a

# Generate initial contents of input file
echo "Original text" > file1.txt

# Modify the input file while doit is running
doit & sleep 1; echo "New text" > file1.txt; fg

# Show that the output file is out of date
echo -n "Contents of file1.txt: "; cat file1.txt
echo -n "Contents of file2.txt: "; cat file2.txt

# Try to run doit again; it believes that task is up-to-date
doit

and you will see the following:

$ source produce_bug.sh
generate_file2_from_file1 - removing file 'file2.txt'
[2] 3199
.  generate_file2_from_file1
[2]  - running    doit
Contents of file1.txt: New text
Contents of file2.txt: Original text
-- generate_file2_from_file1

As indicated in the comments of the shell script, I expect the final call to doit to rebuild file2.txt; however, it does not. I assume this is because doit does not record the contents of the input files until after the task has completed, which in this case causes it to record the new contents of the input file, even though the task actually ran on the old contents.
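To make the race concrete, here is a toy model in Python. It is not doit's code: `run_task_record_after` and `is_up_to_date` are hypothetical names, the "filesystem" is a plain dict, and the dependency signature is assumed to be an MD5 of the file contents. It only illustrates the ordering problem described above: the hash is taken after the action has finished, so it reflects an edit the action never saw.

```python
import hashlib


def digest(text: str) -> str:
    """Content signature (MD5 is an assumption made for this sketch)."""
    return hashlib.md5(text.encode()).hexdigest()


def run_task_record_after(fs: dict, db: dict, edit_during_task=None) -> None:
    """Toy runner: execute the 'cat' task, then record the dep hash afterwards."""
    fs["file2.txt"] = fs["file1.txt"]          # the action reads file1.txt early
    if edit_during_task is not None:
        edit_during_task(fs)                   # someone saves file1.txt during the 2 s sleep
    db["file1.txt"] = digest(fs["file1.txt"])  # hashed after the task: sees the NEW text


def is_up_to_date(fs: dict, db: dict) -> bool:
    """Up-to-date means the recorded hash matches the current contents."""
    return db.get("file1.txt") == digest(fs["file1.txt"])


fs = {"file1.txt": "Original text\n"}
db = {}
run_task_record_after(fs, db, edit_during_task=lambda f: f.update({"file1.txt": "New text\n"}))
print(fs["file2.txt"], end="")  # Original text
print(is_up_to_date(fs, db))    # True, so the next run does nothing
```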

DarwinAwardWinner, Jan 05 '16 21:01

If you're wondering, the real context where I ran into this problem was in using doit to automate generating a PDF presentation from markdown source. I was in a quick cycle of making small modifications to the markdown file and then re-running doit to update the PDF after each change, and I noticed that when I went so fast that I saved a second change before doit finished running on the first, doit would sometimes refuse to rebuild the PDF file until I made another change.

DarwinAwardWinner, Jan 05 '16 21:01

> I assume this is because doit does not record the contents of the input files until after the task has completed, which in this case causes it to record the new contents of the input file, even though the task actually ran on the old contents.

Your assumption is correct. It is done this way for two reasons:

  • doit doesn't bother to check ALL file dependencies: as soon as it finds the first one that is outdated, it "short-circuits" and marks the task as out-of-date. Changing this behaviour would make doit less efficient in some cases.
  • although it is not recommended, it is not that rare for people to have tasks whose file dependency is itself modified during the task's execution. Changing this behaviour would break other people's tasks.
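A minimal sketch of the short-circuit check in the first point, assuming content hashes as signatures; `first_changed_dep` and the `counter` dict are hypothetical names used only to count how many files get hashed. Because the loop stops at the first mismatch, the remaining dependencies are never read. Snapshotting every dependency before running the task would instead require hashing all of them up front.

```python
import hashlib


def digest(text: str) -> str:
    return hashlib.md5(text.encode()).hexdigest()


def first_changed_dep(deps, current, saved, counter):
    """Return the first dependency whose hash differs from the saved one, or None.

    The loop stops at the first mismatch, so later dependencies are never hashed.
    """
    for name in deps:
        counter["hashed"] += 1
        if digest(current[name]) != saved.get(name):
            return name
    return None


current = {"a.md": "v2", "b.md": "same", "c.md": "same"}
saved = {"a.md": digest("v1"), "b.md": digest("same"), "c.md": digest("same")}
counter = {"hashed": 0}
print(first_changed_dep(["a.md", "b.md", "c.md"], current, saved, counter))  # a.md
print(counter["hashed"])  # 1: b.md and c.md were never read
```

With many or large dependencies, that up-front hashing is the extra cost behind "less efficient in some cases".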

That said, I agree your situation is something that needs to be fixed. I can think of a few options:

1. doit somehow detects that the task is being run by another instance and kills it
2. a global option that changes the behaviour so that file hashes are checked before executing the task
3. the same as option 2, but as a per-task option
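Options 2 and 3 can be sketched with the same kind of toy model: a hypothetical global switch `SNAPSHOT_DEPS_BEFORE`, overridable by a hypothetical per-task key `snapshot_deps_before`. Neither name exists in doit; the sketch only shows the intended effect of hashing the dependencies before the action runs and recording that snapshot instead of the post-run state.

```python
import hashlib

# Hypothetical global switch (option 2); a task may override it (option 3).
SNAPSHOT_DEPS_BEFORE = True


def digest(text: str) -> str:
    return hashlib.md5(text.encode()).hexdigest()


def hashes(fs: dict, deps: list) -> dict:
    return {d: digest(fs[d]) for d in deps}


def run_task(task: dict, fs: dict, db: dict) -> None:
    """Run the action and record dep hashes taken before or after it."""
    snapshot_before = task.get("snapshot_deps_before", SNAPSHOT_DEPS_BEFORE)
    before = hashes(fs, task["file_dep"]) if snapshot_before else None
    task["action"](fs)
    # Record the pre-run snapshot, or fall back to hashing after the run.
    db.update(before if snapshot_before else hashes(fs, task["file_dep"]))


def is_up_to_date(task: dict, fs: dict, db: dict) -> bool:
    return all(db.get(d) == digest(fs[d]) for d in task["file_dep"])


def cat_then_edit(fs: dict) -> None:
    fs["file2.txt"] = fs["file1.txt"]
    fs["file1.txt"] = "New text\n"  # simulates the edit made during the sleep


for flag in (False, True):
    fs = {"file1.txt": "Original text\n"}
    db = {}
    task = {"file_dep": ["file1.txt"], "action": cat_then_edit, "snapshot_deps_before": flag}
    run_task(task, fs, db)
    print(flag, is_up_to_date(task, fs, db))
# False True   (stale file2.txt is considered up-to-date)
# True False   (the next run rebuilds file2.txt)
```

The per-task form matters for the second point above: with the snapshot enabled, a task that rewrites its own file_dep would look out-of-date again on the next run whenever its action changed that file, so such tasks would want to opt out.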

I guess I would go with option 2. What do you think?

I am quite busy, so don't expect me to work on this. Contributions are welcome. Cheers

schettino72, Jan 08 '16 08:01

Oh, and at the very least the docs should be clearer on this...

schettino72, Jan 08 '16 08:01

If I get some time, I might take a crack at implementing option 2.

DarwinAwardWinner, Jan 08 '16 21:01

I just added a note to the docs about this behaviour.
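Until something like option 2 exists, one possible user-side mitigation is to make the action itself notice the race and fail. This sketch assumes two things about doit that should be checked against its documentation: a Python action that returns False is treated as a failure, and a failed task is not recorded as up-to-date, so it runs again next time. `file_md5` and `cat_checked` are hypothetical helpers, and the `delay` parameter stands in for the `sleep 2` action of the original task.

```python
# dodo.py (sketch; assumes a python-action returning False marks the task failed)
import hashlib
import time


def file_md5(path):
    """Hash the file contents (hypothetical helper)."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def cat_checked(infile, outfile, delay=2.0):
    """Copy infile to outfile, then fail if infile changed while the task ran."""
    before = file_md5(infile)
    with open(infile, "r") as in_, open(outfile, "w") as out:
        out.write(in_.read())
    time.sleep(delay)  # stands in for the slow part of the real task
    # False signals failure, so this run should not be recorded as up-to-date.
    return file_md5(infile) == before


def task_generate_file2_from_file1():
    return {
        'actions': [(cat_checked, ('file1.txt', 'file2.txt'))],
        'file_dep': ['file1.txt'],
        'targets': ['file2.txt'],
        'clean': True,
    }
```

This narrows the window rather than closing it: an edit made after the final check but before doit stores its own hashes would still go unnoticed.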

schettino72, Jun 22 '16 12:06