doit icon indicating copy to clipboard operation
doit copied to clipboard

Use `list` for `file_dep` (fixes #254).

Open tillahoffmann opened this issue 2 years ago • 5 comments

This PR fixes #254 by changing file_dep to a list rather than using a set, ensuring the order of dependencies is preserved.

tillahoffmann avatar Jun 30 '22 17:06 tillahoffmann

Thanks.

If my memory is correct I made it a set() instead of list for performance reasons. If a task has hundreds of file_dep it would be inefficient to check in using a list. Not sure this still applies... have you checked it?

Please also add an entry on CHANGES, and doc changes (not sure if relevant or where).

schettino72 avatar Jul 01 '22 11:07 schettino72

I've had a look running the tests on master and the feature branch. There doesn't seem to be much in it, but I appreciate that the number of deps is likely low in the tests. See below for results.

Test duration results

Executing all tests

$ (master) repeat 5 time pytest > /dev/null
pytest -q > /dev/null  2.88s user 0.84s system 52% cpu 7.054 total
pytest -q > /dev/null  2.80s user 0.82s system 52% cpu 6.949 total
pytest -q > /dev/null  2.79s user 0.82s system 52% cpu 6.905 total
pytest -q > /dev/null  2.79s user 0.80s system 52% cpu 6.872 total
pytest -q > /dev/null  2.85s user 0.83s system 51% cpu 7.089 total
$ (file_dep-list) repeat 5 time pytest > /dev/null
pytest -q > /dev/null  2.80s user 0.82s system 52% cpu 6.925 total
pytest -q > /dev/null  2.77s user 0.80s system 52% cpu 6.855 total
pytest -q > /dev/null  2.76s user 0.80s system 51% cpu 6.853 total
pytest -q > /dev/null  2.79s user 0.81s system 52% cpu 6.898 total
pytest -q > /dev/null  2.77s user 0.80s system 52% cpu 6.857 total

Executing only tests in test_dependency.py

$ (master) repeat 5 time pytest tests/test_dependency.py > /dev/null 
pytest tests/test_dependency.py > /dev/null  0.35s user 0.17s system 14% cpu 3.553 total
pytest tests/test_dependency.py > /dev/null  0.34s user 0.17s system 14% cpu 3.541 total
pytest tests/test_dependency.py > /dev/null  0.34s user 0.17s system 14% cpu 3.546 total
pytest tests/test_dependency.py > /dev/null  0.35s user 0.17s system 14% cpu 3.556 total
pytest tests/test_dependency.py > /dev/null  0.35s user 0.17s system 14% cpu 3.557 total
$ (file_dep-list) repeat 5 time pytest tests/test_dependency.py > /dev/null 
pytest tests/test_dependency.py > /dev/null  0.35s user 0.17s system 14% cpu 3.569 total
pytest tests/test_dependency.py > /dev/null  0.34s user 0.16s system 14% cpu 3.536 total
pytest tests/test_dependency.py > /dev/null  0.32s user 0.15s system 13% cpu 3.531 total
pytest tests/test_dependency.py > /dev/null  0.34s user 0.17s system 14% cpu 3.540 total
pytest tests/test_dependency.py > /dev/null  0.34s user 0.16s system 14% cpu 3.533 total

tillahoffmann avatar Jul 01 '22 15:07 tillahoffmann

I've had another look at performance, and I think this change would not negatively affect performance. I use setup.py to generate 5,000 files and then consider the time it takes to execute doit for a task that depends on all 5,000 files.

# setup.py
import os


os.makedirs("inputs", exist_ok=True)
for i in range(5000):
    with open(f"inputs/{i}.txt", "w") as fp:
        fp.write(str(i) * i)
# dodo.py
import os


os.makedirs("inputs", exist_ok=True)
for i in range(5000):
    with open(f"inputs/{i}.txt", "w") as fp:
        fp.write(str(i) * i)

Here's the evaluation.

# git:file_dep-list
$ time (repeat 100 doit)
...
8.15s user 3.18s system 94% cpu 11.962 total

# git:master
$ time (repeat 100 doit)
...
8.19s user 3.19s system 94% cpu 12.044 total

tillahoffmann avatar Aug 17 '22 16:08 tillahoffmann

Hi @schettino72, just wanted to follow up whether this is something you're interested in merging.

tillahoffmann avatar Oct 06 '22 13:10 tillahoffmann

@tillahoffmann sorry for delay. I am planning to dedicate some time for open-source in the last week of this month.

schettino72 avatar Oct 06 '22 14:10 schettino72