doit
Python-action: detect which dependencies changed and which targets are missing
Essentially I need to get, inside a python-action, the same information that doit info provides on the console.
I need this in order to process only those dependencies of the task which have changed.
What I've tried:
def my_action(task):
    print(task.dep_changed)
However this will print all the dependencies, no matter what. I guess it is related to this comment in the code:
# dep_changed is set on get_status()
# Some commands (like `clean` also uses expand_args but do not
# uses get_status, so `changed` is not available.
And I need the missing targets as well; those seem to be separate from dep_changed in doit info.
However this will print all the dependencies, no matter what.
Sounds like a bug. Can you show me a minimal example to reproduce the problem?
And I need the missing targets as well
That is not supported; it will report all targets. You can easily check which targets are missing with Python code. Not sure it makes sense for doit
to report those... if you really want that, we can discuss; please create a separate issue.
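As an illustration of that suggestion (not code from this thread), here is a minimal sketch of checking for missing targets inside a python-action, using the documented task and targets keywords:

import os

def my_action(task, targets):
    # targets that were deleted (or never built) need to be rebuilt
    missing = [t for t in targets if not os.path.exists(t)]
    print('changed dependencies:', task.dep_changed)
    print('missing targets:', missing)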
Sounds like a bug. Can you show me a minimal example to reproduce the problem?
Turns out to be a bug indeed. dep_changed
is correct if one of the dependencies changed, but it is incorrect if one of the targets has been removed:
import os
import shutil

def task_copy():
    return {
        'file_dep': ['a.txt', 'b.txt'],
        'targets': ['out/a.txt', 'out/b.txt'],
        'actions': [copy_action],
    }

def copy_action(task, dependencies):
    print(task.dep_changed)
    if not os.path.exists('out'):
        os.mkdir('out')
    for dep in dependencies:
        shutil.copyfile(dep, 'out/{}'.format(dep))
- Create a.txt and b.txt
- Run doit
- Delete out/a.txt
- Run doit -v 2
- Prints ['a.txt', 'b.txt'] (incorrect)
- Expected ['a.txt'] because in fact b.txt is still up to date
You could also add an example of task.dep_changed to the documentation, because I had to figure it out from the source.
you can easily check which targets are missing with Python code
Ok, I accept this solution.
Actually the right way to use this feature is with a parameter named changed:

def my_action(changed):
    print(changed)
The current behaviour is expected. When a task is not up-to-date for any reason other than changed dependencies, it will report as if all file_dep were modified.
This was a questionable decision... do you have a specific use-case, I might reconsider this behaviour...
Also this behaviour is described in the source code but not in the user docs, so some action is definitely required. https://github.com/pydoit/doit/blob/master/doit/dependency.py#L571
Actually the right way to use this feature is with a parameter named changed
Thank you, from the docs I had no idea you can inject changed.
do you have a specific use-case, I might reconsider this behaviour...
My use case:
1. In the example above, suppose that copy_action takes a long time for each file.
2. Because of 1), I want to copy only the files that are a) not up to date, or b) whose target doesn't exist.
3. Iterating over changed sounds like a great solution, but sometimes this behaviour will make the task do more work than it should.
Proposed solution: if changed reported an empty list in this case, I could iterate over all the targets, check which of them are missing, and add their dependencies to my list of changed dependencies.
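A sketch of what that proposal could look like as a revised copy_action, assuming the 1:1 dep-to-target naming used in the example above; note it relies on the proposed behaviour (changed being empty when only targets are missing), which is not how doit behaves today:

import os
import shutil

def copy_action(changed, targets):
    # start from the dependencies doit reports as changed...
    to_copy = set(changed)
    # ...and add the dependency behind every missing target (1:1 naming: out/<dep>)
    for target in targets:
        if not os.path.exists(target):
            to_copy.add(os.path.basename(target))
    if not os.path.exists('out'):
        os.mkdir('out')
    for dep in to_copy:
        shutil.copyfile(dep, 'out/{}'.format(dep))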
from the docs I had no idea you can inject changed
It is there: http://pydoit.org/tasks.html#keywords-with-task-metadata
My use case:
For your use case it is better to have a separate task for each file being copied, since they are independent...
There is a cost to creating the list of changed dependencies, and it is seldom used in practice.
This is an old feature; today I would implement it in a different way... anyway, for now I am more inclined to set changed to None, meaning it was not calculated.
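If that proposal were adopted, actions reading changed would need a small guard; a hypothetical sketch (this is not current doit behaviour):

def my_action(changed, dependencies):
    # fall back to the full dependency list when `changed` was not calculated
    deps_to_process = changed if changed is not None else dependencies
    print(deps_to_process)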
For your use case it is better to have a separate task for each file being copied, since they are independent...
The number of files and their names are not known in advance in my use case. As far as I know, doit needs a function for each task. With your suggestion one would need to generate those functions (one task per file); how would you do that?
I am more inclined to set changed to None, meaning it was not calculated
I understand there is no way to know which target depends on which dependency, because the task description is general.
I suggest you could introduce a dependency/target description like "folder/dep*.txt" and "folder/target*.txt", meaning a 1:1 target:dependency relationship. GNU make has a similar feature, e.g.:
out/%.png: input/%.png
With these restrictions it would be possible to calculate the changed variable, which is essential for supporting incremental builds properly.
What do you think?
@schettino72 if you agree with the previous comment, I would create a new ticket for the feature mentioned in it and we can consider this discussion resolved.
Sorry, I do not agree with Make style regular expressions to define dependencies and targets. If you do not know the names of the files in advance, you could use delayed tasks.
Here is an example:
http://pydoit.org/task_creation.html#delayed-task-creation
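For readers following along, a minimal sketch of that delayed-task pattern adapted to the copy example; the 'download' task name and the *.txt glob are placeholder assumptions, not part of the original discussion:

import glob
import shutil
from doit import create_after
from doit.tools import create_folder

# task creation is delayed until the (hypothetical) 'download' task has run,
# so the input files do not need to be known when the dodo file is loaded
@create_after(executed='download')
def task_copy():
    for dep in glob.glob('*.txt'):
        yield {
            'name': dep,
            'file_dep': [dep],
            'targets': ['out/{}'.format(dep)],
            'actions': [
                (create_folder, ['out']),
                (shutil.copyfile, [dep, 'out/{}'.format(dep)]),
            ],
        }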
I do not agree with Make style regular expressions to define dependencies and targets
Why? It would be a lot easier to use than the delayed task creation you linked.
@patrikturi concerning your last comment about generic expressions, I think I have the same use case and I solved it easily by creating one subtask per file dynamically.
You will see how I did it in issue #327. I also propose in that issue a utility method to ease the process even more. Would something like that satisfy both the user (@patrikturi) and the maintainer (@schettino72)? Let me know what you think!
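For reference, the non-delayed variant of that approach (one sub-task per file, generated when the dodo file loads) is the same generator shape as the delayed example above, just without the create_after decorator; this is a generic sketch of doit's sub-task mechanism, not necessarily what issue #327 proposes:

import glob
import shutil
from doit.tools import create_folder

def task_copy():
    # yield one independent sub-task per input file (run as copy:<file>)
    for dep in glob.glob('*.txt'):
        yield {
            'name': dep,
            'file_dep': [dep],
            'targets': ['out/{}'.format(dep)],
            'actions': [
                (create_folder, ['out']),
                (shutil.copyfile, [dep, 'out/{}'.format(dep)]),
            ],
        }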
Also concerning the other topic in this ticket: I ran into this today while trying to debug "why did a task run again??? It should not".
Being able to print to stdout the reason why tasks are run (dependency change, file_dep change, or target change) using a simple command-line flag (for example a maven-style doit -X or a pytest-style doit -v) would really be a killer feature - if it is not already there (in which case it should probably be highlighted better in the docs). Should we open a ticket for this in particular?
EDIT: for what it's worth, my current workaround for this is to put the following why_am_i_running python action explicitly at the beginning of all my tasks' action lists:
from os.path import exists

def why_am_i_running(task, changed):
    for t in task.targets:
        if not exists(t):
            print("Running task '%s' because one of its targets does not exist anymore: '%s'" % (task, t))
            return
    print("Running task '%s' because the following changed: '%s'" % (task, changed))
EDIT2: There might be a case missing in this draft: the case where result_dep is the reason for re-running.
@smarie Thank you! This helper is exactly what I needed.
You're welcome @patrikturi ! Thanks for the feedback