How to detect failure to create a file?
There seems to be no way to trigger a re-run after failing to create a file.
As I develop my workflow, I find I may have a bug which causes an expected file not to be created at all. I am surprised this is not an error, since I have a File as output which doesn't exist in the filesystem.
Upon fixing my bug, subsequent runs of redun have no effect, as it seems to have happily cached the file as missing, and there is no changed state in the filesystem to trigger rerunning the task.
I think it should be an error if an output File returned from a task doesn't exist when the task completes.
Example workflow:

```python
from redun import task, File
from typing import List

redun_namespace = "redun.examples.missing_after_error"


@task()
def create(path: str, content: str) -> File:
    if ok():
        with open(path, "w") as f:
            f.write(content)
    return File(path)


def ok() -> bool:
    # flipping this to True and rerunning doesn't trigger redun to generate the missing files
    return False


@task()
def main() -> List[File]:
    f1 = create("out/freddy1", "Hello Freddy 1\n")
    f2 = create("out/freddy2", "Hello Freddy 2\n")
    return [f1, f2]
```
I understand that the ok function is not hashed as part of the task. That's not my concern. This is about missing files.
Thanks for submitting this issue. I agree with your explanation of how redun reacted to your bug fix and how it treats non-existent Files. A few thoughts.
Surprisingly, this case hasn't happened very often for us, but I can see how you encountered it.
> Upon fixing my bug, subsequent runs of redun have no effect, as it seems to have happily cached the file as missing, and there is no changed state in the filesystem to trigger rerunning the task.
Just checking: I assume the bug fix was outside of a task? Normally such a code change would also trigger a rerun of that part of the workflow. In case it's helpful, if you wanted the workflow to also be reactive to code changes in the ok() function of your example, you could use the task option hash_includes:
```python
@task(hash_includes=[ok])
def create(path: str, content: str) -> File:
    if ok():
        with open(path, "w") as f:
            f.write(content)
    return File(path)
```
For more info:
- https://insitro.github.io/redun/tasks.html#task-hashing
- https://insitro.github.io/redun/tasks.html#hash-includes
Thus far we have treated non-existent Files as just another file state. So when we hash a File in a task result, it's just another hash.
- https://github.com/insitro/redun/blob/147c19dedf762ae1ebd76805e6d4328726a4ee12/redun/file.py#L464-L476
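As a rough standalone illustration of "non-existence is just another hashable state" (this is a hypothetical sketch using only the standard library, not redun's actual File.get_hash() implementation):

```python
import hashlib
import os


def sketch_file_hash(path: str) -> str:
    # Hypothetical stand-in for File.get_hash(): both the existing and the
    # non-existent state map to some stable hash string.
    state = str(os.stat(path).st_mtime) if os.path.exists(path) else "nonexistent"
    return hashlib.sha256(f"{path}:{state}".encode()).hexdigest()


# The hash of a missing file is stable across runs, so nothing in the
# filesystem signals that the cached task result should be recomputed.
assert sketch_file_hash("out/missing") == sketch_file_hash("out/missing")
```

This is why fixing the bug alone doesn't help: the missing file hashes to the same value before and after the fix, so the cached result still looks up to date.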
There is a concept in redun called "value validity": when trying to use a cached value, we check whether it is still valid by calling its is_valid() method. Currently, this method for File checks whether the hash has changed since the last workflow run (the non-existent state gets a hash too):
- https://github.com/insitro/redun/blob/147c19dedf762ae1ebd76805e6d4328726a4ee12/redun/file.py#L1337-L1342
In theory, is_valid() could be changed to also require that the File exists. You could also imagine changing File.get_hash() to throw an exception if the File doesn't exist; that would halt the workflow even earlier, during the first run, which is likely helpful.
Before actually implementing the changes listed above, I would need to think pretty hard about unintended consequences or breaking changes for use cases that rely on the current behavior. At first glance, it does seem reasonable. Either way, if you wanted this new behavior right away, you could subclass File to implement it yourself. Here is an example of subclassing in the redun library itself, providing a version of File whose hash depends on the file's content:
- https://github.com/insitro/redun/blob/147c19dedf762ae1ebd76805e6d4328726a4ee12/redun/file.py#L1752-L1765
In our own code, we have implemented several kinds of File subclasses to customize different hashing and validity behavior.
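To illustrate the idea without depending on redun itself, here is a standalone mock-up: a minimal File-like class plus a subclass whose is_valid() additionally requires the file to exist. The method names mirror redun's is_valid()/get_hash() from the links above, but this is hypothetical sketch code, not redun's API:

```python
import hashlib
import os


class MiniFile:
    """Minimal stand-in for redun's File (illustration only)."""

    def __init__(self, path: str):
        self.path = path
        self.hash = self.get_hash()  # hash recorded at task completion

    def get_hash(self) -> str:
        # Non-existence is "just another state" that hashes stably.
        if os.path.exists(self.path):
            state = str(os.stat(self.path).st_mtime)
        else:
            state = "nonexistent"
        return hashlib.sha256(f"{self.path}:{state}".encode()).hexdigest()

    def is_valid(self) -> bool:
        # Current-style behavior: valid as long as the hash is unchanged.
        return self.get_hash() == self.hash


class ExistingFile(MiniFile):
    """Stricter variant: a cached value is only valid if the file exists."""

    def is_valid(self) -> bool:
        return os.path.exists(self.path) and super().is_valid()


missing = MiniFile("out/never_created")
assert missing.is_valid()  # unchanged hash, so "valid" despite not existing

strict = ExistingFile("out/never_created")
assert not strict.is_valid()  # the stricter subclass rejects the missing file
```

A real version would subclass redun's File instead of MiniFile, but the shape of the override is the same: keep the base hash check and add an existence requirement.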
I hope this helps.
Thank you for your detailed response!
This is not a blocker for me, so I will not attempt a quick work-around. My problem was caused by a function silently failing to create its output. I fixed it to fail properly and noisily!
I understand what you are saying about using hash_includes to trigger the task when dependent code changes, but my dependent code spans several Python modules, so I don't think I'll be able to do that. I'm OK with that, at least for now.
However, I do think that it is worth requiring a file to exist if it is specified as task output. I would really appreciate you considering incorporating that as new behaviour. I think it's the right thing in general.
Thanks!