metaflow
Using an unordered collection for `foreach` can cause incorrect execution
When Python does not guarantee the iteration order of a collection (such as a set), using that object as the target of a `foreach` fan-out can cause incorrect execution: multiple tasks may execute on the same input while other inputs are missed.
Two possible workarounds:
- use `list()` to convert to a list, which guarantees an order
- use https://stackoverflow.com/questions/3848091/set-iteration-order-varies-from-run-to-run/32529871#32529871 to force sets to be in the same order
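The failure mode and the first workaround can be illustrated without Metaflow. The sketch below is only a simulation under stated assumptions: it assumes each fan-out task runs in its own interpreter and picks its input by index, and it uses the `PYTHONHASHSEED` environment variable to give each simulated task a different string hash seed. `GENRES`, `pick_in_fresh_process`, and `run_tasks` are hypothetical names made up for this example, not Metaflow APIs.

```python
import os
import subprocess
import sys

# Hypothetical foreach target: a set of strings, whose iteration order
# depends on the per-process string hash seed in CPython.
GENRES = {"comedy", "drama", "horror", "action", "sci-fi"}


def pick_in_fresh_process(index, seed, stabilize):
    """Return the item a simulated task sees at `index` in a new interpreter."""
    # sorted() gives a process-independent order; list() keeps the set's order.
    expr = "sorted(items)" if stabilize else "list(items)"
    code = f"items = {GENRES!r}; print({expr}[{index}])"
    # A different PYTHONHASHSEED per task mimics independent worker processes.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def run_tasks(stabilize):
    """Simulate one task per index, each with its own hash seed."""
    return [
        pick_in_fresh_process(i, seed=i + 1, stabilize=stabilize)
        for i in range(len(GENRES))
    ]


if __name__ == "__main__":
    # Unordered: duplicates and missing genres are possible (not guaranteed).
    print("set order:   ", run_tasks(stabilize=False))
    # Ordered: every genre appears exactly once.
    print("sorted order:", run_tasks(stabilize=True))
```

In a flow, the equivalent fix would be to assign an ordered list to the attribute named in `foreach` before the fan-out, for example `self.genres = sorted(genres)` or `self.genres = list(genres)`. `sorted()` is used in the simulation because each simulated task rebuilds the collection itself; converting once with `list()` in the parent step should also work, since the list then carries a single fixed order.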
Just hit this issue as well and it can be quite confusing to debug what's going wrong. It would be great to document this!
One place where this documentation might first be seen is the tutorial. Having run into this issue, it's now clear why `self.genres` is being converted from a set to a list in the 02 - statistics tutorial, even though there isn't an explicit comment remarking on that.