Using an unordered collection for `foreach` can cause incorrect execution

Open romain-intel opened this issue 3 years ago • 2 comments

When Python does not guarantee the iteration order of a collection (such as a set), using that object as the target of a foreach fan-out can cause incorrect execution: multiple tasks may execute on the same input while other inputs are missed entirely.

Two possible workarounds:

  • use `list()` to convert the collection to a list, which guarantees an order
  • apply the approach described in https://stackoverflow.com/questions/3848091/set-iteration-order-varies-from-run-to-run/32529871#32529871 to force sets to iterate in the same order.
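As a rough illustration of the first workaround, here is a minimal pure-Python sketch. It does not use Metaflow itself; `to_foreach_items` and `pick` are hypothetical helper names, and the idea that each fan-out task selects its input by index is an assumption made for the example. It uses `sorted()` rather than plain `list()` so that the order is also the same from run to run.

```python
def to_foreach_items(values):
    """Return a list with a fixed order (hypothetical helper).

    sorted() always returns a list, so every consumer that indexes into
    the result sees the same element at the same position.
    """
    return sorted(values)


def pick(items, index):
    """Stand-in for a fan-out task selecting its input by index (assumption)."""
    return items[index]


genres = {"drama", "comedy", "action"}  # a set: iteration order is not guaranteed
items = to_foreach_items(genres)

# Every index maps to exactly one input, and every input is covered once.
picked = [pick(items, i) for i in range(len(items))]
print(picked)  # ['action', 'comedy', 'drama']
```

In a flow, this would presumably amount to storing the ordered list (for example `self.genres = sorted(genres)`) before passing `foreach='genres'` to `self.next`.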

romain-intel, Apr 06 '21 18:04

Just hit this issue as well, and it can be quite confusing to debug. It would be great to document this!

One place where users might first see this documented is the tutorial. Having run into this issue, it's now clear why `self.genres` is converted from a set to a list in the 02 - statistics tutorial, even though there is no comment explaining the conversion.

stevenhoelscher, Mar 02 '22 02:03