prefect
Log a warning or raise an error if a user tries to cache something nondeterministic (uncacheable)
First check
- [X] I added a descriptive title to this issue.
- [X] I used the GitHub search to find a similar request and didn't find it.
- [X] I searched the Prefect documentation for this feature.
Prefect Version
2.x
Describe the current behavior
I initially thought this was a prefect bug, but after a lot of investigation I was able to determine that cloudpickle, which is used by task_input_hash, is nondeterministic from run to run in lots of different scenarios. See the following issues opened on their end:
- https://github.com/cloudpipe/cloudpickle/issues/453
- https://github.com/cloudpipe/cloudpickle/issues/510
- https://github.com/cloudpipe/cloudpickle/issues/120
For me, the issue came from passing a class to a prefect task with caching enabled using task_input_hash. Strangely, the hash is only deterministic if the class is imported from another module rather than defined in the same file as the script that kicks off the flow (that is, not defined in __main__).
This single hard-to-debug issue has led to an incredible amount of frustration with prefect and confusion among my team members and me. From surveying the landscape of caching in Python, there doesn't seem to be an easier out-of-the-box solution:
- dill seems to still be working on determinism - https://github.com/uqfoundation/dill/pull/501
- huggingface did something mainly to fingerprint larger datasets that wouldn't fit in memory to be pickled (?) - hf fingerprint.py
- streamlit has two different caching fn's and a bunch of errors for user error mitigation - streamlit docs
Describe the proposed behavior
I'm not expert enough to know what the best solution is, or whether any of the other libraries I mentioned are definitively better. From a user's perspective, though, I want to emphasize one simpler thing that streamlit does well and that has saved me a ton of pain in the past: the UnhashableParamError.
In streamlit's caching, if you try to cache a custom object it simply errors with an UnhashableParamError. Had prefect done this for me, it would have alleviated so much confusion about why caching wasn't working (confusion that led me down rabbit holes and left me even less clear on the distinction between prefect caching and results).
Perhaps an error is too aggressive, but even a warning log would make a huge difference when debugging why prefect's caching isn't working from run to run. And maybe task_input_hash should be more restrictive, forcing users to pass hashable args instead of blindly using a nondeterministic cloudpickle as the fallback when JSON serialization fails.
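To make the request concrete, here is a minimal sketch of the kind of behavior I mean. It is not prefect's implementation: `strict_input_hash` is a hypothetical name, it uses only the standard library, and it assumes the `(context, parameters)` call shape of a prefect 2.x cache key function. Unlike task_input_hash, it hashes only the parameters (not the task's source code) and it logs a warning instead of falling back to cloudpickle.

```python
import hashlib
import json
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("strict_cache_key")


def strict_input_hash(context: Any, parameters: Dict[str, Any]) -> Optional[str]:
    """Hypothetical cache key function: hash only JSON-serializable inputs.

    Returns a hex digest, or None (meaning "no cache key") when the inputs
    cannot be serialized to JSON. `context` is unused in this sketch.
    """
    try:
        # sort_keys makes the serialized form independent of dict ordering.
        payload = json.dumps(parameters, sort_keys=True)
    except (TypeError, ValueError) as exc:
        # Tell the user instead of silently hashing a cloudpickle payload
        # that may differ from run to run.
        logger.warning(
            "Task inputs are not JSON-serializable (%s); no cache key was "
            "computed, so this task run will not be cached.",
            exc,
        )
        return None
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Assuming prefect 2.x, something like this could be passed as `@task(cache_key_fn=strict_input_hash)`. Passing a class as a parameter would then produce a visible warning rather than a cache that quietly never hits.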
Example Use
No response
Additional context
No response