
Error (TypeError: cannot pickle 'function' object) thrown in `st.cache_resource` is confusing because we only mention that an object needs to be hashable but it actually must also be pickleable

Open MacroBull opened this issue 1 month ago • 8 comments

Checklist

  • [x] I have searched the existing issues for similar issues.
  • [x] I added a very descriptive title to this issue.
  • [x] I have provided sufficient information below to help reproduce this issue.

Summary

Instead of using Python’s hashing semantics, Streamlit serializes (pickles) function arguments to bytes and hashes those bytes, which leads to surprising and misleading UnhashableParamErrors.

This behavior conflicts with the documentation, which states: "A function's arguments must be hashable to cache it."

It is also conceptually surprising and violates the expectation that “hashable objects are cacheable”: st.cache_resource is explicitly process-local and provides no cross-process persistence, and Python’s own functools.lru_cache accepts any object that is hashable under standard Python semantics, without imposing any serialization requirement.
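For comparison, here is a minimal plain-Python sketch (no Streamlit involved) showing that functools.lru_cache happily caches an argument whose only hash is the default identity-based one; the Data class mirrors the reproduction below:

```python
# Sketch: lru_cache only requires standard Python hashability -- no pickling.
import functools

class Data:
    def __init__(self, payload: bytes):
        self.payload = payload  # default identity-based __hash__ is enough

calls = 0

@functools.lru_cache(maxsize=None)
def transform(data: "Data") -> "Data":
    global calls
    calls += 1
    return Data(payload=b"new")

d = Data(payload=b"old")
transform(d)
transform(d)  # second call with the same object is a cache hit
print(calls)  # -> 1
```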

Reproducible Code Example

import streamlit as st

class Data:
    def __init__(self, payload: bytes):
        self.payload = payload

@st.cache_resource
def transform(data: Data) -> Data:
    return Data(payload=b'new')

data = Data(payload=b'old')
st.write(st.__version__)
st.write(data)
st.write(hash(data))       # hashable under standard Python semantics
st.write(transform(data))  # <-- raises UnhashableParamError

Steps To Reproduce

No response

Expected Behavior

No response

Current Behavior

Output:

1.50.0

data: __main__.Data(payload: bytes)
    No docs available
    payload: bytes = b'old'

8779468763100

Error message:

# _to_bytes <class '__main__.Data'>: <__main__.Data object at 0x7fc211defdc0>
# _to_bytes <class 'function'>: <function _reconstructor at 0x7fc2af172560>
────────────────────────── Traceback (most recent call last) ───────────────────────────
  ntime/caching/hashing.py:648 in _to_bytes                                             
                                                                                        
  _reduce_ex                                                                            
                                                                                        
     73 │   │   state = None                                                            
     74 │   else:                                                                       
     75 │   │   if base is cls:                                                         
  ❱  76 │   │   │   raise TypeError(f"cannot pickle {cls.__name__!r} object")           
     77 │   │   state = base(self)                                                      
     78 │   args = (cls, base, state)                                                   
     79 │   try:                                                                        
────────────────────────────────────────────────────────────────────────────────────────
TypeError: cannot pickle 'function' object

The above exception was the direct cause of the following exception:

────────────────────────── Traceback (most recent call last) ───────────────────────────
  ntime/caching/hashing.py:650 in _to_bytes                                             
────────────────────────────────────────────────────────────────────────────────────────
UnhashableTypeError

During handling of the above exception, another exception occurred:

────────────────────────── Traceback (most recent call last) ───────────────────────────
  ntime/caching/hashing.py:650 in _to_bytes                                             
────────────────────────────────────────────────────────────────────────────────────────
UnhashableParamError: Cannot hash argument 'data' (of type `__main__.Data`) in 
'transform'.

Is this a regression?

  • [ ] Yes, this used to work in a previous version.

Debug info

  • Streamlit version: 1.50.0
  • Python version: Python 3.10.19
  • Operating System:
  • Browser:

Additional Information

No response

MacroBull avatar Dec 29 '25 10:12 MacroBull

If this issue affects you, please react with a 👍 (thumbs up emoji) to the initial post.

Your feedback helps us prioritize which bugs to investigate and address first.


github-actions[bot] avatar Dec 29 '25 10:12 github-actions[bot]

Thanks for submitting this issue @MacroBull . I will take a closer look at why this is implemented in this way and whether it is something that can be changed or if there are other restrictions that necessitate this implementation. If this can’t be changed, perhaps we can clarify the documentation.

This week the team is on holiday. Posting from my personal account, but I am a maintainer.

lawilby avatar Dec 29 '25 18:12 lawilby

Thanks for the clarification and for taking the time to review this. It sounds like the current behavior may be intentional due to underlying constraints. If that’s the case, I agree that clearly documenting the requirement around picklability (in addition to hashability) would help set correct expectations for users and avoid confusion.

Happy to help with a documentation update or follow whatever direction the team feels is most appropriate. Thanks again for the guidance.

S-PANDIYAN avatar Dec 30 '25 03:12 S-PANDIYAN

This issue reports that st.cache_resource requires objects to be pickle-able (serializable) rather than just Python-hashable, which conflicts with the documentation that states "A function's arguments must be hashable to cache it."

The root cause is in lib/streamlit/runtime/caching/hashing.py (lines 642-651). The _to_bytes method uses obj.__reduce__() (part of Python's pickle protocol) as the fallback for custom objects that don't have explicit handling. This means objects must be serializable to generate cache keys, even though st.cache_resource is process-local and doesn't need cross-process persistence. The user's Data class is hashable via Python's default hash(), but its __reduce__ output contains a _reconstructor function that cannot be pickled, causing the TypeError: cannot pickle 'function' object error.
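A minimal sketch of that distinction (the class mirrors the reproduction; nothing here imports Streamlit): pickling the object itself works, but object.__reduce__() takes the protocol-0 path through copyreg._reduce_ex, whose result starts with the copyreg._reconstructor function, and reducing that function raises the observed error.

```python
# Sketch: pickle.dumps(obj) succeeds, yet hashing the obj.__reduce__() output
# fails, because that tuple starts with the copyreg._reconstructor *function*.
import pickle

class Data:
    def __init__(self, payload: bytes):
        self.payload = payload

d = Data(payload=b"old")

blob = pickle.dumps(d)        # plain pickling of the object succeeds
reduced = d.__reduce__()
print(reduced[0].__name__)    # -> '_reconstructor'

try:
    # The hasher recurses into the tuple and tries to reduce the
    # _reconstructor function, which is where the TypeError comes from.
    reduced[0].__reduce__()
except TypeError as exc:
    print(exc)                # -> cannot pickle 'function' object
```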

Related Issues


This is an automated triage comment. Please verify the suggested labels and related issues.

github-actions[bot] avatar Dec 31 '25 14:12 github-actions[bot]

Hi @S-PANDIYAN ,

I have done some investigation on this issue. While you are right that st.cache_resource is process-local, Streamlit's execution model means that, for the feature to work correctly, the cache needs to be keyed on something that stays the same across reruns. The default Python __hash__ is based on the object's memory address, which changes when the script reruns, so an identity-based key would produce a cache miss on every rerun.
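This rerun problem can be sketched in two lines of plain Python (the hypothetical Data class stands in for any user object; the two instances simulate the "same" object rebuilt on successive reruns):

```python
# Sketch: the default __hash__ is derived from id(), so an object rebuilt on a
# script rerun hashes differently even when its contents are identical.
class Data:
    def __init__(self, payload: bytes):
        self.payload = payload

run1 = Data(payload=b"old")   # object built on the first run
run2 = Data(payload=b"old")   # "same" object rebuilt on a rerun

print(run1.payload == run2.payload)   # -> True: identical contents
print(hash(run1) == hash(run2))       # -> False: identity-based hashes differ
```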

I think the issue that the AI action above found, https://github.com/streamlit/streamlit/issues/10348, covers the thinking on changing this. We need someone to look at alternative hashing strategies to __reduce__ and see whether one of them works with the rerun cycle. If not, we could let users define a custom hashing function on the class. We already support this via the hash_funcs param, but using it can mean repeating code at every place the decorator is used.
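The hash_funcs escape hatch can be sketched in plain Python (the Data class and the sha256-based key below are illustrative assumptions, not Streamlit internals; the Streamlit form would be roughly `@st.cache_resource(hash_funcs={Data: lambda d: d.payload})`):

```python
# Plain-Python sketch of the hash_funcs idea: derive the cache key from the
# object's *value* instead of its identity, so the key is stable across reruns.
import hashlib

class Data:
    def __init__(self, payload: bytes):
        self.payload = payload

def stable_key(d: Data) -> str:
    # Value-based digest: two instances with equal payloads share a key.
    return hashlib.sha256(d.payload).hexdigest()

print(stable_key(Data(b"old")) == stable_key(Data(b"old")))  # -> True
print(stable_key(Data(b"old")) == stable_key(Data(b"new")))  # -> False
```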

I will leave this open but move it to the docs repo since there is a separate request to improve the docs so that the error makes more sense!

sfc-gh-lwilby avatar Jan 05 '26 15:01 sfc-gh-lwilby


Thanks for the explanation — I now understand that Streamlit’s Cache API is fundamentally value-based rather than object-identity-based, and that stable hashing therefore relies on serialization semantics rather than Python object identity.

However, I still have a follow-up question about the design choice here.

In the example, Data is still plainly serializable via pickle.dumps, yet Streamlit explicitly excludes it from the whitelist and instead routes it through the __reduce__ path. I’m trying to understand the rationale behind this decision.

I’ve read through the relevant code and tests, but I’m still a bit confused about the intent:

The hashing logic that falls back to __reduce__ for non-whitelisted types: https://github.com/streamlit/streamlit/blob/65306a6135cc72f61150dd064173c2d262845dd8/lib/streamlit/runtime/caching/hashing.py#L645

The tests distinguishing test_reduce_not_hashable vs test_reduce_fallback: https://github.com/streamlit/streamlit/blob/65306a6135cc72f61150dd064173c2d262845dd8/lib/tests/streamlit/runtime/caching/hashing_test.py#L675

In particular, why is the former designed to be non-hashable by Streamlit, even though it is technically pickle-serializable, while the latter is allowed (except the custom __reduce__)?

Is the intent here to intentionally reject certain serializable types to avoid accidental semantic instability, or is there another constraint (e.g. performance, security, determinism guarantees) that motivated this distinction?

I’d appreciate some clarification on the reasoning behind this design.

MacroBull avatar Jan 06 '26 04:01 MacroBull

Thanks for the detailed explanation @MacroBull. My goal with the docs change was only to clarify the user-facing requirement and error message; the deeper hashing design decisions make sense to discuss separately at the framework level.


S-PANDIYAN avatar Jan 06 '26 07:01 S-PANDIYAN

@MacroBull it doesn't look to me like we are explicitly excluding Data; rather, the current approach is just very conservative. I looked at how we might extend it to include Data and created this issue, which deals with extending support from built-in types to some standard-library types: https://github.com/streamlit/streamlit/issues/13514. I will create a PR for this and see what the rest of the team thinks. What do you think? Thanks for raising this issue; it looks like there are many ways this feature could be extended and improved.

sfc-gh-lwilby avatar Jan 06 '26 10:01 sfc-gh-lwilby