Guidance on how to cache Polars LazyFrame
Link to doc page in question (if any):
https://docs.streamlit.io/develop/concepts/architecture/caching
Name of the Streamlit feature whose docs need improvement:
@st.cache_data / @st.cache_resource
What you think the docs should say:
Polars' LazyFrame fits somewhere between data and a resource, because it represents a query that will result in a DataFrame when collected. I think it would be good if the docs included this type in the large table at the bottom, to advise whether a function returning a pl.LazyFrame should be decorated with @st.cache_data, @st.cache_resource, or neither (I don't know the answer).
Hi @BartSchuurmans. I'll need to do a little testing to confirm, but the initial thoughts I heard back from engineering were this:
Since a LazyFrame is data that hasn't been computed yet, it'd likely be better to cache the collected result with cache_data instead. If there is any good reason to cache a LazyFrame, then it will probably need cache_resource, since cache_data might not work.
I'll try to test some things to confirm so I can add an example or something. :)
I just played around with LazyFrames a little bit, and as far as I can tell, it works just fine with both st.cache_data and st.cache_resource. Although I'm not a Polars expert, I'm not observing that the LazyFrame mutates at all, even if I try to mutate its underlying data. I'd be inclined to say, use st.cache_resource for efficiency, but I've read that Polars uses a kind of "lazy copy" which is "shallow until otherwise needed," so even that might not make all that much of a difference.