Guidance on how to cache Polars LazyFrame

Open BartSchuurmans opened this issue 1 year ago • 1 comments

Link to doc page in question (if any):

https://docs.streamlit.io/develop/concepts/architecture/caching

Name of the Streamlit feature whose docs need improvement:

@st.cache_data / @st.cache_resource

What you think the docs should say:

Polars' LazyFrame fits somewhere between data and a resource, because it represents a query that will result in a DataFrame when collected. I think it would be good if the docs included this type in the large table on the bottom to advise whether a function returning a pl.LazyFrame should be decorated with @st.cache_data, @st.cache_resource, or neither (I don't know the answer).

BartSchuurmans, Aug 23 '24 19:08

Hi @BartSchuurmans. I'll need to do a little testing to confirm, but the initial thoughts I heard back from engineering were this:

Since a LazyFrame is data that hasn't been computed yet, it'd likely be better to cache the collected result with cache_data instead. If there is any good reason to cache a LazyFrame, then it will probably need cache_resource since cache_data might not work.

I'll try to test some things to confirm so I can add an example or something. :)

sfc-gh-dmatthews, Aug 28 '24 15:08

I just played around with LazyFrames a little bit, and as far as I can tell, they work just fine with both st.cache_data and st.cache_resource. Although I'm not a Polars expert, I'm not observing that the LazyFrame mutates at all, even if I try to mutate its underlying data. I'd be inclined to say, use st.cache_resource for efficiency, but I've read that Polars uses a kind of "lazy copy" which is "shallow until otherwise needed," so even that might not make all that much of a difference.

sfc-gh-dmatthews, Dec 26 '24 08:12