
Provide methods to refresh caching

Open Samreay opened this issue 3 years ago • 9 comments

Problem

Parts of my dashboard require sourcing CSV files from external hosts, and it can take a long time (around 60s) to download and process that data. However, the data is made available at 6am every day. I already have Streamlit's cache around my data-fetching function, but it would be amazing if I could say "at 6:05am every day, call this function".

Solution

MVP: Streamlit should have a decorator you can apply to functions so that they run at startup. These functions could make use of existing Python scheduling libraries like apscheduler to call (and thus refresh) cached functions whenever needed.

Possible additions: Streamlit could extend its caching decorators, integrating the scheduling through decorator parameters and abstracting it away from the user.
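For illustration, the MVP could be approximated today with a plain registry and decorator; `on_startup` and `run_startup_hooks` below are hypothetical names, not part of Streamlit's API:

```python
# A minimal sketch of the proposed startup decorator, using only the stdlib.
# `on_startup` and `run_startup_hooks` are hypothetical, not Streamlit functions.
_startup_hooks = []


def on_startup(func):
    """Register func to be run once when the app process starts."""
    _startup_hooks.append(func)
    return func


def run_startup_hooks():
    """Call from the top of the app script, guarded so it runs once per process."""
    for hook in _startup_hooks:
        hook()
```

A real implementation would presumably also hand these hooks to a scheduler such as apscheduler so they re-run at the configured times.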

Additional context

FR raised from this forum post: https://discuss.streamlit.io/t/eagerly-execute-specific-functions-on-app-launch/34957

It seems a few people have been trying to figure out how to refresh caches, and it would be good not to have to use Selenium to simulate a web request from an external service to do so.


Community voting on feature requests enables the Streamlit team to understand which features are most important to our users.

If you'd like the Streamlit team to prioritize this feature request, please use the 👍 (thumbs up emoji) reaction in response to the initial post.

Samreay avatar Dec 19 '22 05:12 Samreay

Thank you for sharing this request, @Samreay! Hopefully, interested community members will upvote this request so that we can prioritize it correctly.

carolinefrasca avatar Jan 26 '23 18:01 carolinefrasca

Can't wait for this scheduling feature to launch! Right now ttl seems to ignore the timedelta value and only refreshes when users open the app.

noviechiuman avatar Mar 06 '23 10:03 noviechiuman

This would help us to populate the cache and not have the first user of the day wait a few minutes until everything is ready.

ogabrielluiz avatar Apr 18 '23 16:04 ogabrielluiz

Any news on that? Does someone know how to hack Streamlit to do that? I know FastAPI has something like lifespan methods:

You can define this startup and shutdown logic using the lifespan parameter of the FastAPI app, and a "context manager".

https://fastapi.tiangolo.com/advanced/events/#lifespan It would be nice if Streamlit could provide this kind of feature!

andrewssobral avatar Oct 11 '23 13:10 andrewssobral

Thoughts on the API? The simplest approach would just be something like:

@st.cache_resource(eager=True)
def func(*args, **kwargs):
    return 1

But if we wanted to support one or multiple invocations with arguments it could be something like:

@st.cache_resource(eager=[
    (0, 1, 2, {"kwargs": "value"}),
    (1, 2, 3, {"kwargs": "value"}),
])
def func(*args, **kwargs):
    return 1
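Until something like this lands, the eager behaviour can be sketched with a plain memoizing decorator; `eager_cache` below is a hypothetical stand-in for the proposed `st.cache_resource(eager=...)`, not a real Streamlit API (and it assumes hashable arguments):

```python
import functools


def eager_cache(eager=None):
    """Hypothetical stand-in for the proposed st.cache_resource(eager=...)."""
    def decorator(func):
        cache = {}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            if key not in cache:
                cache[key] = func(*args, **kwargs)
            return cache[key]

        if eager is True:
            wrapper()  # warm the zero-argument invocation at decoration time
        elif eager:
            # each entry is positional args followed by a trailing kwargs dict
            for *call_args, call_kwargs in eager:
                wrapper(*call_args, **call_kwargs)
        return wrapper
    return decorator
```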

zacps avatar Oct 18 '23 23:10 zacps

@zacps yes, it would be nice like that. Or like FastAPI:

@st.cache_resource(eager=True)
def lifespan(*args, **kwargs):
    logger.info("lifespan method on startup")
    # ... do something on startup...
    yield
    # ... do something on shutdown...
    logger.info("lifespan method on shutdown")

I tried to test your piece of code, but I noticed that it is not yet available in Streamlit 😅

TypeError: __call__() got an unexpected keyword argument 'eager'
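For what it's worth, the startup half of a FastAPI-style lifespan generator can be driven with the stdlib alone; `start_lifespan` below is an illustrative helper, not a Streamlit or FastAPI API:

```python
import atexit

_lifespan_started = False


def start_lifespan(gen_func):
    """Run the code before `yield` now (startup) and defer the code after
    `yield` to interpreter exit (shutdown). Guarded to run once per process."""
    global _lifespan_started
    if _lifespan_started:
        return
    _lifespan_started = True
    gen = gen_func()
    next(gen)  # execute the startup half, pausing at the yield

    def _shutdown():
        try:
            next(gen)  # resume past the yield to execute the shutdown half
        except StopIteration:
            pass

    atexit.register(_shutdown)
```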

andrewssobral avatar Oct 19 '23 14:10 andrewssobral

Hello, I would also be very interested by this feature. Is there any plan to have something like that?

In the meantime, the only workaround I had read about was using Selenium, and that's not really acceptable on a production server.

So I wrote a quick-and-dirty script that uses the internal websocket to mimic a user interaction, using only Streamlit's own dependencies. It is still an ugly workaround/hack, but better than Selenium, I think.

import argparse
import asyncio
import datetime

import tornado.websocket
from google.protobuf.json_format import MessageToDict
from streamlit.proto.BackMsg_pb2 import BackMsg
from streamlit.proto.ClientState_pb2 import ClientState
from streamlit.proto.ForwardMsg_pb2 import ForwardMsg
from streamlit.proto.WidgetStates_pb2 import WidgetStates


class WebSocketClient:
    """client heavily inspired by https://gist.github.com/sadernalwis/0a110d280b090e751a8de50d97862a35"""

    def __init__(self, event_loop: asyncio.AbstractEventLoop, url):
        self.url = url
        self.connection: tornado.websocket.WebSocketClientConnection | None = None
        self.event_loop = event_loop
        self.last_message_date = None

    async def start(self):
        await self.connect_and_read()
        if self.connection is not None:
            self.connection.write_message(
                BackMsg(rerun_script=ClientState(widget_states=WidgetStates())).SerializeToString(),
                binary=True,
            )

    def stop(self):
        self.event_loop.stop()

    async def connect_and_read(self):
        print("Connecting to websocket...")
        await tornado.websocket.websocket_connect(
            url=self.url,
            callback=self.maybe_retry_connection,
            on_message_callback=self.on_message,
            ping_interval=10,
            ping_timeout=30,
        )

    def maybe_retry_connection(self, future) -> None:
        try:
            self.connection: tornado.websocket.WebSocketClientConnection | None = future.result()
        # would be better to check the exception type here and retry only if it's retryable,
        # and to add some sort of limit on retries
        except Exception:
            print("Could not reconnect, retrying in 3 seconds...")
            self.event_loop.call_later(3, self.connect_and_read)

    def on_message(self, message):
        # weirdly seems to happen when closing the websocket, didn't check why
        if not message:
            return

        # not useful in this example, but another coroutine could check every second
        # whether self.last_message_date is too old and implement a timeout; for very
        # long scripts you may need to send a heartbeat using BackMsg(app_heartbeat=True)
        self.last_message_date = datetime.datetime.now()

        # retrieving and parsing the websocket message
        f_message = ForwardMsg()
        f_message.ParseFromString(message)
        message_dict = MessageToDict(f_message)

        # we detect that the script is finished
        if message_dict.get("scriptFinished"):
            print(message_dict.get("scriptFinished"))
            # we could send another rerun BackMsg with a different widget state or page
            # to run another streamlit script; here we just stop the websocket client
            self.stop()


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--url",
        type=str,
        default="ws://localhost:8501/_stcore/stream",
        help="url of the target streamlit instance",
    )
    return parser.parse_args()


def main():
    args = parse_args()
    # asyncio.get_event_loop() is deprecated outside a running loop, so create one explicitly
    event_loop = asyncio.new_event_loop()
    asyncio.set_event_loop(event_loop)
    client = WebSocketClient(event_loop, args.url)
    asyncio.ensure_future(client.start())
    event_loop.run_forever()


if __name__ == "__main__":
    main()
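Something still has to trigger the script above at the right time each day (the "6:05am" case from the original post). A minimal stdlib re-arming timer could look like this; the names are illustrative, not part of any library:

```python
import datetime
import threading


def seconds_until(hour, minute):
    """Seconds from now until the next occurrence of hour:minute (local time)."""
    now = datetime.datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()


def schedule_daily(hour, minute, job):
    """Run job every day at hour:minute using a re-arming daemon timer."""
    def run():
        job()
        schedule_daily(hour, minute, job)  # re-arm for the next day

    timer = threading.Timer(seconds_until(hour, minute), run)
    timer.daemon = True
    timer.start()
    return timer
```

In practice a cron entry or apscheduler job would be more robust than a long-lived in-process timer.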

thpiron avatar Feb 28 '24 11:02 thpiron

Was just struggling with this and would love to see a native solution to periodically refresh the cache, ideally by extending the current API to something like:

@st.cache_data(ttl=3600, refresh=600)
def my_slow_function():

I tried shoving my_slow_function() into a thread, but that resulted in the well-discussed "lack of script runner context" error, and I don't want to hack at various unsupported solutions for that. Likewise, implementing my own caching function would result in a new thread per user and is thus not ideal.
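One way to keep the refresh to a single thread for all users is a module-level daemon refresher. This is only a hedged sketch: the store is a plain dict rather than Streamlit's cache, and `fetch`/`interval` are placeholders:

```python
import threading
import time

_refresher_started = False


def start_refresher(fetch, interval, store):
    """Start one daemon thread (per process, not per user) that writes
    fetch() into store["data"] every `interval` seconds."""
    global _refresher_started
    if _refresher_started:
        return
    _refresher_started = True

    def loop():
        while True:
            store["data"] = fetch()
            time.sleep(interval)

    threading.Thread(target=loop, daemon=True).start()
```

The app script would then read store["data"] instead of calling the slow function directly, sidestepping the script-runner-context issue because the thread never touches Streamlit APIs.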

ee-github avatar Apr 30 '24 19:04 ee-github

Also looking forward to this feature! If this is developed, perhaps following cron syntax could be a good way to specify scheduling times and frequencies, as it is already a widely adopted convention.
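To make the cron idea concrete, matching a single cron field against a value is straightforward; this toy matcher covers only a subset of cron syntax (`*`, literals, lists, `*/step`) and is purely illustrative:

```python
def cron_field_matches(field, value):
    """Match one cron field ('*', '5', '1,15', '*/10') against a value.
    A toy subset of cron syntax, for illustration only."""
    if field == "*":
        return True
    for part in field.split(","):
        if part.startswith("*/"):
            if value % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False
```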

Leobouloc avatar Jun 25 '24 09:06 Leobouloc

This would be one of the most important features!

buenavista62 avatar Sep 15 '24 00:09 buenavista62