Fine-control of cached data
Hi, first of all, thanks for this awesome lib!
Use Case
I'm developing a data-studio app with Dash and Dash-Extensions for multiple simultaneous users.
Problem Description
- Say we have users A and B working with the app. Both will fetch data from a database, which I am server-side caching with ServersideOutput callbacks, which will create two large files in my filesystem.
- If user A fetches a new dataset, a new file will be created, but the old one will not be deleted (at least as long as FileSystemStore.threshold is large enough).
- This will result in three large files in the filesystem, but only two of them are being used.
Desired Behavior
I'd like to have no more than one cached file for a given (session, function) tuple, but it is currently cached with respect to a (session, function, arguments) tuple. Is there any way to implement the (session, function) tuple caching with Dash Extensions?
Thanks! That option is not available yet, but I guess it should be relatively easy to implement.
EDIT: I just tried implementing it, and it seems to work as intended. The syntax goes like this,
ServersideOutput("store", "data", arg_check=False)
where arg_check=False disables the argument checks. I have pushed an rc release, if you want to test it out,
https://pypi.org/project/dash-extensions/0.0.39rc1/
Hi @emilhe ! I just tested the new rc release and it worked very well!
Studying your code, I understood that, generally, the cache is defined by 4 entities:
- Session
- Output Target (this is needed due to the possibility of multi-output callback functions, right?)
- Callback Function
- Arguments passed to the Callback Function
I realized that I also need a func_check=False option, as the big dataset I referred to above can be modified by multiple callbacks, but I only need to keep one cache for the associated ServersideOutput data. Can you please add this option, too?
EDIT: also, for consistency, I believe EnrichedOutput.__init__ should have args_check=None (and func_check=None) as defaults, so that the backend setup is used by default, as is the case for session_check.
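To make the four entities listed above concrete, here is a minimal sketch of how a cache key could be derived from them, and how switching individual checks off collapses several keys into one. It is an illustration only, not the dash-extensions implementation: the `cache_key` helper, its parameter names, and the `func_check` flag are hypothetical.

```python
import hashlib
import json


def cache_key(session_id, output_id, func_name, args,
              session_check=True, func_check=True, arg_check=True):
    """Build a cache key from the parts that are enabled.

    Hypothetical helper for illustration; not the dash-extensions code.
    """
    parts = [output_id]  # the output target is always part of the key
    if session_check:
        parts.append(session_id)
    if func_check:
        parts.append(func_name)
    if arg_check:
        # Serialize arguments deterministically so equal args give equal keys.
        parts.append(json.dumps(args, sort_keys=True, default=str))
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


# With argument checks, each new dataset request yields a new key (a new file).
key_a1 = cache_key("session-A", "store.data", "fetch", ["dataset-1"])
key_a2 = cache_key("session-A", "store.data", "fetch", ["dataset-2"])

# Without them, the same session and function always map to one key,
# so the newer dataset overwrites the older one.
key_b1 = cache_key("session-A", "store.data", "fetch", ["dataset-1"], arg_check=False)
key_b2 = cache_key("session-A", "store.data", "fetch", ["dataset-2"], arg_check=False)

# Dropping the function check as well lets several callbacks share one entry.
key_c1 = cache_key("session-A", "store.data", "fetch", [], func_check=False, arg_check=False)
key_c2 = cache_key("session-A", "store.data", "filter", [], func_check=False, arg_check=False)
```

In this sketch the session still separates users, so user B keeps a file of their own while user A's file is reused.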
args_check seems like a great enhancement. Any plans to incorporate this into a future release?
I am considering adding it to the next release.
The feature has now been merged. It should be available in the latest release,
https://pypi.org/project/dash-extensions/0.0.56/
Hi @emilhe , I was wondering if this feature works with the default backend, FileSystemStore. In the comments of this Stack Overflow thread (https://stackoverflow.com/questions/68221603/faster-serializations-pickle-parquet-feather-than-json-in-plotly-dash-s) you say that it would not work with the default one.
I have deployed an application for more than 100 users, and there is only one ServerSideOutput callback. I used ServerSideOutput for my store. As I understand it, it uses FileSystemStore by default, with the default threshold of 500, and will clear the cache after 500 sessions. Am I correct?
If args_check is available, will it delete every file created during a session once the session ends (when the user closes the web page)? If so, I would probably use it.
Also, what is the default value of CACHE_DEFAULT_TIMEOUT?
Last question: what is the difference between ServerSideOutput and ServerSideOutputTransform?
Thank you @emilhe , your module helped me a lot :)
Hi @justgfather16 , as noted in the SO post, the default FileSystemStore implementation uses pickle, but you could easily switch to a different serializer if you need to.
In Dash, there is no straightforward way to determine when a client disconnects. The cache will be cleared when the threshold (500) is reached, independent of the number of sessions. The args_check keyword doesn't change this behavior.
The FileSystemStore ignores the CACHE_DEFAULT_TIMEOUT flag.
The ServerSideOutputTransform is the plugin that must be loaded to enable use of the ServerSideOutput component.
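As a rough illustration of the serializer point above, the sketch below shows a tiny file-backed store whose serializer can be swapped without touching the rest of the code. `TinyFileStore` and the two JSON helper functions are hypothetical names made up for this example; this is not the FileSystemStore class from dash-extensions, only the general pattern.

```python
import json
import os
import pickle
import tempfile


class TinyFileStore:
    """Hypothetical file-backed store; not the dash-extensions FileSystemStore."""

    def __init__(self, cache_dir, dumps=pickle.dumps, loads=pickle.loads):
        self.cache_dir = cache_dir
        self.dumps = dumps  # object -> bytes
        self.loads = loads  # bytes -> object
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.cache_dir, key)

    def set(self, key, value):
        with open(self._path(key), "wb") as f:
            f.write(self.dumps(value))

    def get(self, key):
        with open(self._path(key), "rb") as f:
            return self.loads(f.read())


def json_dumps(value):
    return json.dumps(value).encode("utf-8")


def json_loads(raw):
    return json.loads(raw.decode("utf-8"))


cache_dir = tempfile.mkdtemp()
# Default behaviour: pickle, as in the answer above.
pickle_store = TinyFileStore(os.path.join(cache_dir, "pickle"))
# Same store, different serializer.
json_store = TinyFileStore(os.path.join(cache_dir, "json"),
                           dumps=json_dumps, loads=json_loads)

record = {"rows": [1, 2, 3], "name": "dataset-1"}
pickle_store.set("k", record)
json_store.set("k", record)
```

Any pair of functions that converts an object to bytes and back would fit the same two slots.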
Hi @emilhe , thank you very much for your answers. I just have one last question. What will happen if I have users connected to the app and a new user logs in, so that the cache reaches the threshold of 500?
My callback using ServerSideOutput is only called at the beginning of a session, taking an id from the URL to retrieve the data to put in the store.
When the new user (user 501) logs in, it will overwrite data from the first user (user 1). Hence the next time a callback is invoked by user 1, the app will crash for user 1.
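The scenario described above can be sketched with a toy in-memory store that keeps at most `threshold` entries and drops the oldest one first. This is a simplification under stated assumptions: `BoundedStore` is a hypothetical class, and the actual pruning strategy of the file-backed cache may differ (for example, it may remove several entries at once).

```python
from collections import OrderedDict


class BoundedStore:
    """Toy store that keeps at most `threshold` entries (oldest evicted first)."""

    def __init__(self, threshold=500):
        self.threshold = threshold
        self._items = OrderedDict()

    def set(self, key, value):
        # Re-inserting a key moves it to the "newest" end.
        self._items.pop(key, None)
        self._items[key] = value
        # Evict the oldest entries once the threshold is exceeded.
        while len(self._items) > self.threshold:
            self._items.popitem(last=False)

    def get(self, key):
        return self._items.get(key)  # None if evicted or never stored


store = BoundedStore(threshold=500)
for user in range(1, 502):  # users 1..501 each store one dataset
    store.set(f"user-{user}", f"data-{user}")

first_user_data = store.get("user-1")    # evicted when user 501 arrived
second_user_data = store.get("user-2")   # still cached
last_user_data = store.get("user-501")   # newest entry
```

A callback that reads from the store can guard against the missing entry (here, a `None` result) and refetch the data instead of failing.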
Okay, there will not be 500 users at the same time, so there should be no problem with this, I guess. There are probably 100 users per day using the dashboard, each with only one URL, so there should be no case where a user is using the dashboard and their cache file gets erased.
Thank you very much @emilhe , your help is very appreciated.