streamlit

Deferred data for download button

Open · Vinno97 opened this issue 1 year ago · 25 comments

Problem

The download button currently expects its data to be available when the button is declared. If data needs to be read from disk (or worse, compiled from multiple disk sources), this can make the app needlessly slow. In my app, downloading the data is not a common use case, but packing the data for download is relatively expensive. Caching helps, but only when the data doesn't change.
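To make the cost concrete, the kind of packing step described above (compiling several files from disk into one archive) can be sketched with the standard library's zipfile module. The function name pack_files is a hypothetical illustration, not part of Streamlit. If its result is passed straight to st.download_button, this function runs on every script rerun that renders the button, whether or not anyone clicks it.

```python
import io
import zipfile
from pathlib import Path


def pack_files(paths: list[Path]) -> bytes:
    """Compile several files from disk into a single in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            # Store each file under its base name inside the archive.
            zf.write(path, arcname=path.name)
    # The whole archive is held in memory, which is what the download button expects today.
    return buffer.getvalue()
```

Both the disk reads and the compression happen inside this call, so the slowdown scales with the number and size of the source files.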

Solution

I propose a way to load and preprocess the data (archive, pickle, etc.) only when the download is actually requested.

Specifically, I propose also accepting a function as the data argument, which gets called as soon as the download button is pressed. This callback then returns the actual data.

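import streamlit as st

def get_data():
    # Proposed: called only once the button is pressed
    data = some_heavy_data_loading()  # placeholder for the expensive loading/packing step
    return data

# Proposed API: pass the function itself (no parentheses) as `data`
st.download_button("Download Data", get_data, file_name="my-data.dat")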
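With the API as it existed when this issue was filed, the proposal can be approximated by a two-step flow: a regular button whose on_click callback calls the producer and stores the bytes in st.session_state, and a download button that is rendered only once those bytes exist. The helper names materialize and lazy_download_button are hypothetical, not Streamlit APIs. The sketch relies on Streamlit running on_click callbacks before the script reruns.

```python
from collections.abc import Callable, MutableMapping
from typing import Any


def materialize(state: MutableMapping[str, Any], key: str, get_data: Callable[[], bytes]) -> None:
    """Call the (expensive) producer and store its result under `key`."""
    state[key] = get_data()


def lazy_download_button(label: str, get_data: Callable[[], bytes], file_name: str, key: str) -> None:
    """Hypothetical wrapper approximating the proposed API with existing widgets.

    The first button runs `get_data` in its on_click callback; the real
    download button is only rendered once the payload exists in session state.
    """
    import streamlit as st  # imported here so `materialize` can be tested without Streamlit

    st.button(
        f"Prepare {label}",
        key=f"{key}-prepare",
        on_click=materialize,
        args=(st.session_state, key, get_data),
    )
    if key in st.session_state:
        st.download_button(label, st.session_state[key], file_name=file_name, key=f"{key}-download")
```

In an app script, lazy_download_button("Download Data", get_data, "my-data.dat", key="my-data") would call get_data only after the first click. The cost of this design is the two-click flow and the payload staying in session state memory until it is deleted.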

Possible additions:

Currently a download button accepts str, bytes, TextIO, BinaryIO, or io.RawIOBase. With deferred loading, it would also be possible to accept a file pointer and stream the data to the user. This might bring huge speed and memory benefits when downloading large files.

Technically this streaming would also be possible without deferred loading, but then you're keeping unnecessary files open.


Community voting on feature requests enables the Streamlit team to understand which features are most important to our users.

If you'd like the Streamlit team to prioritize this feature request, please use the 👍 (thumbs up emoji) reaction in response to the initial post.

Vinno97, Jul 28 '22

@Vinno97 Thanks for the suggestion. This would be indeed a nice addition to the download button, especially when dealing with large files. I will forward this feature request to our product team.

LukasMasuch, Jul 28 '22

In the meantime, I'm using this to ensure that page flow is not interrupted by large-file preparation:

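import streamlit as st

def customDownloadButton(df):
    # st.button returns True only on the rerun right after it is clicked,
    # so the conversions below run on demand instead of on every rerun
    if st.button('Prepare downloads'):
        # prep data for downloading
        # (convert_df, convert_json and convert_parquet are the author's own helpers, not shown)
        csv = convert_df(df)
        json_lines = convert_json(df)
        parquet = convert_parquet(df)
        tab1, tab2, tab3 = st.tabs(["Convert to CSV", "Convert to JSON", "Convert to Parquet"])
        with tab1:
            st.download_button('Download', csv, file_name='data.csv')
        with tab2:
            st.download_button('Download', json_lines, file_name='data.json')
        with tab3:
            st.download_button('Download', parquet, file_name='data.parquet')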

tomgallagher, Jul 29 '22
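One caveat with gating on the return value of st.button: it is True only for the single rerun after the click. In Streamlit versions where clicking a download button itself triggers a rerun, the tabs disappear after the first download. A minimal variant, assuming the same pattern of one converter per format, keeps the prepared payloads in st.session_state instead. The names prepare_payloads and download_tabs are hypothetical, and the converters passed in stand for the convert_df/convert_json/convert_parquet helpers above.

```python
from collections.abc import Callable, Mapping, MutableMapping
from typing import Any


def prepare_payloads(
    state: MutableMapping[str, Any],
    df: Any,
    converters: Mapping[str, Callable[[Any], bytes]],
) -> None:
    """Run every converter once (keyed by output file name) and keep the results across reruns."""
    state["payloads"] = {file_name: convert(df) for file_name, convert in converters.items()}


def download_tabs(df: Any, converters: Mapping[str, Callable[[Any], bytes]]) -> None:
    import streamlit as st  # imported here so prepare_payloads is testable without Streamlit

    st.button("Prepare downloads", on_click=prepare_payloads, args=(st.session_state, df, converters))
    payloads = st.session_state.get("payloads")
    if payloads is None:
        return  # nothing prepared yet, so no conversion cost on this rerun
    # One tab per prepared file; they stay visible on later reruns.
    for tab, (file_name, data) in zip(st.tabs(list(payloads)), payloads.items()):
        with tab:
            st.download_button("Download", data, file_name=file_name, key=f"dl-{file_name}")
```

The explicit key on each download button avoids duplicate-widget errors, since every button shares the label "Download".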

Yes agree! Back when we implemented download button, I know that we also thought about allowing users to pass a function. Not sure if we cut that just to reduce scope or if there were any reasons against doing that. Will revisit!

jrieke, Jul 30 '22

I also had this issue, but it appears the button already does approximately what you proposed, @Vinno97? The docs mention that you can have a callback for this.

Not sure if I'm missing some nuance about blocking when downloading large files, but I've already used this for data generated on click, regardless of whether it's data files or octet streams to be saved as files (e.g., zip).

Lifted from the docs:

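import pandas as pd
import streamlit as st

@st.cache_data  # replaces the deprecated @st.cache used in the original docs snippet
def convert_df(df):
    # IMPORTANT: Cache the conversion to prevent computation on every rerun
    return df.to_csv().encode('utf-8')

my_large_df = pd.DataFrame({"a": [1, 2, 3]})  # placeholder for your own large DataFrame
csv = convert_df(my_large_df)

st.download_button(
    label="Download data as CSV",
    data=csv,
    file_name='large_df.csv',
    mime='text/csv',
)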

@jrieke Was this functionality added in the meantime and not linked to this issue?

xR86, Aug 29 '22

Nope, we didn't implement this yet. We don't have a timeline yet, but I'm 99% sure we want to do this at some point.

jrieke, Sep 23 '22

Any progress on this? Is there an ETA for when this bug will be fixed?

amirhessam88, Dec 26 '22

I would appreciate it if this gets resolved. I already tried to address this issue on the forum a couple of months ago: https://discuss.streamlit.io/t/create-download-file-upon-clicking-a-button/32613 My idea was to solve this using some JS, but it's messy and causes the page content to shift down slightly.

In my opinion, st.download_button should only fill memory with the file's content upon actually clicking the button, instead of on every script rerun.

wolfgang-koch, Jan 09 '23

I'd also like to voice support for this feature. I finally tracked down my app's occasional hanging to this issue. In the meantime, gating the download button behind a "prepare data for download" button, like @tomgallagher's example above, is a clumsy but okay workaround.

jzluo, Jan 19 '23

This would be a great feature. I know it's highly requested, but when working with APIs, the lack of this feature makes for a miserable experience. The app has to hit the API each time the page is reloaded to prepare the download, which uses up many requests from the quota. It's even worse if you have multiple tabs on a page, each of which downloads a different dataset for the user: it means x API calls per page load, per tab, each time the script is rerun.

I've mitigated it by using a nested button like Tom suggested, to 'get' the data and then show the download button, but a proper way to combine both into one UX action would be amazing.

HStep20, Jan 28 '23

+1

masonearles, Mar 14 '23

Same problem here. In my case, I need to generate an Excel file from multiple large pandas DataFrames (one DataFrame per sheet), which I write to a BytesIO buffer. Going from the pandas DataFrames to the BytesIO buffer takes about 0.003 s, but in the Streamlit app the user is left waiting for multiple seconds, somewhere between 5 s and 10 s.

ElenaBossolini, Aug 12 '23


def get_data():
    st.write("test")
    data = some_heavy_data_loading()
    return data

I added st.write("test") to get_data and found that "test" was printed before the download button. This means get_data() still runs even when the download button has not been clicked.

SabraHealthCare, Oct 28 '23
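A likely explanation for the observation above, though the thread does not confirm the exact call used: Python evaluates every argument before the called function starts, so st.download_button("Download Data", get_data(), ...) runs get_data on every rerun no matter what the widget does. Passing get_data without parentheses hands over the function object instead, but the download button did not accept that at the time, since the proposal had not been implemented. The sketch below uses a hypothetical stand-in, fake_download_button, to show the difference without Streamlit.

```python
log: list[str] = []


def get_data() -> bytes:
    log.append("get_data called")  # stands in for st.write("test")
    return b"heavy payload"


def fake_download_button(label: str, data: object) -> object:
    """Hypothetical stand-in for st.download_button that just returns `data`."""
    return data


# get_data() is evaluated *before* fake_download_button runs.
eager = fake_download_button("Download Data", get_data())
# get_data (no parentheses) is only a reference; nothing runs until someone calls it.
lazy = fake_download_button("Download Data", get_data)
```

After these two lines, log holds a single entry, produced by the eager call; the producer behind lazy runs only when it is explicitly invoked.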


Unless there has been an update that hasn't been announced here, I'm not sure that a function can be called from st.download_button in this way.

andrewpimm, Oct 30 '23

+1 to this feature, it'd be great for developers to create custom calculators that provide business value and a rich UX.

jsulopzs, Dec 13 '23

Any updates on this feature?

CharlesFr, Dec 17 '23

+1

ViniciusgCaetano, Jan 06 '24

Any updates on this feature?

zbjdonald, Jan 11 '24

I came across this issue as well. Besides large data payloads being created on every run, it is annoying that there is no way to create the data only after the download button is clicked. In my case, the raw data to be downloaded is created and stored in session state "after" the download button in the code. So when I click the download button, the previously created data is downloaded rather than the current state.

Here is an example:

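import streamlit as st

file_name = "export.dat"  # placeholder; create_data_logic() is the author's own function

create_data = st.button("Create data")

if "data" not in st.session_state:
    st.session_state.data = None

# Rendered before the data is created below, so this run still sees the old value
st.download_button(
    label="Export",
    data=st.session_state.data,
    file_name=file_name,
)

if create_data:
    # logic to create data here
    st.session_state.data = create_data_logic()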

Now, if I first click the "Create data" button and then the download ("Export") button, only None is downloaded. The session state seen by the download button is updated only on the next app rerun, after which the correct data is downloaded.

If the data creation process could happen in a callback after the download button is clicked, there would be no issue...

Currently this workaround does the job for me, but I feel this should be possible natively in Streamlit without JS hacks...

LarsHill, Jan 13 '24
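For the ordering problem in the example above, one option with the existing API is to move the data creation into the "Create data" button's on_click callback. Streamlit runs widget callbacks before the script reruns, so the download button rendered in that rerun already sees the new value. This is a sketch: create_data_logic is a hypothetical stand-in for the commenter's logic, and on_create and render are illustrative names.

```python
from collections.abc import MutableMapping
from typing import Any


def create_data_logic() -> bytes:
    """Hypothetical stand-in for the commenter's data-creation logic."""
    return b"fresh export"


def on_create(state: MutableMapping[str, Any]) -> None:
    # Runs before the rerun, so the download button below sees the new data.
    state["data"] = create_data_logic()


def render(file_name: str = "export.dat") -> None:
    import streamlit as st  # imported here so on_create is testable without Streamlit

    st.button("Create data", on_click=on_create, args=(st.session_state,))
    # Only show the export button once there is something to export.
    if st.session_state.get("data") is not None:
        st.download_button("Export", st.session_state["data"], file_name=file_name)
```

This fixes the stale-data ordering, but it still needs two clicks, which is why a data callback on the download button itself is being requested.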

Personally, I find Streamlit very unpleasant for new users: I need to google every step, and I keep running into issues with my use cases. And yes, I want to +1 this bug too, because when I want to download the data, I want to click the button, wait for processing, and get the data.

anki-code, Feb 08 '24

Hi Team,

We currently have the same issue, and it makes st.download_button unusable in production. Is there a workaround until the callback function is added? Also, is there an ETA for the data callback being added?

sfc-gh-pkommini, Feb 27 '24

@sfc-gh-pkommini The only workaround I ever found is using two buttons as posted above.

goyodiaz, Feb 27 '24

+1 on this issue.

A super basic use case is offering users a download of PNG images. This is a typical user desire if you want "archival quality" and are willing to eat the storage size; forcing people into JPEG all the time is not nice. Because PNG is lossless, the file size (and thus the data payload) is going to be larger. Even a moderately large PNG of 3072 x 4096 pixels ends up at ~26 MB, which is totally feasible to generate in memory and offer for one-off downloads. The ask is just to defer the costly serialization until the user actually clicks the download button, rather than doing it every time just to display a download button. The workaround is too fiddly and requires too much ad-hoc state management to really be called a solution IMO.

BenGravell, Mar 10 '24
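For scale, assuming 8-bit RGB (3 bytes per pixel; the comment does not state the pixel format), the raw pixel data of a 3072 x 4096 image is:

$$
3072 \times 4096 = 12\,582\,912 \ \text{pixels}, \qquad 12\,582\,912 \times 3 \ \text{bytes} = 37\,748\,736 \ \text{bytes} \approx 37.7 \ \text{MB}
$$

So a ~26 MB PNG is roughly 70% of the raw size under that assumption. PNG compression is lossless, and how far below the raw size a file lands depends on the image content.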

My team encountered this bug when apps are deployed in replicas to something like Kubernetes.

iandesj, Apr 03 '24