streamlit-aggrid icon indicating copy to clipboard operation
streamlit-aggrid copied to clipboard

Feat: increase data serialization speed

Open luke321321 opened this issue 2 years ago • 7 comments

Use pandas to_json() to serialize the dataframe instead of the custom serialization.

On a test dataframe of length 10_000 this sped up the conversion to json by 10x.

I've tested the outputs and they are consistent with the previous get_row_data() function for most use cases. There's a slight tweak with nested dataframes that the nested datafarme is already in json format and so doesn't have to be unpacked in JS. This makes it even easier for users as the previous behavior was confusing.

Test results

%timeit test_df.to_json(orient="records", date_format="iso")
20 ms ± 980 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit get_row_data(test_df)
210 ms ± 23.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

luke321321 avatar Apr 22 '22 07:04 luke321321

@thunderbug1 tagging you as this is similar to a pull request of yours that is still open. Have you or @luke321321 heard back from PablocFonseca on this performance improvement? Looks great tbh :)

cmayoracurzio avatar Apr 28 '22 16:04 cmayoracurzio

@marduk2 ah I didn't realise there was an almost duplicate PR in #62. I haven't heard from @PablocFonseca

lukedyer-peak avatar May 27 '22 12:05 lukedyer-peak

Hey,
I was looking for some speed-ups over here, and tried to apply your fix.
I see the wait time for showing the dataframe has come down :partying_face: to 8secs from 15-20secs that I originally had. (shape: ~6000x30)

However, now I ran into a new issue. My json/dict based column is not rendered correctly out of the box. Before the fix: image

After the fix: (Notice attributes column) image

Any clue what we could do so that this speed-up doesn't break default rendering for json kind of columns?

smartm13 avatar Jun 30 '22 10:06 smartm13

I've been busy over the past months, with little time to work on ag-grid. I remember I had to use a custom serialization bc of an issue on how tz-aware timestamps where serialized. I'll check this again and update on the next version.

PablocFonseca avatar Jul 05 '22 20:07 PablocFonseca

Just checked. The issue is that pandas convert tz-aware to UTC when serializing to json, it is a known issue https://github.com/pandas-dev/pandas/pull/46730. I did that first implementation as a work around, I just changed it now using a new method: _cast_tz_aware_date_columns_to_iso8601. Below are the initial results I got when serializing a 1_000_000 x 8 df on my modest computer. Captura de Tela 2022-07-05 às 20 17 30

PablocFonseca avatar Jul 05 '22 23:07 PablocFonseca

Thanks for the context. That's brilliant speedup (similar to pandas).

I'd be happy to review the PR if you'd like a second pair of eyes on it.

On Wed, 6 Jul 2022, 00:21 Pablo Fonseca, @.***> wrote:

Just checked. The issue is that pandas convert tz-aware to UTC when serializing to json, it is a known issue pandas-dev/pandas#46730 https://github.com/pandas-dev/pandas/pull/46730. I did that first implementation as a work around, I just changed it now using a new method: _cast_tz_aware_date_columns_to_iso8601. Below are the initial results I got when serializing a 1_000_000 x 8 df on my modest computer. [image: Captura de Tela 2022-07-05 às 20 17 30] https://user-images.githubusercontent.com/17606067/177432578-d16c7139-b557-4c2c-abf4-bd98351f5eb8.png

— Reply to this email directly, view it on GitHub https://github.com/PablocFonseca/streamlit-aggrid/pull/85#issuecomment-1175595271, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAH6OY4XY3W4D3NL3GEZJETVSS7RNANCNFSM5UBOQNGQ . You are receiving this because you were mentioned.Message ID: @.***>

luke321321 avatar Jul 06 '22 07:07 luke321321

Any updates on this?

mkleinbort-ic avatar Jan 27 '23 16:01 mkleinbort-ic