
Pygwalker cannot render large datasets

Open · heqi201255 opened this issue 9 months ago • 1 comment

I was trying to plot my data using Pygwalker. The data is a CSV file of about 467 MB with shape (3682080, 12), and my code looks like this:

from pygwalker.api.streamlit import StreamlitRenderer
import pandas as pd
import streamlit as st

# Adjust the width of the Streamlit page
st.set_page_config(
    page_title="Use Pygwalker In Streamlit",
    layout="wide"
)

# Add Title
st.title("Use Pygwalker In Streamlit")

# You should cache your pygwalker renderer, if you don't want your memory to explode
@st.cache_resource
def get_pyg_renderer() -> "StreamlitRenderer":
    df = pd.read_csv("/data.csv")
    # If you want to use feature of saving chart config, set `spec_io_mode="rw"`
    return StreamlitRenderer(df, kernel_computation=True)


renderer = get_pyg_renderer()

renderer.explorer()

I tried to use pygwalker both inside Jupyter and via Streamlit; both gave me the error "The query returned too many data entries, making it difficult for the frontend to render. Please adjust your chart configuration and try again."

Screenshot: [error dialog showing the message above, taken 2024-05-13]

The visualization is stuck at loading and shows a timeout message afterwards. Is there any workaround to render my data? What chart configuration should I adjust?

heqi201255 · May 13 '24 07:05

Hi @heqi201255

Thank you for bringing up this issue with pygwalker. By default, pygwalker places a fixed limit on the size of query results to keep memory usage in the frontend browser safe.

When the number of entries returned by a query (count(distinct t)) exceeds 1,000,000 (1 million), it becomes difficult for the frontend to render that much data into a chart efficiently.
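In the meantime, a generic way to stay under that limit is to pre-aggregate or downsample the DataFrame before it reaches the renderer. This is plain pandas, not a pygwalker feature; a minimal sketch, where "category" and "value" are hypothetical column names standing in for your own:

import pandas as pd

df = pd.read_csv("/data.csv")

# Option 1: random-sample rows so far fewer entries reach the chart
df_small = df.sample(n=500_000, random_state=42)

# Option 2: aggregate first so each category becomes a single row
# ("category" and "value" are placeholders for your own columns)
df_agg = df.groupby("category", as_index=False)["value"].mean()

Either df_small or df_agg can then be passed to StreamlitRenderer in place of the full DataFrame.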

To address this, we are considering adding a new parameter that lets users control the maximum data size for rendering, so the limit can be adjusted to each use case's needs.

One possible solution is to introduce the following code snippet, which sets the maximum data length to 10,000,000 (10 million):

import pygwalker as pyg

pyg.GlobalVarManager.set_max_data_length(10 * 1000 * 1000)
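In the Streamlit example above, that call would sit near the top of the script, before the renderer is built. A sketch assuming the proposed parameter ships under this name:

from pygwalker.api.streamlit import StreamlitRenderer
import pygwalker as pyg
import pandas as pd
import streamlit as st

# Proposed API: raise the frontend render limit to 10 million entries
pyg.GlobalVarManager.set_max_data_length(10 * 1000 * 1000)

@st.cache_resource
def get_pyg_renderer() -> "StreamlitRenderer":
    df = pd.read_csv("/data.csv")
    return StreamlitRenderer(df, kernel_computation=True)

renderer = get_pyg_renderer()
renderer.explorer()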

We would appreciate your thoughts and feedback on this proposed solution. Please let us know if you have any suggestions or concerns.

longxiaofei · May 13 '24 07:05