dataframe marimo view: if I apply a filter, I have zero records, but I should have a lot of records

Open aborruso opened this issue 8 months ago • 10 comments

Describe the bug

Hi, I apply a filter by value in a dataframe. After I apply it I have zero records in the view, but I should have many.

If this happens because the dataframe has too many records for the filters to work reliably, shouldn't the user be warned?

Image
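One way to confirm that the filter should return rows is to count the matches outside the table widget. The following is a minimal sketch that uses only the standard library, assuming the file is semicolon-separated as in the reproduction code; the function name `count_matching_rows`, the column name "Missione", and the sample values are hypothetical placeholders, not taken from the real dataset.

```python
import csv
import io


def count_matching_rows(text_stream, column, value, delimiter=";"):
    """Count rows whose `column` is exactly equal to `value`.

    All values are compared as plain strings, so no type inference,
    null handling, or decimal-comma parsing is applied here.
    """
    reader = csv.DictReader(text_stream, delimiter=delimiter)
    return sum(1 for row in reader if row.get(column) == value)


# Tiny in-memory sample; the column name and values are made up.
sample = io.StringIO("Missione;Importo\nM1;10\nM2;20\nM1;30\n")
matches = count_matching_rows(sample, "Missione", "M1")
print(matches)
```

For the real file, the same function could be called on a handle opened with `open("PNRR_Progetti_01.csv", newline="", encoding="utf-8")`; the encoding is an assumption. A non-zero count here, combined with zero rows in the view, would suggest the problem lies in the table widget rather than in the data.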

Environment

{
  "marimo": "0.13.2",
  "OS": "Linux",
  "OS Version": "5.15.167.4-microsoft-standard-WSL2",
  "Processor": "",
  "Python Version": "3.11.2",
  "Binaries": {
    "Browser": "121.0.6167.184",
    "Node": "v20.17.0"
  },
  "Dependencies": {
    "click": "8.1.8",
    "docutils": "0.21.2",
    "itsdangerous": "2.2.0",
    "jedi": "0.19.2",
    "markdown": "3.8",
    "narwhals": "1.36.0",
    "packaging": "25.0",
    "psutil": "7.0.0",
    "pygments": "2.19.1",
    "pymdown-extensions": "10.14.3",
    "pyyaml": "6.0.2",
    "starlette": "0.46.2",
    "tomlkit": "0.13.2",
    "typing-extensions": "4.13.2",
    "uvicorn": "0.34.2",
    "websockets": "15.0.1"
  },
  "Optional Dependencies": {
    "duckdb": "1.2.2",
    "pandas": "2.2.3",
    "polars": "1.27.1",
    "pyarrow": "19.0.1",
    "pycrdt": "0.11.1",
    "ruff": "0.11.7",
    "sqlglot": "26.16.2"
  },
  "Experimental Flags": {}
}

Code to reproduce

# csv source https://www.italiadomani.gov.it/content/dam/sogei-ng/opendata/PNRR_Progetti.csv

import marimo

__generated_with = "0.13.2"
app = marimo.App(width="medium")


@app.cell
def _():
    import polars as pl
    import marimo as mo
    return mo, pl


@app.cell
def _(pl):
    df = pl.read_csv(
        "PNRR_Progetti_01.csv",
        separator=";",
        has_header=True,
        infer_schema_length=100_000,
        null_values=["N/A"],
        decimal_comma=True
    )
    return (df,)


@app.cell
def _(df, pl):
    date_columns = [col for col in df.columns if col.startswith("Data")]

    df_updated = df.clone()  # Create a copy to avoid modifying the original DataFrame

    for col in date_columns:
        df_updated = df_updated.with_columns(pl.col(col).str.strptime(pl.Date, "%d/%m/%Y").alias(col))
    return (df_updated,)


@app.cell
def _(df_updated, mo):
    mo.ui.table(df_updated, max_columns=None)
    return


if __name__ == "__main__":
    app.run()

aborruso avatar Apr 25 '25 08:04 aborruso

Strange, I cannot reproduce it 🤔 Image

Light2Dark avatar Apr 25 '25 09:04 Light2Dark

Thank you @Light2Dark. I don't know how to debug this further, though. It always happens to me with this dataframe :(

While I'm here, a question: I see that in your dataframe, for example, no histogram appears for the "Mission" field. The same happens to me. Why are the histograms not generated? Does this occur for all large dataframes?

aborruso avatar Apr 25 '25 09:04 aborruso

I see. I can test with more datasets too, but there are no size restrictions for this filter. Does it work with other dataframes?

Does this occur for all large dataframes?

Yes, the column charts are not done for large dfs. There is a related issue https://github.com/marimo-team/marimo/issues/3104. If we solve this, I think we could increase the limit. Relevant code path

Light2Dark avatar Apr 25 '25 12:04 Light2Dark

I cannot reproduce this either (and tried to match your same version of polars/narwhals).

I do see the rows being updated correctly, though. Just the results are empty.

@aborruso can you look at the network request and see if you see results there?

mscolnick avatar Apr 25 '25 14:04 mscolnick

Hi @mscolnick if I use this subset, it works :(

Image

aborruso avatar Apr 25 '25 14:04 aborruso

Yes, the column charts are not done for large dfs. There is a related issue #3104. If we solve this, I think we could increase the limit. Relevant code path

Hi, I'm using it with a small dataframe and I have no chart

Image

aborruso avatar Apr 25 '25 14:04 aborruso

I cannot reproduce this either (and tried to match your same version of polars/narwhals).

I do see the rows being updated correctly, though. Just the results are empty.

@aborruso can you look at the network request and see if you see results there?

I have these in the console

Image

aborruso avatar Apr 25 '25 14:04 aborruso

Hi, I'm using it with a small dataframe and I have no chart

There is a column limit of 40 and a row limit (row_size) of 20k for charts.

You can check the network tab; you may see a request ending with .json when you apply the filter.
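The limits quoted above can be checked against a dataframe's shape before expecting column charts. This is only a sketch: the constant and function names are hypothetical and are not part of marimo's API, the values are the ones stated in this thread, and treating the limits as inclusive is an assumption.

```python
# Limits as quoted in this thread; names are hypothetical, not marimo's API.
CHART_MAX_COLUMNS = 40
CHART_MAX_ROWS = 20_000


def within_chart_limits(n_rows, n_cols,
                        max_rows=CHART_MAX_ROWS,
                        max_cols=CHART_MAX_COLUMNS):
    """Return True if both dimensions are at or below the quoted limits.

    Whether the real limits are inclusive is an assumption.
    """
    return n_rows <= max_rows and n_cols <= max_cols


# A frame with few rows but many columns still exceeds the column limit.
print(within_chart_limits(1_000, 60))
print(within_chart_limits(1_000, 10))
```

With polars, `df.shape` is a `(rows, columns)` tuple, so the check can be called as `within_chart_limits(*df.shape)`.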

Image

Light2Dark avatar Apr 25 '25 15:04 Light2Dark

Hi, I am very sorry. I restarted the machine, without changing anything and now everything works. I feel like screaming and I apologize for the time I have wasted.

aborruso avatar Apr 25 '25 15:04 aborruso

@Light2Dark @mscolnick I'm reopening the issue, because I realized why the problem disappeared for a while and has now come back.

In my notebook it does not work if I use mo.ui.table(df, max_columns=None)

Image

It works if I simply use df

Image

aborruso avatar Apr 25 '25 21:04 aborruso

@aborruso are you still experiencing this?

mscolnick avatar May 14 '25 21:05 mscolnick

Tomorrow I will test.

Thank you very much

aborruso avatar May 14 '25 21:05 aborruso

Hi @mscolnick: yes, the behavior is the same, and I have used 0.13.9

Thank you

Image

# csv source https://www.italiadomani.gov.it/content/dam/sogei-ng/opendata/PNRR_Progetti.csv
import marimo

__generated_with = "0.13.8"
app = marimo.App(width="medium")


@app.cell
def _():
    import polars as pl
    import marimo as mo
    return mo, pl


@app.cell
def _(pl):
    df = pl.read_csv(
        "PNRR_Progetti_01.csv",
        separator=";",
        has_header=True,
        infer_schema_length=100_000,
        null_values=["N/A"],
        decimal_comma=True
    )
    return (df,)


@app.cell
def _(df, pl):
    date_columns = [col for col in df.columns if col.startswith("Data")]

    df_updated = df.clone()  # Create a copy to avoid modifying the original DataFrame

    for col in date_columns:
        df_updated = df_updated.with_columns(pl.col(col).str.strptime(pl.Date, "%d/%m/%Y").alias(col))
    return (df_updated,)


@app.cell
def _(df_updated, mo):
    mo.ui.table(df_updated, max_columns=None)
    return


if __name__ == "__main__":
    app.run()

aborruso avatar May 15 '25 06:05 aborruso