marimo icon indicating copy to clipboard operation
marimo copied to clipboard

Polars extension

Open kjgoodrick opened this issue 1 year ago • 9 comments

📝 Summary

Fixes issue #3355 by adding LazyFrame extension that allows output of mermaid graphs for polars lazyframe query plans.

🔍 Description of Changes

In this pull request I have added a polars extension that adds a mo namespace to LazyFrames, this namespace includes one "public" method show_graph that accepts the same arguments as polars LazyFrame show_graph and returns an Html object containing the mermaid chart.

Issue #3355 shows side by side outputs.

Before merging it would be necessary to add documentation for this new polars namespace, but I wanted to request your feedback before I took that step.

📋 Checklist

  • [X] I have read the contributor guidelines.
  • [ ] For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (Please provide a link if applicable).
  • [X] I have added tests for the changes made.
  • [X] I have run the code and verified that it works as expected.

📜 Reviewers

@akshayka

kjgoodrick avatar Jan 07 '25 00:01 kjgoodrick

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
marimo-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 7, 2025 0:35am
marimo-storybook ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 7, 2025 0:35am

vercel[bot] avatar Jan 07 '25 00:01 vercel[bot]

@MarcoGorelli , if you don't mind, I would appreciate hearing your thoughts on whether implementing this upstream in Polars would be possible, or if we should plan to monkey-patch Polars in marimo instead.

if the solution isn't too complex on the Polars side, then I think this would be welcome!

MarcoGorelli avatar Jan 07 '25 07:01 MarcoGorelli

I don't think we need to monkey-patch anything. showing mermaid should be the default (easier to navigate, adds no deps and not lossy) and instead we can add a formatted for LazyFrame in https://github.com/marimo-team/marimo/blob/main/marimo/_output/formatters/df_formatters.py

If anything could be upstreamed, we could add _repr_mermaid_ to polars, but _repr_mermaid_ is not an official repr from jupyter (however, we can support it)

mscolnick avatar Jan 07 '25 16:01 mscolnick

I don't think we need to monkey-patch anything. showing mermaid should be the default (easier to navigate, adds no deps and not lossy) and instead we can add a formatted for LazyFrame in https://github.com/marimo-team/marimo/blob/main/marimo/_output/formatters/df_formatters.py

If anything could be upstreamed, we could add _repr_mermaid_ to polars, but _repr_mermaid_ is not an official repr from jupyter (however, we can support it)

Lazyframes already render fine in marimo according to #3355.

I believe the core issue is modifying Lazyframe.show_graph to show a mermaid diagram. This cannot be accomplished using a formatter in marimo without monkey-patching Lazyframe.show_graph, unless I am misunderstanding something. So it might be preferable to upstream marimo-aware rendering for this function if Polars team is okay with that.

akshayka avatar Jan 07 '25 17:01 akshayka

I was looking at this:

image

Which is possible to move to mermaid since marimo can change the formatting of the last line. You would still run the issue of explicitly calling show_graph(optimized=True), which then would be nice to upstream show_graph(format="mermaid")

But yea @akshayka you are right, without anything upstreamed w.r.t mermaid, we would need to monkeypatch to cover all cases.

mscolnick avatar Jan 07 '25 17:01 mscolnick

Looks like there was some appetite for mermaid formatting in polars: https://github.com/pola-rs/polars/issues/12075

That part seems non-controversial, but could go as far as making the default 'mermaid' instead of 'dot' in marimo

mscolnick avatar Jan 07 '25 17:01 mscolnick

Thank you all for the discussion on this. It sounds like there is already interest on the polars side for introducing support for mermaid, which is great to here. If polars does support mermaid and is ok with us inserting a simple snippet for detecting marimo then I think we could forgo the polars extension which would remove the need to add any imports to the marimo init file.

I think then that the best way forward would be to:

  1. Add Mermaid support to polars show_graph
  2. Add a dataframe formatter to Marimo that will change the behavior of leaving a LazyFrame on the last line
    • Use polars mermaid generation developed in 1
  3. Add support to polars show_graph for detecting marimo and calling the display functionality developed in 2.

If that sounds good to you all, I can close this PR and then make a new one to handle step 2 above.

kjgoodrick avatar Jan 07 '25 19:01 kjgoodrick

That sounds good to me, thank you!

Would you like to work with Polars to lead 1 and 3 as well? If you opened an issue or PR with them, please feel free to tag me.

akshayka avatar Jan 07 '25 19:01 akshayka

Yes, I will open a PR in Polars for 1 to get started and tag you in it.

kjgoodrick avatar Jan 07 '25 19:01 kjgoodrick

Closing in favor of https://github.com/marimo-team/marimo/pull/5247

mscolnick avatar Jul 09 '25 14:07 mscolnick