Polars LazyFrame `show_graph` has poor graphic quality in Marimo

Open kjgoodrick opened this issue 1 year ago • 2 comments

Description

I am submitting a PR for this suggestion.

When using polars LazyFrames it is often desirable to display the query plan prior to collecting the query. There are multiple ways to do this in Marimo.

1. Leave the LazyFrame object as the last line in a cell.
   - This works well and displays a nice SVG image in the cell (as long as graphviz is installed and on the path).
2. Manually call the show_graph function.
   - This is necessary if one wants to show the optimized version of the query plan.
   - Unfortunately, this produces very poor output if matplotlib is installed, or an error if it is not.
3. Use explain to output a text-based query plan.
   - This works well, and Marimo even formats the text (whereas Jupyter notebooks by default show the raw string).
   - However, it can be harder to follow than the graph for more complicated queries.
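
To make the second and third options concrete, here is a minimal sketch. It is an illustration only, not code from the PR: `plan_views` is a hypothetical helper name, and it assumes `lf` is a polars LazyFrame whose `explain` and `show_graph` methods accept the `optimized` keyword and whose `show_graph` returns the DOT text when `raw_output=True`.

```python
def plan_views(lf) -> dict[str, str]:
    """Collect text and DOT views of a LazyFrame's query plan.

    `lf` is assumed to be a polars LazyFrame; nothing is collected or drawn.
    """
    return {
        # Text plan exactly as the query was written.
        "explain_unoptimized": lf.explain(optimized=False),
        # Text plan after the optimizer has run.
        "explain_optimized": lf.explain(optimized=True),
        # DOT source of the optimized plan; raw_output=True asks show_graph
        # to return the text instead of rendering an image.
        "dot_optimized": lf.show_graph(optimized=True, show=False, raw_output=True),
    }
```

In a notebook cell this could be called as `plan_views(pl.LazyFrame({"a": [1, 2]}).filter(pl.col("a") > 1))`, after which the `explain_*` strings can be compared directly.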

It would be nice if Marimo could display high quality graphics for the query plan even when the user wants to view the optimized plan and/or doesn't have matplotlib / graphviz installed. It would also be nice if the graph followed the theme of the notebook (dark / light) and could nicely display plans that have many terms (e.g. a query that transforms many columns in one step).

Suggested solution

My suggestion is to register a polars extension that adds a marimo mo namespace to LazyFrames and allows users to display high quality query plan graphs in all situations with only a slight change to their code. This added polars extension code will be maintained within the marimo repository so that it will not require adding any code to the polars repository.
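
A rough sketch of how such a namespace could be registered is shown here. The class name `MarimoPlanNamespace`, the method `plan_dot`, and the helper `register` are hypothetical and are not the names used in the PR; the only polars API assumed is `pl.api.register_lazyframe_namespace`, which hands the LazyFrame to the class constructor.

```python
class MarimoPlanNamespace:
    """Hypothetical namespace class; polars passes the LazyFrame to __init__."""

    def __init__(self, lf):
        self._lf = lf

    def plan_dot(self, *, optimized: bool = True) -> str:
        """Return the DOT source of the query plan without rendering it."""
        return self._lf.show_graph(optimized=optimized, show=False, raw_output=True)


def register(namespace_name: str = "mo") -> None:
    """Attach the namespace to every LazyFrame under `namespace_name`."""
    # Imported lazily so this module can be loaded without polars installed.
    import polars as pl

    pl.api.register_lazyframe_namespace(namespace_name)(MarimoPlanNamespace)
```

After calling `register()`, a user would write `lf.mo.plan_dot()` instead of `lf.show_graph(...)`, which is the "slight change to their code" described above.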

Because marimo already has support for displaying mermaid graphs, and polars can return the raw text defining the graph (in dot notation), it makes sense to parse that dot text and convert it to mermaid.
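
The conversion could look roughly like the sketch below. It is not the PR's implementation: `dot_to_mermaid` is a hypothetical name, and the parser assumes the dot text has one node (`p1[label="..."]`) or one edge (`p1 -- p2`) per line, which may not cover everything polars emits.

```python
import re

# Assumed line shapes in the dot text:
#   p1[label="FILTER BY ..."]   (node)
#   p1 -- p2                    (edge)
NODE_RE = re.compile(r'^\s*"?(\w+)"?\s*\[label="(.*)"\]\s*;?\s*$')
EDGE_RE = re.compile(r'^\s*"?(\w+)"?\s*(--|->)\s*"?(\w+)"?\s*;?\s*$')


def _escape_label(label: str) -> str:
    """Make a dot label safe inside a quoted mermaid node label."""
    label = label.replace('\\"', '"')     # undo dot's escaped quotes
    label = label.replace('"', "#quot;")  # mermaid entity code for a double quote
    return label.replace("\\n", "<br/>")  # dot line breaks become HTML breaks


def dot_to_mermaid(dot: str) -> str:
    """Translate node and edge lines of a dot graph into a mermaid flowchart."""
    lines = ["graph TD"]
    for raw in dot.splitlines():
        node = NODE_RE.match(raw)
        if node:
            name, label = node.groups()
            lines.append(f'    {name}["{_escape_label(label)}"]')
            continue
        edge = EDGE_RE.match(raw)
        if edge:
            src, op, dst = edge.groups()
            arrow = "-->" if op == "->" else "---"
            lines.append(f"    {src} {arrow} {dst}")
        # Anything else (graph header, closing brace, attributes) is skipped.
    return "\n".join(lines)
```

The resulting string can then be handed to marimo's mermaid support (for example `mo.mermaid(...)`), so neither graphviz nor matplotlib is involved in rendering.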

Once added, this approach would allow for results that:

- Are natively high quality for all means of showing the graph
- Wrap the text in long columns
- Do not require graphviz or matplotlib to be installed
- Adapt to the user theme by default

Case	Current	Proposed	Notes
Show Graph	(image)	(image)	
Light Mode Example	(image)	(image)	
LazyFrame last line	(image)	(image)	No change here; would likely require a change to polars unless marimo has a way to change the behavior of objects as the last line.
Join Graph	(image)	(image)	Current behavior sometimes does not fit on the screen if the width is not "right"
Wide example	(image)	(image)	Proposed shown with output expanded

Alternative

Instead of outputting mermaid code, it might be possible to recreate polars' _repr_html_ in order to display a high-quality image of the graph. However, this would require the user to have dot installed, would not change color with the user theme, and would not give the line-wrapping behavior for wide graphs.

Additional context

I think ultimately the best solution would be to have show_graph and _repr_html_ in polars recognize that they are running in marimo and change their behavior. This would require their cooperation, though, and I don't know how likely that is. I have written the implementation for the PR such that it would be easy for them to call marimo's functions if they detect a marimo environment (similar to what they already do for notebooks).

This snippet could be added to the polars display_dot_graph function, after the raw_output check, to get the same behavior when using polars' show_graph function.

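try:
    # Both imports raise ImportError when marimo (or this module) is absent,
    # in which case polars falls through to its existing display logic.
    from marimo import running_in_notebook
    from marimo._polars.lazyframe import Marimo

    if running_in_notebook():
        # Inside marimo: hand the dot source to marimo's mermaid renderer.
        return Marimo._dot_to_mermaid_html(dot)
except ImportError:
    pass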

Jan 07 '25 00:01 kjgoodrick

Hey @kjgoodrick this seems like a great idea/addition. You can actually easily add that logic to https://github.com/marimo-team/marimo/blob/main/marimo/_output/formatters/df_formatters.py

You can see what we do for dataframes there (just call table()); for the lazyframe, just call mermaid() in the same way.

Jan 07 '25 00:01 mscolnick