marimo icon indicating copy to clipboard operation
marimo copied to clipboard

`marimo lint/tidy notebook.py`

Open dmadisetti opened this issue 1 year ago • 5 comments

Description

Discussion in https://github.com/marimo-team/marimo/discussions/1379 brings up the great point that tools like ruff will struggle with marimo since it cannot prune imports and unused variables (since all defs are returned)

Suggested solution

marimo can suggest values to prune based on the graph. It can check to see if there are any dangling defs.

some mo specific lint hints:

  • If a def is utilized in only one cell, suggest turning it private.
  • If a def is not utilized, suggest removing it.

Additional opinionated tidy options

  • Suggestions to split or merge cells based on topology (e.g. 1 liners that are sequentially dependent or very large cells)
  • Suggestions to move cells with only code and no output to bottom of notebook
  • Suggestions to group imports into cells

This can be a wrapper/ preprocessor for a tool like ruff that can then check the notebook on a per cell level.

Alternative

Use tools like ruff as is, but miss out on marimo-specific clean-up

Additional context

No response

dmadisetti avatar Jun 02 '24 22:06 dmadisetti

I've been thinking about this a bit more. Notebooks are used to:

  1. show stuff
  2. or do stuff

lint should be able to determine if all code does either of these things.

It's easy to determine if code is used to "show stuff", as the code can be traced down to outputs. Determining code that "does stuff" is a little harder, but can be made explicit. It might be worth introducing the following api:

from marimo.save import side_effect

with side_effect("Pinging site"):
    requests.get("example.com")

lint can then determine whether code blocks feed into either an output, OR a side effect. Common side effects like "open" could be aliased, where:

with side_effect("write to file"):
    with open("file", "w") as f:
        f.write("x")

is flagged as not needed.


Side effect blocks can also be leveraged for native saving, in which case the block will be forced to run, even if the rest of the cell is cached.

dmadisetti avatar Sep 14 '24 16:09 dmadisetti

Personally I'm -1 on this. I don't think Marimo should seek to reinvent every orthogonal piece of tooling -- this has always been one of the greatest weaknesses of Jupyter notebooks.

It should be possible to still achieve linting, type-checking etc. just by changing the serialize-as-.py format. For example, for the case of unused imports: just don't return them as outputs from the function. Marimo is already tracking downstream usages anyway, so this should be doable. Normal linters will then detect the unused import.

patrick-kidger avatar Sep 18 '24 13:09 patrick-kidger

Pruning returns would be easy to do, and is a great idea- certainly an elegant way of getting cells to conform to style.

I don't think having some organizational tool is a bad idea, but I think it needs to be notebook specific, and leave the python linting to the linters. Notebook specific suggestions might be:

  • detecting duplicate cells
  • detecting when state / ui changes may create unnecessary reruns
  • And maybe some hints like not using plt.show

The tooling for hints like these does not exist, and doesn't explicitly overlap with other style rules

Where there is alignment (e.g. not returning unused definitions, maybe leveraging top level imports/ consts), marimo should definitely lean into that.

dmadisetti avatar Sep 18 '24 14:09 dmadisetti

@patrick-kidger: For example, for the case of unused imports: just don't return them as outputs from the function.

I really like this idea, and maybe should be done by default.

mscolnick avatar Sep 19 '24 13:09 mscolnick

I started down this path a little bit cause i kept messing up my little POC project, my first time writing pylint plugins (ruff doesn't have support yet) so be nice :) not sure if I caught all of the use cases yet, but figured i'd share. Also great project!!! https://github.com/bossjones/prompt-library/blob/14f86f0ad9a6942e8f2f53b6afe9fa2a25ba0fb7/pylint/plugins/marimo_cell_params_validator.py

bossjones avatar Dec 12 '24 18:12 bossjones

To touch base on this, the new "library format" outlined in https://marimo.io/blog/python-not-json#reuse-functions-and-classes-in-other-python-files

will add the "dangling refs" feature. To facilitate discussion for other lint options see: https://github.com/marimo-team/marimo/discussions/4332

dmadisetti avatar Mar 31 '25 21:03 dmadisetti