
Progress bar doesn't work well with objects that have no len()

Open • mrshu opened this issue 1 year ago • 1 comment

Describe the bug

First of all, thanks a bunch for marimo -- it's my favourite tool I ran across in 2024!

One issue I came across was finding a solid tqdm equivalent. [mo.status.progress_bar](https://docs.marimo.io/api/status.html) does the job pretty well, but it doesn't work with objects that have no len(), such as generators. These come up quite often, for example when iterating over df.iterrows().

One way to fix this would be to let the caller pass the total to mo.status.progress_bar instead of always computing it with len(), which for a generator inevitably raises TypeError: object of type 'generator' has no len(). The relevant code is here: https://github.com/marimo-team/marimo/blob/8a18849f60add7b51065c87c103af1aae8ff7487/marimo/_plugins/stateless/status/_progress.py#L274-L279
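To make the proposal concrete, here is a minimal sketch of the fallback logic. It is not marimo's actual code: the helper name `resolve_total` and its signature are hypothetical and only illustrate preferring an explicit total over len().

```python
from typing import Iterable, Optional


def resolve_total(collection: Iterable, total: Optional[int] = None) -> int:
    """Hypothetical helper: pick the number of steps for a progress bar."""
    # An explicit total always wins, so generators can be supported.
    if total is not None:
        return total
    try:
        # Sized collections (lists, ranges, ...) still work without a total.
        return len(collection)  # type: ignore[arg-type]
    except TypeError as exc:
        # Unsized iterables get a clearer error that names the fix.
        raise TypeError(
            "collection has no len(); pass total= explicitly"
        ) from exc


rows = (n for n in range(5))         # a generator has no len()
print(resolve_total(rows, total=5))  # explicit total is used as-is
print(resolve_total([10, 20, 30]))   # falls back to len()
```

With this shape, existing calls that pass a sized collection keep working unchanged, and only callers with generators need the extra argument.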

This is also how tqdm handles this (see for instance https://github.com/softhints/Pandas-Tutorials/blob/master/tqdm/1.progress-bars-pandas-python-tqdm.ipynb).

Let me know if this would make sense -- I'd be happy to try out submitting a PR with the change.

Environment

{
  "marimo": "0.1.76",
  "OS": "Darwin",
  "OS Version": "23.2.0",
  "Processor": "arm",
  "Python Version": "3.11.4",
  "Binaries": {
    "Chrome": "120.0.6099.216",
    "Node": "v20.5.0"
  },
  "Requirements": {
    "black": "23.12.1",
    "click": "8.1.7",
    "jedi": "0.19.1",
    "pymdown-extensions": "10.7",
    "tomlkit": "0.12.3",
    "tornado": "6.4",
    "typing_extensions": "4.9.0"
  }
}

Code to reproduce

import marimo
 
__generated_with = "0.1.76"
app = marimo.App()
 
 
@app.cell
def __():
    import pandas as pd
    import time
    import marimo as mo
    from tqdm import tqdm
    return mo, pd, time, tqdm
 
 
@app.cell
def __(mo, time):
    for x in mo.status.progress_bar(range(5)):
        print(x)
        time.sleep(x)
    return x,
 
 
@app.cell
def __(pd):
    # Example 1D list
    data = [10, 20, 30, 40, 50]
 
    # Define column name
    column = 'Value'
 
    # Create a DataFrame
    df = pd.DataFrame(data, columns=[column])
 
    df
    return column, data, df
 
 
@app.cell
def __(df, time, tqdm):
    for y, _row in tqdm(df.iterrows()):
        print(y)
        time.sleep(y)
    return y,
 
 
@app.cell
def __(df, mo, time):
    for i, _row in mo.status.progress_bar(df.iterrows()):
        print(i)
        time.sleep(i)
    return i,
 
 
if __name__ == "__main__":
    app.run()
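
Until a total option exists, a possible workaround for the failing cell above is to give the iterator a length. This is a sketch only, assuming the progress bar needs nothing more than len() and iteration from its argument; the `SizedIterable` class is hypothetical and not part of marimo.

```python
class SizedIterable:
    """Hypothetical wrapper: pairs an unsized iterable with a known length."""

    def __init__(self, iterable, length):
        self._iterable = iterable
        self._length = length

    def __len__(self):
        # Report the length the caller supplied; it is not verified.
        return self._length

    def __iter__(self):
        # Delegate iteration to the wrapped iterable.
        return iter(self._iterable)


def squares(n):
    """A generator standing in for something like df.iterrows()."""
    for i in range(n):
        yield i * i


wrapped = SizedIterable(squares(4), length=4)
print(len(wrapped))
print(list(wrapped))
```

For the DataFrame in the reproduction this would look like SizedIterable(df.iterrows(), len(df)). If holding all rows in memory is acceptable, list(df.iterrows()) is a simpler alternative.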
 

mrshu, Jan 15 '24 16:01

Hey @mrshu! Thanks for the thorough issue report. Adding an optional argument, total, is a great idea. Please do make a PR! And let us know if you need any help.

akshayka, Jan 15 '24 17:01