`show(df)` does not work with `modin.pandas`

Open wpritom opened this issue 1 year ago • 10 comments

show() does not work when I import pandas from Modin. I'm using Modin to improve pandas performance.

import modin.pandas as pd

df = pd.read_csv("****.csv")

Now show(df, classes="display") raises the following error:

AttributeError: 'DataFrame' object has no attribute 'iter_rows'

wpritom avatar Oct 06 '24 12:10 wpritom

Hi @wpritom, thanks for reporting this! Yes, that's right: currently ITables only supports Pandas and Polars DataFrames.

Can you convert df back to a Pandas DataFrame before calling show, for now at least?

You can leave this issue open so that I can look into adding support for Modin DataFrames when time permits. Thanks.
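
A minimal sketch of that workaround, under stated assumptions: `to_plain_pandas` is a hypothetical helper (not part of ITables or Modin), and it relies on Modin DataFrames exposing a `_to_pandas()` conversion method, which should be checked against the installed Modin version. Objects without that method are returned unchanged.

```python
def to_plain_pandas(df):
    """Return a plain pandas object when `df` offers a Modin-style converter.

    Assumption: Modin DataFrames expose `_to_pandas()`. Anything without
    that method (e.g. a regular pandas DataFrame) is returned as-is.
    """
    converter = getattr(df, "_to_pandas", None)
    return converter() if callable(converter) else df


# Intended usage (requires modin and itables to be installed):
# import modin.pandas as pd
# from itables import show
# df = pd.read_csv("****.csv")
# show(to_plain_pandas(df), classes="display")
```

The conversion materializes the whole table in memory, so this only makes sense for data that fits on one machine.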

mwouts avatar Oct 06 '24 19:10 mwouts

Hi @mwouts - would you be open to using Narwhals in ITables?

I think this could simplify some of the code, e.g. this, and would also give you support for pandas / Polars / Modin / cuDF / PyArrow (and any other Narwhals-compatible eager dataframe), without making any of them required dependencies

Happy to make a PR if you'd be interested, just gauging interest first

MarcoGorelli avatar Nov 12 '24 09:11 MarcoGorelli

Hey @MarcoGorelli , Narwhals sounds like a great package indeed! And sure I would love to provide support for more dataframe types, see for instance #217 (pending) where I started working on Ibis support.

I would love to see what that part of the code would look like with Narwhals! The parts that we would need to rewrite that I am currently thinking of (there might be more) are

  • the downsampling part (estimate the size of the table content, then keep only a certain number of top and bottom rows, first and last columns)
  • the conversion from Python data to Javascript data.
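
The downsampling step described in the first bullet can be sketched as follows. This is an illustration only, not the ITables implementation: the function names, the list-of-rows table representation, and the character-count size estimate are all assumptions made for the example.

```python
def estimated_size(rows):
    """Crude size estimate: total number of characters across all cells."""
    return sum(len(str(cell)) for row in rows for cell in row)


def keep_ends(items, limit):
    """Keep at most `limit` items, split between the start and the end."""
    if len(items) <= limit:
        return list(items)
    head = (limit + 1) // 2
    tail = limit - head
    return list(items[:head]) + (list(items[-tail:]) if tail else [])


def downsample(rows, max_rows, max_cols, max_size):
    """Return `rows` unchanged if small enough, else keep only the
    top/bottom rows and the first/last columns."""
    if estimated_size(rows) <= max_size:
        return rows
    return [keep_ends(row, max_cols) for row in keep_ends(rows, max_rows)]


# A 10 x 10 table whose cell at (r, c) holds r * 10 + c
table = [[r * 10 + c for c in range(10)] for r in range(10)]
small = downsample(table, max_rows=4, max_cols=4, max_size=64)
```

A dataframe-agnostic version would replace the list slicing with the head/tail and column-selection operations of whichever dataframe layer is used.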

Looking forward to hearing more from you!

mwouts avatar Nov 12 '24 12:11 mwouts

Hi @mwouts, I'm working on this. Just so you know that we are around. (I'm a Narwhals team member, 🙂).

DeaMariaLeon avatar Nov 15 '24 09:11 DeaMariaLeon

Hi @wpritom , we're getting something that is starting to work - huge thanks to @DeaMariaLeon and to @MarcoGorelli !

Can you give this PR a try and let us know how it works for you?

pip install git+https://github.com/mwouts/itables.git@use_narwhals

Also, I am not familiar with Modin, so I am wondering whether it is expected that the Modin tests are much slower than the pandas ones?

Last but not least, I see warnings on my empty dataframes in the sample dataframe notebook (docs/modin_dataframes.md). I guess they come from Modin itself?

UserWarning: `DataFrame.memory_usage` for empty DataFrame is not currently supported by PandasOnDask, defaulting to pandas implementation.
Please refer to https://modin.readthedocs.io/en/stable/supported_apis/defaulting_to_pandas.html for explanation.
UserWarning: `DataFrame.memory_usage` for empty DataFrame is not currently supported by PandasOnDask, defaulting to pandas implementation.

UserWarning: `DataFrame.itertuples` for empty DataFrame is not currently supported by Pandas

mwouts avatar Dec 15 '24 22:12 mwouts

To follow up on this: we have a PR that passes the tests, but I see significant performance issues, so I am not confident releasing it.

The code below takes 13 to 18 seconds to run on my computer and I see no reason why it should be so slow - the dataframe is only 100 columns x 100 rows. The to_html_datatable call alone takes 4 seconds, which also seems far too long... Is Modin expected to have this kind of performance problem, or could there be an issue with my local installation?

from itables.sample_dfs import get_dict_of_test_modin_dfs
from itables.javascript import to_html_datatable

df = get_dict_of_test_modin_dfs()["wide"]
html = to_html_datatable(df)
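
To separate the cost of each call when reproducing these timings, a small stopwatch wrapper can help. This is a sketch using only the standard library; `timed` is a hypothetical helper name, not something provided by ITables or Modin.

```python
import time


def timed(func, *args, **kwargs):
    """Call `func` and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Intended usage with the snippet above (requires itables and modin):
# dfs, load_seconds = timed(get_dict_of_test_modin_dfs)
# html, render_seconds = timed(to_html_datatable, dfs["wide"])
# print(f"load: {load_seconds:.1f}s, render: {render_seconds:.1f}s")
```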

mwouts avatar Jan 11 '25 22:01 mwouts

Hey - I've also observed Modin having a tonne of overhead; I think it's only intended for datasets that don't fit on a single machine. To be honest, I'm not sure that Modin is even a great candidate for iTables. Modin users might be better off converting to pandas before passing their table to iTables.

For Polars users, on the other hand, I'd expect iTables to work very well, as Polars works well on small datasets

MarcoGorelli avatar Jan 12 '25 07:01 MarcoGorelli

Hi @mwouts, I wonder if the possibility of using Narwhals is still on the table. 🙂

DeaMariaLeon avatar Feb 15 '25 08:02 DeaMariaLeon

Hello @DeaMariaLeon, yes sure!

As mentioned above, the problem that I have with the PR is actually with Modin, not with Narwhals, which works great! I am still waiting for feedback on the PR (cf. the instructions above) from an actual Modin user. I have received none so far, so I am tempted to infer that there is no significant demand for Modin support in ITables.

Alternatively, I'd be happy to adopt Narwhals if it lets us support another type of dataframe - I remember giving Ibis a try around that time.

mwouts avatar Feb 15 '25 17:02 mwouts

Thanks for your prompt reply, @mwouts. We are not sure that Modin is still alive, actually. Please let me know if I can help.

DeaMariaLeon avatar Feb 17 '25 13:02 DeaMariaLeon