hvplot by argument not working for hist plot

Open MarcSkovMadsen opened this issue 1 year ago • 3 comments

I'm working on updating the docstrings for the hist plot. I would expect the by argument to work the same way it does for other hvPlot plots and/or the way it works for pandas' DataFrame.plot.hist: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.hist.html.

But it seems to have no effect.

With `by`

import hvplot.pandas # noqa
import pandas as pd
import numpy as np
age_list = [8, 10, 12, 14, 72, 74, 76, 78, 20, 25, 30, 35, 60, 85]
df = pd.DataFrame({"gender": list("MMMMMMMMFFFFFF"), "age": age_list})
df.hvplot.hist(y=["age"], by="gender")

Without `by`

import hvplot.pandas # noqa
import pandas as pd
import numpy as np
age_list = [8, 10, 12, 14, 72, 74, 76, 78, 20, 25, 30, 35, 60, 85]
df = pd.DataFrame({"gender": list("MMMMMMMMFFFFFF"), "age": age_list})
df.hvplot.hist(y=["age"])

Aug 26 '22 13:08 MarcSkovMadsen

Looking at the code here:

https://github.com/holoviz/hvplot/blob/17ce0cc18a0393ae82d3a9f6f11b3bc4cad29b30/hvplot/converter.py#L1623-L1637

It seems that by is not supported when y is a list or tuple, so making `y` a string "fixes" it.
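A minimal sketch of that workaround, assuming the only problem is that y was passed as a one-element list or tuple. The helper names `unwrap_single_column` and `hist_by` are hypothetical and not part of hvplot; only `df.hvplot.hist(y=..., by=...)` comes from this thread.

```python
def unwrap_single_column(y):
    """Return y as a plain string when it is a one-element list or tuple.

    Hypothetical helper, not part of hvplot: any other value is returned unchanged.
    """
    if isinstance(y, (list, tuple)) and len(y) == 1:
        return y[0]
    return y


def hist_by(df, y, by, **kwargs):
    """Plot a grouped histogram, passing y as a string where possible.

    Hypothetical wrapper: hvplot is imported lazily so the helper above
    can be used and tested without hvplot installed.
    """
    import hvplot.pandas  # noqa: F401  (registers the .hvplot accessor)

    return df.hvplot.hist(y=unwrap_single_column(y), by=by, **kwargs)


# Usage, with the DataFrame from the original report:
# hist_by(df, y=["age"], by="gender")
```

With this, `y=["age"]` and `y="age"` take the same path, which is the case reported to work above. Lists with more than one column are passed through untouched, so they still hit the behavior described in this issue.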

A small note: with the default settings there is no indication of where the groups overlap:

With alpha on:

Aug 26 '22 15:08 hoxbro

Thanks @Hoxbro. I'm trying to move more of my work to hvplot and want to understand the intention.

1. Is it fair for me to expect it to work in this case?
2. Is it a bug? Should it be fixed?
3. Can I expect .hvplot to be a drop-in replacement for pandas .plot? Or is it "just" something with a similar API, where I should expect to have to adjust some of my code if I migrate from .plot to .hvplot? What is the vision?

Aug 27 '22 08:08 MarcSkovMadsen

1. I think it is fair to expect it to work in this case, or to get an error saying that by does not work when y is a list/tuple.
2. See the answer above.
3. In theory and in general, I would say yes. But in practice, it is not always possible. The pandas.plot API is a moving target and could change. Furthermore, consistency between different plot kinds in hvplot could overrule how pandas does it. hvplot also sets more default values than pandas does.

For this example, it is harder to get the desired output. But try to make a scatter plot of your DataFrame with the index on the x-axis and age on the y-axis. With hvplot, it is as easy as df.hvplot.scatter(y="age"). It is a bit harder with pandas.plot; I could get it to work with df.reset_index().plot.scatter(x="index", y="age").

If we then want to color the scatter based on gender, we can do it with df.hvplot.scatter(y="age", c="gender") or df.hvplot.scatter(y="age", by="gender"). I couldn't find an easy way to do this with pandas.plot (though there probably is a way to do it).

For the original question you can get the same output as pandas by using subplots: df.hvplot.hist(y="age", by="gender", subplots=True).cols(1).
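To sanity-check what the per-gender subplots should show, the counts can be computed directly with pandas and NumPy. This is only a sketch: the choice of 4 bins and the shared bin edges are assumptions made for illustration, not what hvplot or pandas uses by default.

```python
import numpy as np
import pandas as pd

age_list = [8, 10, 12, 14, 72, 74, 76, 78, 20, 25, 30, 35, 60, 85]
df = pd.DataFrame({"gender": list("MMMMMMMMFFFFFF"), "age": age_list})

# Shared bin edges across both groups so the counts are comparable.
# bins=4 is an arbitrary choice for this illustration.
edges = np.histogram_bin_edges(df["age"], bins=4)

# One array of counts per gender, all using the same edges.
counts = {
    gender: np.histogram(group["age"], bins=edges)[0]
    for gender, group in df.groupby("gender")
}

for gender, values in counts.items():
    print(gender, values.tolist())
```

If the grouped plot is working, each subplot should reflect a different distribution per gender rather than the same bars as the plot without by.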

Aug 28 '22 08:08 hoxbro
