hvplot
by argument not working for hist plot
I'm working on updating the docstrings for the hist plot. I would expect the by argument to work similarly to how it works for other hvPlot plots and/or how it works for the pandas plot https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.hist.html.
But it seems to have no effect.
With by
import hvplot.pandas # noqa
import pandas as pd
import numpy as np
age_list = [8, 10, 12, 14, 72, 74, 76, 78, 20, 25, 30, 35, 60, 85]
df = pd.DataFrame({"gender": list("MMMMMMMMFFFFFF"), "age": age_list})
df.hvplot.hist(y=["age"], by="gender")
Without by
import hvplot.pandas # noqa
import pandas as pd
import numpy as np
age_list = [8, 10, 12, 14, 72, 74, 76, 78, 20, 25, 30, 35, 60, 85]
df = pd.DataFrame({"gender": list("MMMMMMMMFFFFFF"), "age": age_list})
df.hvplot.hist(y=["age"])
Looking at the code here:
https://github.com/holoviz/hvplot/blob/17ce0cc18a0393ae82d3a9f6f11b3bc4cad29b30/hvplot/converter.py#L1623-L1637
It seems not to be supported when y is a list or tuple, so making y a string "fixes" it.
A small note: there is no indication of overlap:
With alpha on:
Thanks @Hoxbro. I'm trying to move more into hvplot and understand the intention.
- Is it fair for me to expect it to work in this case?
- Is it a bug? Should it be fixed?
- Can I expect .hvplot to be a drop-in replacement for Pandas .plot? Or is it "just" something with a similar API, where I should expect to have to adjust some of my code if I migrate from .plot to .hvplot? What is the vision?
- I think it is fair to expect it to work in this case, or to raise an error saying that by does not work when y is a list/tuple.
- See the answer above.
- In theory and in general, I would say yes. But in practice, it is not always possible. The pandas .plot API is a moving target and could change. Furthermore, consistency between different plot kinds could overrule how pandas does it. hvplot also sets more default values than pandas does.
For this example, it is harder to get the desired output. But try to make a scatter plot with your DataFrame with the index on the x-axis and age on the y-axis. With hvplot, it is as easy as df.hvplot.scatter(y="age"). It is a bit harder to get it working with pandas .plot; I could get it to work with df.reset_index().plot.scatter(x="index", y="age").
If we then want to color the scatter based on gender, we can do it with df.hvplot.scatter(y="age", c="gender") or df.hvplot.scatter(y="age", by="gender"). I couldn't find an easy way to do this with pandas .plot (though there probably is a way to do it).
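For comparison, one possible way to do it on the pandas side is to map each gender to a color by hand and pass the resulting list as c. This is only a sketch: the palette dictionary and its color names are assumptions made for illustration, and it needs matplotlib installed as the pandas plotting backend.

```python
import pandas as pd

age_list = [8, 10, 12, 14, 72, 74, 76, 78, 20, 25, 30, 35, 60, 85]
df = pd.DataFrame({"gender": list("MMMMMMMMFFFFFF"), "age": age_list})

# Hypothetical palette: one matplotlib color name per gender value
palette = {"M": "tab:blue", "F": "tab:orange"}
colors = df["gender"].map(palette).tolist()

# The index has to become a regular column before it can be used as x
ax = df.reset_index().plot.scatter(x="index", y="age", c=colors)
```

Compared with the hvplot one-liners, the color mapping has to be built manually and no legend is produced without extra work.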
For the original question, you can get the same output as pandas by using subplots: df.hvplot.hist(y="age", by="gender", subplots=True).cols(1).