Two histograms in same plot, side-by-side bars
I want to create an Altair histogram with side-by-side bars like this (except I want % per class, not count, on the Y axis):
Am I correct that this is impossible with the current API? The closest I have been able to get is this:
import pandas as pd
import numpy as np
import altair as alt
# Generate some example data
true_scores = np.random.normal(1, 2, 1000)
false_scores = np.random.normal(-2, 3, 1000)
scores = np.concatenate((true_scores, false_scores))
labels = [True] * 1000 + [False] * 1000
df = pd.DataFrame({"score": scores, "label": labels})
# make plot
alt.Chart(df).transform_joinaggregate(
    total='count(*)',
    groupby=['label']
).transform_calculate(
    pct='1 / datum.total'
).mark_bar().encode(
    x=alt.X("label:N", axis=alt.Axis(title=None, labels=False)),
    y=alt.Y("sum(pct):Q", axis=alt.Axis(title="Count")),
    color=alt.Color('label:N'),
    column=alt.Column('score:Q', bin=alt.Bin(maxbins=100), header=alt.Header(labelOrient="bottom")),
)
which creates this plot:
I don't know how to combine all of the separate facet column plots into a single plot. I've also tried much messier approaches of binning in Pandas, but those have other issues and still don't get me what I want. I created a StackOverflow question previously, but I think this feature may just not exist.
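As a side note on the snippet above, the `transform_joinaggregate` / `transform_calculate` pair gives every row a weight of 1 / (rows in its class), so `sum(pct)` per bin is that bin's share of its class. A minimal pandas sketch of the same arithmetic follows. It assumes 30 equal-width bins chosen here only for illustration (not the bins Altair picks with `maxbins=100`), and the names `edges` and `share` are hypothetical.

```python
import numpy as np
import pandas as pd

# Same shape of example data as in the post, with a fixed seed.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "score": np.concatenate((rng.normal(1, 2, 1000), rng.normal(-2, 3, 1000))),
    "label": [True] * 1000 + [False] * 1000,
})

# Mirrors transform_joinaggregate(total="count(*)", groupby=["label"]):
# every row receives the size of its own class.
df["total"] = df.groupby("label")["score"].transform("count")

# Mirrors transform_calculate(pct="1 / datum.total"): one row's weight.
df["pct"] = 1 / df["total"]

# Assumption: 30 equal-width bins spanning the data, for illustration only.
edges = np.linspace(df["score"].min(), df["score"].max(), 31)
df["bin"] = pd.cut(df["score"], bins=edges, include_lowest=True)

# Mirrors sum(pct) per bin and class: the share of each class in each bin.
share = df.groupby(["label", "bin"], observed=True)["pct"].sum()

# Within each class the shares add up to 1 (up to floating-point rounding).
print(share.groupby(level="label").sum())
```

Because the weights are normalized per class, the two histograms stay comparable even if the classes had different sizes.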
I think using the xOffset channel can solve this. The original discussion is here, and another example with a discrete category in Altair can be found here.
import pandas as pd
import numpy as np
import altair as alt
# Generate some example data
true_scores = np.random.normal(1, 2, 1000)
false_scores = np.random.normal(-2, 3, 1000)
scores = np.concatenate((true_scores, false_scores))
labels = [True] * 1000 + [False] * 1000
df = pd.DataFrame({"score": scores, "label": labels})
# make plot
alt.Chart(df).transform_joinaggregate(
    total="count(*)",
    groupby=["label"]
).transform_calculate(
    pct="1 / datum.total"
).mark_bar().encode(
    x=alt.X("score:O", bin=alt.Bin(maxbins=100), axis=alt.Axis(title=None, labels=True)),
    y=alt.Y("sum(pct):Q", axis=alt.Axis(title="Percentage", format='.0%')),
    color=alt.Color("label:N"),
    xOffset=alt.XOffset("label:N")
)
result:
I saw that and actually tried it, but that makes the X axis discontinuous. Also, I'd like the X-axis to be just normal numbers and not this binned thing.
Not sure if this is what you want, but removing the bin encoding would work:
import pandas as pd
import numpy as np
import altair as alt
# Generate some example data
true_scores = np.random.randint(6, size=10)
false_scores = np.random.randint(6, size=10)
scores = np.concatenate((true_scores, false_scores))
labels = [True] * 10 + [False] * 10
df = pd.DataFrame({"score": scores, "label": labels})
# make plot
alt.Chart(df).transform_joinaggregate(
    total="count(*)",
    groupby=["label"]
).transform_calculate(
    pct="1 / datum.total"
).mark_bar().encode(
    x=alt.X("score:O", axis=alt.Axis(title=None, labels=True)),
    y=alt.Y("sum(pct):Q", axis=alt.Axis(title="Percentage", format='.0%')),
    color=alt.Color("label:N"),
    xOffset=alt.XOffset("label:N")
)
I'm not using the original example data because random floats are unsuited to this chart: without binning, nearly every float would get its own category on the ordinal axis.