wave
Handle punctuation in column names for plotting
Is your feature request related to a problem? Please describe
H2O Driverless AI and MLOps return predictions for binary and multi-class problems in columns named name.0, name.1, ..., name.n. UI plots fail silently if the x or y column has a period in its name: no error message is shown and the page is just white. The user has to rename the column, replacing the period, for the plot to work as expected.
Describe the solution you'd like
Automatically handle periods in feature names
Describe alternatives you've considered
Require all developers to handle this themselves and just add more documentation
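For reference, the do-it-yourself alternative could look roughly like the sketch below. The helper name sanitize_names is hypothetical and not part of h2o_wave, and the rule it applies (keep only letters, digits, and underscores) is an assumption based on the column names that are known to work in the examples in this issue.

```python
import re


def sanitize_names(names):
    """Replace every character that is not a letter, digit, or underscore with '_'.

    Hypothetical helper, not part of h2o_wave. Raises if two different
    names would end up identical after cleaning.
    """
    cleaned = [re.sub(r'\W', '_', str(name)) for name in names]
    if len(set(cleaned)) != len(cleaned):
        raise ValueError(f'Sanitized column names collide: {cleaned}')
    return cleaned


# Column names like the ones reported in this issue.
columns = ['length.1', 'width.1', 'data.type', 'Churn?']
print(sanitize_names(columns))
```

With pandas this would be applied as df.columns = sanitize_names(df.columns) before building the plot data, and the mark would then reference the cleaned names, for example x='=length_1'.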
Additional context
Example that works
from h2o_wave import site, ui, data
import pandas as pd
import numpy as np

page = site['/demo']

n = 100
df = pd.DataFrame(dict(
    length=np.random.rand(n),
    width=np.random.rand(n),
    data_type=np.random.choice(a=['Train', 'Test'], size=n, p=[0.8, 0.2])
))

# Plot two numeric columns by each other and color based on a third, categorical column
page['scatter'] = ui.plot_card(
    box='1 1 4 5',
    title='Scatter Plot from Dataframe',
    data=data(
        fields=df.columns.tolist(),
        rows=df.values.tolist(),
        pack=True,
    ),
    plot=ui.plot(marks=[ui.mark(
        type='point',
        x='=length', x_title='Length',
        y='=width', y_title='Width',
        color='=data_type', shape='circle',
    )])
)
page.save()
Add a period to a column name and the plot fails:
from h2o_wave import site, ui, data
import pandas as pd
import numpy as np

page = site['/demo']

n = 100
df = pd.DataFrame(dict(
    length=np.random.rand(n),
    width=np.random.rand(n),
    data_type=np.random.choice(a=['Train', 'Test'], size=n, p=[0.8, 0.2])
))
df.columns = ['length', 'width', 'data.type']

# Plot two numeric columns by each other and color based on a third, categorical column
page['scatter'] = ui.plot_card(
    box='1 1 4 5',
    title='Scatter Plot from Dataframe',
    data=data(
        fields=df.columns.tolist(),
        rows=df.values.tolist(),
        pack=True,
    ),
    plot=ui.plot(marks=[ui.mark(
        type='point',
        x='=length', x_title='Length',
        y='=width', y_title='Width',
        color='=data.type', shape='circle',
    )])
)
page.save()
@lo5 Thank you for fixing this in the color parameter! We noticed that the plot still fails with no error message if punctuation is in the column name for other parameters like x or y. Can this fix be applied to all inputs that can be parameterized by feature="=column"?
Repro of the failure in x and y:
from h2o_wave import site, ui, data
import pandas as pd
import numpy as np

page = site['/demo']

n = 100
df = pd.DataFrame(dict(
    length=np.random.rand(n),
    width=np.random.rand(n),
    data_type=np.random.choice(a=['Train', 'Test'], size=n, p=[0.8, 0.2])
))
df.columns = ['length.1', 'width.1', 'data.type']

# Plot two numeric columns by each other and color based on a third, categorical column
page['scatter'] = ui.plot_card(
    box='1 1 4 5',
    title='Scatter Plot from Dataframe',
    data=data(
        fields=df.columns.tolist(),
        rows=df.values.tolist(),
        pack=True,
    ),
    plot=ui.plot(marks=[ui.mark(
        type='point',
        x='=length.1', x_title='Length',
        y='=width.1', y_title='Width',
        color='=data.type', shape='circle',
    )])
)
page.save()
CC @mturoci
@mturoci Hit this again today on a dataset with a column called "Churn?"
It would be great if Plot could support all punctuation in column names, but if this is not feasible, could we please add a check and present the user with an error message?
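To make the fallback concrete, here is a rough sketch of what such a check might look like on the app side. The function check_field_refs is hypothetical and not part of h2o_wave, it only understands the simple "=column" form of a field reference, and the definition of a safe name (letters, digits, underscores) is an assumption rather than a documented Wave rule.

```python
import re


def check_field_refs(fields, **refs):
    """Raise ValueError if a '=column' reference is unknown or contains punctuation.

    Hypothetical helper, not part of h2o_wave. Values that do not start
    with '=' are treated as literals (e.g. shape='circle') and skipped.
    """
    for param, value in refs.items():
        if not isinstance(value, str) or not value.startswith('='):
            continue
        name = value[1:]
        if name not in fields:
            raise ValueError(f"{param}: no field named '{name}' in the plot data")
        # Assumption: only letters, digits, and underscores are safe.
        if re.fullmatch(r'\w+', name) is None:
            raise ValueError(
                f"{param}: field name '{name}' contains punctuation, "
                "which may make the plot fail to render"
            )


fields = ['length.1', 'width.1', 'Churn?']
try:
    check_field_refs(fields, x='=length.1', y='=width.1', color='=Churn?', shape='circle')
except ValueError as err:
    print(err)
```

Calling this before ui.mark(...) with the same keyword arguments would turn the blank page into an explicit error the developer can act on.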
I guess it shouldn't be a problem for any of the options, just not sure what is the general practice - cc @lo5