Handle punctuation in column names for plotting

Open mtanco opened this issue 4 years ago • 3 comments

Is your feature request related to a problem? Please describe

H2O Driverless AI and MLOps return binary and multi-class predictions in columns named name.0, name.1, ..., name.n. UI plots fail if the x or y field has a period (.) in its name, and they fail silently: there is no error message, the page is just white. The user has to replace the period in the column name for the plot to work as expected.

Describe the solution you'd like

Automatically handle periods in the feature names

Describe alternatives you've considered

Require all developers to handle this themselves and just add more documentation
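
For reference, the manual workaround described above could look roughly like the sketch below. The helper name sanitize_field_names is hypothetical and not part of h2o_wave, and replacing every non-alphanumeric character with an underscore is an assumption on my side about what counts as "safe"; the issue only confirms that a period breaks the plot.

```python
import re


def sanitize_field_names(names):
    """Return plot-safe copies of the given column names.

    Hypothetical helper, not part of h2o_wave. Every character that is not
    an ASCII letter, digit, or underscore is replaced with '_'. If two
    cleaned names collide, a numeric suffix keeps them unique.
    """
    used = set()
    cleaned = []
    for name in names:
        base = re.sub(r'[^0-9A-Za-z_]', '_', str(name))
        candidate = base
        suffix = 1
        # Avoid silently merging two different source columns.
        while candidate in used:
            candidate = f'{base}_{suffix}'
            suffix += 1
        used.add(candidate)
        cleaned.append(candidate)
    return cleaned


print(sanitize_field_names(['length.1', 'width.1', 'data.type']))
```

With a DataFrame, this would be applied as `df.columns = sanitize_field_names(df.columns)` before building the card, and the marks would then reference the cleaned names (for example `x='=length_1'`).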

Additional context

Example that works

from h2o_wave import site, ui, data
import pandas as pd
import numpy as np

page = site['/demo']

n = 100
df = pd.DataFrame(dict(
    length=np.random.rand(n),
    width=np.random.rand(n),
    data_type=np.random.choice(a=['Train', 'Test'], size=n, p=[0.8, 0.2])
))

# Plot two numeric columns by each other and color based on a third, categorical column
page['scatter'] = ui.plot_card(
    box='1 1 4 5',
    title='Scatter Plot from Dataframe',
    data=data(
        fields=df.columns.tolist(),
        rows=df.values.tolist(),
        pack=True,
    ),
    plot=ui.plot(marks=[ui.mark(
        type='point',
        x='=length', x_title='Length',
        y='=width', y_title='Width',
        color='=data_type', shape='circle',
    )])
)

page.save()

Add a period to a column name and the plot fails:

from h2o_wave import site, ui, data
import pandas as pd
import numpy as np

page = site['/demo']

n = 100
df = pd.DataFrame(dict(
    length=np.random.rand(n),
    width=np.random.rand(n),
    data_type=np.random.choice(a=['Train', 'Test'], size=n, p=[0.8, 0.2])
))

df.columns = ['length', 'width', 'data.type']

# Plot two numeric columns by each other and color based on a third, categorical column
page['scatter'] = ui.plot_card(
    box='1 1 4 5',
    title='Scatter Plot from Dataframe',
    data=data(
        fields=df.columns.tolist(),
        rows=df.values.tolist(),
        pack=True,
    ),
    plot=ui.plot(marks=[ui.mark(
        type='point',
        x='=length', x_title='Length',
        y='=width', y_title='Width',
        color='=data.type', shape='circle',
    )])
)

page.save()

mtanco, Jan 29 '21 19:01

@lo5 Thank you for fixing this in the color parameter! We noticed that the plot still fails with no error message if punctuation is in the column name for other parameters like x or y. Can this fix be applied to all inputs that can be parameterized by feature="=column"?

Repro of the failure in x and y:

from h2o_wave import site, ui, data
import pandas as pd
import numpy as np

page = site['/demo']

n = 100
df = pd.DataFrame(dict(
    length=np.random.rand(n),
    width=np.random.rand(n),
    data_type=np.random.choice(a=['Train', 'Test'], size=n, p=[0.8, 0.2])
))

df.columns = ['length.1', 'width.1', 'data.type']

# Plot two numeric columns by each other and color based on a third, categorical column
page['scatter'] = ui.plot_card(
    box='1 1 4 5',
    title='Scatter Plot from Dataframe',
    data=data(
        fields=df.columns.tolist(),
        rows=df.values.tolist(),
        pack=True,
    ),
    plot=ui.plot(marks=[ui.mark(
        type='point',
        x='=length.1', x_title='Length',
        y='=width.1', y_title='Width',
        color='=data.type', shape='circle',
    )])
)

page.save()

CC @mturoci

mtanco, Jan 25 '22 21:01

@mturoci Hit this again today on a dataset with a column called "Churn?".

It would be great if Plot could support all punctuation in column names, but if this is not feasible, could we please add a check and present the user with an error message?
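
To make the "add a check" option concrete, here is a rough sketch of what a client-side check could do until there is a proper fix. The function find_unsafe_field_refs is hypothetical, not an h2o_wave API. It assumes that a mark value starting with '=' is a field reference (as in the examples above) and treats any character other than letters, digits, and underscores as suspect, which may be stricter than what actually breaks the plot.

```python
import re


def find_unsafe_field_refs(fields, **encodings):
    """List mark parameters whose '=field' reference is likely to break a plot.

    Hypothetical helper, not part of h2o_wave. `fields` holds the data
    column names; `encodings` holds the keyword arguments that would be
    passed to ui.mark(), e.g. x='=length.1'.
    """
    problems = []
    for param, value in encodings.items():
        # Literal values such as shape='circle' are not field references.
        if not isinstance(value, str) or not value.startswith('='):
            continue
        name = value[1:]
        if name not in fields:
            problems.append(f"{param}: unknown field '{name}'")
        elif re.search(r'[^0-9A-Za-z_]', name):
            problems.append(f"{param}: field '{name}' contains punctuation")
    return problems


issues = find_unsafe_field_refs(
    ['length.1', 'width.1', 'data.type'],
    x='=length.1', y='=width.1', color='=data.type', shape='circle',
)
for issue in issues:
    print(issue)
```

An app could raise a ValueError or show a message to the user when the returned list is not empty, instead of ending up with a white page.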

mtanco, Mar 18 '22 20:03

I guess it shouldn't be a problem for any of the options, I'm just not sure what the general practice is - cc @lo5

mturoci, Mar 21 '22 07:03