
Spurious lines on violin plot

Open yt87 opened this issue 1 year ago • 5 comments

ALL software version info

python                    3.11.8          hab00c5b_0_cpython    conda-forge
hvplot                    0.9.2              pyhd8ed1ab_0    conda-forge
holoviews                 1.18.3             pyhd8ed1ab_0    conda-forge
bokeh                     3.3.4              pyhd8ed1ab_0    conda-forge

Description of expected behavior and the observed behavior

The violin plot occasionally produces spurious lines. The same data plotted with box is OK.

Complete, minimal, self-contained example code that reproduces the issue

Data file: bar.csv

This is a stripped down large data file, with the minimal number of items I was able to reproduce this behaviour.

import pandas as pd
import hvplot.pandas  # registers the .hvplot accessor on DataFrames

df = pd.read_csv('/tmp/bar.csv')
df.hvplot.violin(by='month')  # shows the spurious lines
df.hvplot.box(by='month')     # renders correctly

Stack traceback and/or browser JavaScript console output

Screenshots or screencasts of the bug in action

[images: violin_plot, box_plot]

  • [ ] I may be interested in making a pull request to address this

yt87 • Mar 30 '24 20:03

This seems to happen because Jun only has 0 as its A value.
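One quick way to check a hypothesis like this is to summarize A per month and look for groups with a single constant value. A minimal sketch using only the standard library; the `sample` rows are a small subset copied from the reduced dataset posted later in this thread, and `summarize_by_month` is a hypothetical helper name, not part of hvplot.

```python
import csv
from collections import defaultdict
from io import StringIO

# A few rows copied from the reduced dataset posted later in this thread.
sample = """time,A,month
2009-06-01,0.0,Jun
2009-07-01,0.88,Jul
2009-07-02,0.96,Jul
2009-07-03,1.0,Jul
"""


def summarize_by_month(csv_text):
    """Return {month: (row_count, min_A, max_A)} for a CSV with columns A and month."""
    groups = defaultdict(list)
    for row in csv.DictReader(StringIO(csv_text)):
        groups[row["month"]].append(float(row["A"]))
    return {month: (len(vals), min(vals), max(vals)) for month, vals in groups.items()}


summary = summarize_by_month(sample)
for month, (n, lo, hi) in summary.items():
    # A group whose min equals its max has no spread for the violin's density estimate.
    flag = "  <- constant group" if lo == hi else ""
    print(f"{month}: n={n}, min={lo}, max={hi}{flag}")
```

A constant group is only a candidate cause; it does not by itself show where the extra lines come from.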

hoxbro • Mar 31 '24 09:03

I doubt this is caused by having only 0 data values. Here is a plot with the full dataset: [image: real_plot]. There are several months with no ice, but no spurious lines show up. Also, the plot in my previous message looks fine when rendered with the matplotlib extension, which suggests the bug might be in holoviews/plotting/bokeh/stats.py.

yt87 • Mar 31 '24 19:03

You are right. If I reduce your data to the following, I still get the spurious line.

import pandas as pd
import hvplot.pandas
import holoviews as hv
from io import StringIO

data = """
time,A,month
2009-06-01,0.0,Jun
2009-07-01,0.88,Jul
2009-07-02,0.96,Jul
2009-07-03,1.0,Jul
2009-07-04,0.95,Jul
2009-07-05,0.93,Jul
2008-07-01,0.47,Jul
2008-07-02,0.94,Jul
2008-07-03,0.93,Jul
2008-07-04,0.89,Jul
2008-07-05,0.95,Jul
2008-07-06,0.94,Jul
2008-07-07,0.91,Jul
2008-07-08,0.96,Jul
2008-07-09,0.90,Jul
2008-07-10,0.96,Jul
2008-07-11,0.9,Jul
2008-07-12,0.96,Jul
2008-07-13,0.94,Jul
2008-07-14,0.9,Jul
2008-07-15,0.95,Jul
2008-07-16,0.94,Jul
2008-07-17,0.89,Jul
2008-07-18,0.89,Jul
2008-07-19,0.87,Jul
2008-07-20,0.9,Jul
2008-07-21,0.90,Jul
2008-07-23,0.90,Jul
2008-07-24,0.93,Jul
2008-07-25,0.96,Jul
2008-07-26,1.0,Jul
2008-07-27,0.94,Jul
2008-07-28,0.88,Jul
2008-07-29,0.95,Jul
2008-07-31,0.98,Jul
"""

sio = StringIO(data.strip())
sio.seek(0)
df = pd.read_csv(sio)
df.hvplot.violin(by="month")

image

But if I remove any one row from Jul, the spurious line does not show up... I have absolutely no idea why this is happening...

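# Rebuild the dataset once per Jul row, dropping a different row each time
# (index 0 is the CSV header and index 1 is the single Jun row, so start at 2).
plots = []
lines = data.strip().split("\n")
for i in range(2, len(lines)):
    d = "\n".join(line for idx, line in enumerate(lines) if idx != i)
    df = pd.read_csv(StringIO(d))
    plots.append(df.hvplot.violin(by="month", label=lines[i]))

# Lay the variants out in two columns, each with its own axes.
hv.Layout(plots).cols(2).opts(shared_axes=False)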

hoxbro • Apr 03 '24 08:04

This is a WebGL problem. Can you try disabling it with hv.renderer("bokeh").webgl = False?

hoxbro • Apr 03 '24 09:04

Thanks, that worked.

yt87 • Apr 03 '24 16:04