plotly.py

Issue with plotly express scatterplots

Open OlovJ opened this issue 2 years ago • 16 comments

I have an issue with larger scatterplots not displaying correctly. I am using Jupyter on an M1 Mac, and the issue is the same whether I run it from VS Code or from JupyterLab (or I might be doing something wrong).

When displaying 1000 points it works as I expect, but as soon as I go over 1000 points the x-values are no longer correct.

import pandas as pd
import plotly
import plotly.express as px
import plotly.graph_objects as go
import random
import datetime

df = pd.DataFrame(columns=['metric', 'value', 'time'])
for i in range(1001):
    df = pd.concat([df, pd.DataFrame({'metric': random.choice(['a', 'b', 'c', 'd', 'e']), 
                                      'value': random.random()*20, 
                                      'time': datetime.datetime.now() + datetime.timedelta(seconds=random.randint(1, 300))}, index=[0])],
                                      ignore_index=True)
print(df.info())
print("Plotly version", plotly.__version__)
print("Pandas version", pd.__version__)
fig1 = px.scatter(df.head(1000), x="time", y="value", title="Durations", color='metric')
fig2 = px.scatter(df, x="time", y="value", title="Durations", color='metric')
fig1.show()
fig2.show()

This produces the output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1001 entries, 0 to 1000
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   metric  1001 non-null   object        
 1   value   1001 non-null   float64       
 2   time    1001 non-null   datetime64[ns]
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 23.6+ KB
None
Plotly version 5.14.1
Pandas version 2.0.0

and the plots, first with 1000 points (image)

and then with 1001 points (image)

One other useful detail: the hover data is triggered in the correct place, so when I hover over empty space I get the correct popup (image)

If I do it using go figures, it works as expected:

fig = go.Figure()
for metric, group in df.groupby("metric"):
    fig.add_trace(go.Scatter(x=group['time'], y=group['value'], mode='markers', name=metric))
fig.show()
image

OlovJ avatar Apr 07 '23 07:04 OlovJ

I tried it on a Windows computer and it works there, with the same versions of plotly and pandas and the same version of Python (3.11.3).

OlovJ avatar Apr 07 '23 10:04 OlovJ

Also, the same plot with lines works perfectly, so for me it only seems to happen with scatters over 1000 points on Arm.

OlovJ avatar Apr 07 '23 10:04 OlovJ

I am experiencing the same issue. It works great on my Windows laptop, but I see similar vertical lines on my Mac. If you hover your mouse over the spaces between the vertical lines, the annotations of the points that were supposed to be there (but are invisible for some reason) pop up.

hammas9 avatar Apr 17 '23 21:04 hammas9

@OlovJ out of curiosity does the same issue occur with pandas==1.5.3?

AaronStiff avatar Apr 18 '23 16:04 AaronStiff

I can confirm the issue with pandas==2.1.1 and plotly==5.17.0

StefanKaiser-TomTom avatar Oct 10 '23 07:10 StefanKaiser-TomTom

I'm hitting the same issue with both versions 5.9.0 and 5.19.0 on macOS.

j-at-ch avatar Feb 23 '24 16:02 j-at-ch

Just dug into the code and I've found the cause (thanks @OlovJ for investigating the 1000 point threshold - it helped to find the culprit).

The issue here seems to be the renderer - see this line in the source code.

Switching the renderer to "svg" solves this issue for me.

px.scatter(df, x="time", y="value", render_mode="svg")

j-at-ch avatar Feb 23 '24 16:02 j-at-ch

@j-at-ch Thank you for investigating this issue. However, I'm not able to reproduce this error from the initial code posted.

Can you please share the exact code you used to reproduce the error?

Coding-with-Adam avatar Feb 28 '24 14:02 Coding-with-Adam

@Coding-with-Adam I've posted a MWE .ipynb sheet on Google Colab here. Can you confirm that you see the rendering issue there?

The issue seems to be how the system handles render_mode="auto", so it isn't possible to reproduce from the code alone. I think the unexpected behaviour occurs when:

  • using render_mode="webgl" and
  • more than a certain number of points in a scatter plot

I haven't found any documentation of this behaviour. On macOS, using PyCharm to run Jupyter notebooks, the default behaviour seems to be "webgl", so scatters with more than a certain number of points always show this undesirable binning effect.

j-at-ch avatar Mar 04 '24 09:03 j-at-ch

Thanks @j-at-ch . The Google Colab code you shared looks good on my Firefox browser. No issue there either. I'm not sure why I'm not able to reproduce this error.

Coding-with-Adam avatar Mar 08 '24 20:03 Coding-with-Adam

Thanks for the follow-up @Coding-with-Adam!

I've updated a few things in the Google Colab notebook so that there are plots with:

  • an explicit render_mode='auto'
  • an explicit render_mode='webgl'
  • an explicit render_mode='svg'.

Would you be able to check the updated plots and report whether there are any differences, please?

At least then we'll be able to tell whether the behaviour some of us are observing is due to renderer options (and how our individual systems handle the auto option).

j-at-ch avatar Mar 09 '24 15:03 j-at-ch

Hi @j-at-ch, thanks for looking further into this. All figures look exactly the same to me.

image

Coding-with-Adam avatar Mar 13 '24 16:03 Coding-with-Adam

Here's what it looks like to me @Coding-with-Adam:

Figure 1: render_mode='auto'

Figure 2: render_mode='webgl'

Figure 3: render_mode='svg'

Maybe this is a hardware-related issue for 'webgl'? I'm using macOS with an M1 Pro chip.

j-at-ch avatar Mar 13 '24 17:03 j-at-ch

Thanks for sharing the images @j-at-ch .

@alexcjohnson Any idea what might be causing this bug in webgl? Or, any idea how we can test it to dig deeper?

Coding-with-Adam avatar Mar 14 '24 14:03 Coding-with-Adam

We've seen hardware-dependent precision issues in WebGL a few times... and it makes sense that it would show up mostly on date axes, where the zero is all the way back at 1970 so the difference of a few minutes in recent years is in a fairly deep digit.

There's probably some ugly way around it for now (I seem to recall a command-line switch to use higher precision?) but it'll keep popping up unless we do something deeper like rescaling all the data around a zero that's on or near the actual axis range before we send it to WebGL. That's a pretty big project, but it would ensure WebGL gets the same precision as SVG.

alexcjohnson avatar Mar 19 '24 02:03 alexcjohnson
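The precision argument above can be illustrated with a rough sketch. It rests on two assumptions that this thread does not confirm: that date values are handled as milliseconds since 1970, and that on affected hardware they end up as 32-bit floats. The timestamp `base_ms` and the helper `to_float32` are hypothetical, chosen only for illustration, and the code uses the standard library only.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


# Hypothetical timestamp in April 2023, in milliseconds since 1970.
base_ms = 1_680_850_000_000

# One point per second over a 300-second window, like the reproduction above.
times_ms = [base_ms + s * 1000 for s in range(1, 301)]

# Sent as-is: at this magnitude adjacent float32 values are 2**17 ms (~131 s) apart.
collapsed = {to_float32(t) for t in times_ms}

# Rescaled around a zero near the axis range: small offsets are exact in float32.
rescaled = {to_float32(t - base_ms) for t in times_ms}

print(len(times_ms), "distinct inputs")
print(len(collapsed), "distinct values after float32 rounding")
print(len(rescaled), "distinct values after rescaling, then float32 rounding")
```

Under these assumptions the 300 inputs collapse to a handful of x-values, which would look like the vertical lines reported here, while the rescaled offsets all stay distinct. Hover data would still be computed from the original values, which is consistent with popups appearing in the "empty" space.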

Thanks for the diagnosis @alexcjohnson - helpful context! Manually setting render_mode='svg' should work for my use-cases for now. Just need to remember to set it!

j-at-ch avatar Mar 20 '24 10:03 j-at-ch