
[BUG] px.scatter "ols" not producing linear trend line

Open jconoranderson opened this issue 3 years ago • 9 comments

Describe your context

Please provide us your environment, so we can easily reproduce the issue.

  • replace the result of pip list | grep dash below
dash                               2.3.1
dash-auth                          1.4.1
dash-bootstrap-components          1.0.2
dash-core-components               2.0.0
dash-html-components               2.0.0
dash-table                         5.0.0
  • if frontend related, tell us your Browser, Version and OS

    • OS: macOS Version 12.2.1 (21D62)
    • Browser: Chrome
    • Version: 100.0.4896.88 (Official Build) (x86_64)

Describe the bug

The "ols" (ordinary least squares) option for adding a linear trend line is not producing a regression line. It is instead producing something closer to a polynomial curve.

The code I'm using to create the graph is as follows:

elif beh_gph == 'ols':
    dfg[date_frmt] = pd.to_datetime(dfg[date_frmt])
    print(dfg)
    fig = px.scatter(dfg, x=date_frmt, y="Episode_Count", color="Target",
                     labels={"Episode_Count": tally + " per Shift",
                             "Target": "Target",
                             "Yr_Mnth": "Date"},
                     trendline="ols", title="Aggregate Behavior Data: " + patient + " - " + today)
    fig.update_xaxes(tickangle=45,)
    fig.update_layout(template='plotly_white', hovermode="x unified")

Instead of a linear regression line like the one in the example here - https://plotly.com/python/linear-fits/

I'm getting this:

enter image description here

The x and y values are just date values and floating point numbers, respectively.

The Plotly version is 5.7.0

Expected behavior

Linear regression line.

jconoranderson avatar Apr 18 '22 18:04 jconoranderson

I updated to the latest dash (2.3.1) and the problem still persists...

jconoranderson avatar Apr 18 '22 18:04 jconoranderson

@nicolaskruchten I can't quite tell what we're falling back on here but I'm guessing this just means ols trendlines don't support dates? Any hunch how hard this would be?

alexcjohnson avatar Apr 19 '22 16:04 alexcjohnson

Possibly. It worked as expected when I ran the same code on a Mac; it is currently being run on Windows 10. I can provide my entire codebase if that might help.

jconoranderson avatar Apr 19 '22 17:04 jconoranderson

A full reproducible example would be great, yes. Simplified to the minimal case if you can. My hunch about what's happening here: we're not able to use dates as the x data in the curve fitting algorithm, so it's using row indices as the x data during fitting, but somehow the indices used are out of order on Windows. If that's the case, then even on Mac where the indices are ordered correctly, the fit looks right only because your dates happen to be evenly spaced.

alexcjohnson avatar Apr 19 '22 17:04 alexcjohnson

This is meant to work even on non-evenly-spaced dates: dates are converted to floats and the regression happens there, then the ~X values are converted back into dates~ the original X values are provided to Plotly.js. I'll take a look sometime this week. The relevant code starts here https://github.com/plotly/plotly.py/blob/master/packages/python/plotly/plotly/express/_core.py#L331
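
A minimal sketch of the approach described above, not the actual Plotly Express code: datetimes are mapped to Unix epoch seconds, a straight line is fitted to those floats by ordinary least squares, and the fitted values are paired with the original dates. The helper name `ols_on_dates` and the sample data are hypothetical.

```python
from datetime import datetime, timezone


def ols_on_dates(dates, ys):
    """Fit y = intercept + slope * t, with t in Unix epoch seconds (hypothetical helper)."""
    # Convert each naive datetime to a float number of seconds since 1970-01-01 UTC.
    xs = [d.replace(tzinfo=timezone.utc).timestamp() for d in dates]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Closed-form OLS for a single predictor; centering keeps the arithmetic well conditioned.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The fitted values belong to the original dates, so uneven spacing is not a problem.
    fitted = [intercept + slope * x for x in xs]
    return slope, intercept, fitted


# Made-up, unevenly spaced dates with y growing by exactly 2 per day.
dates = [datetime(2018, 1, 1), datetime(2018, 1, 4), datetime(2018, 2, 1), datetime(2018, 3, 15)]
days = [(d - dates[0]).days for d in dates]
ys = [1.0 + 2.0 * n for n in days]

slope, intercept, fitted = ols_on_dates(dates, ys)
slope_per_day = slope * 86400  # the slope is per second, so rescale to per day
```

Because the fit runs on epoch seconds, any problem in the datetime-to-number conversion would change the x values the regression sees, even though the plot still shows the original dates.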

nicolaskruchten avatar Apr 19 '22 18:04 nicolaskruchten

Hi @nicolaskruchten... was there any update on this?

jconoranderson avatar May 09 '22 20:05 jconoranderson

No update yet, no. Can you provide a fully runnable example including data please?

The standard test case I use for OLS with dates on the X axis is this px.scatter(px.data.stocks(indexed=True, datetimes=True), trendline="ols") and it looks as expected to me.

nicolaskruchten avatar May 09 '22 21:05 nicolaskruchten

Also can you confirm the version of Plotly you are using? The latest is 5.7.0

nicolaskruchten avatar May 09 '22 21:05 nicolaskruchten

Hello,

I'm not sure if this has been resolved but I am seeing a similar issue when trying to plot data that only has date values on the first of each month (though this is in a Jupyter environment and not Dash so I'm not sure if it is exactly the same case).

I've included the code below. This code does work properly when I run it in Google Colab, and there is a specific difference in the data that I don't understand (more below).

OS: Windows 10 v 20H2 build 19042.1826

  • plotly 5.9.0
  • jupyterlab 3.3.2

import pandas as pd
import plotly.express as px
import datetime


df = pd.DataFrame( {'Date': ['2018-01-01','2018-02-01','2018-03-01','2018-04-01','2018-05-01','2018-06-01','2018-07-01','2018-08-01','2018-09-01','2018-10-01'],
                    'Units' : [36.044379,31.036306,34.354977,33.189577,32.906101,35.679296,48.577445,53.967781,51.684226,32.638374]})


df['Date'] = pd.to_datetime(df['Date'])
df['Date_serial'] = [(d - datetime.datetime(1970,1,1)).days for d in df['Date']]
df['Datevalue'] = df['Date'].values.astype(int)

fig = px.scatter(df, x = 'Date', y = 'Units', trendline = 'ols', trendline_color_override = 'red')

fig2 = px.scatter(df, x='Date_serial', y = 'Units', trendline = 'ols', trendline_color_override = 'red')

fig.show()
fig2.show()

This produces two plots. The first uses the datetime column and has a non-linear trendline. For the second, I converted the dates into a serialized format (days since 1970-01-01), and the trendline is linear.

plotly_graph_example_08182022

But as I noted, the plots render as expected when I run them in Google Colab. The major difference between the results I get in my environment and what I get in Google Colab is the values in the Datevalue column.

Colab results:

Date | Units | Date_serial | Datevalue
-- | -- | -- | --
2018-01-01 | 36.044379 | 17532 | 1514764800000000000
2018-02-01 | 31.036306 | 17563 | 1517443200000000000
2018-03-01 | 34.354977 | 17591 | 1519862400000000000
2018-04-01 | 33.189577 | 17622 | 1522540800000000000
2018-05-01 | 32.906101 | 17652 | 1525132800000000000


My results:

Date | Units | Date_serial | Datevalue
-- | -- | -- | --
2018-01-01 | 36.044379 | 17532 | 1581514752
2018-02-01 | 31.036306 | 17563 | -153812992
2018-03-01 | 34.354977 | 17591 | -612827136
2018-04-01 | 33.189577 | 17622 | 1946812416
2018-05-01 | 32.906101 | 17652 | 2068578304



I have no idea why the Datevalue numbers are so different, but I imagine the values being out of order is part of (or the entire) issue.

EDIT -- If I convert to int64 instead of int I get the same values as I see in Colab. It looks like this line in the Plotly code linked above converts to int, which I suspect produces the negative values for me:

if x.dtype.type == np.datetime64:
    x = x.astype(int) / 10**9  # convert to unix epoch seconds
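
To illustrate where the odd numbers come from, here is a small sketch that emulates a cast to a signed 32-bit integer in plain Python. It assumes, as the int64 observation above suggests, that `int` resolved to a 32-bit type in the failing environment. The helper `wrap_int32` is hypothetical and is not part of NumPy or Plotly. Applied to the Colab nanosecond values from the first table, it reproduces the values in the second table.

```python
def wrap_int32(n):
    """Keep only the low 32 bits of n, read as a signed integer (hypothetical helper)."""
    return (n + 2**31) % 2**32 - 2**31


# Nanoseconds since the Unix epoch, copied from the Colab table above.
colab_ns = [
    1514764800000000000,  # 2018-01-01
    1517443200000000000,  # 2018-02-01
    1519862400000000000,  # 2018-03-01
    1522540800000000000,  # 2018-04-01
    1525132800000000000,  # 2018-05-01
]

wrapped = [wrap_int32(n) for n in colab_ns]
print(wrapped)
```

The wrapped values are no longer in date order, which is consistent with the fit being computed on scrambled x values.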

Colab environment: Python 3.7.13 Plotly 5.5.0 Pandas 1.3.5

My current environment: Python 3.10.0 Plotly 5.9.0 Pandas 1.4.2

I also replicated my error on an older environment: Python 3.8.3 Plotly 5.6.0 Pandas 1.0.5

Let me know if any other details would be helpful.

sdelu avatar Aug 18 '22 20:08 sdelu

I can replicate this with current plotly 5.11.0, pandas 1.4.1, statsmodels 0.13.5 and Python 3.8.15 on Win10 Enterprise 64bit 22H2.

I ran the example code from @sdelu and got the same plots.

Likewise, the example px.scatter(px.data.stocks(indexed=True, datetimes=True), trendline="ols") looks like this:

grafik

m-ad avatar Jan 09 '23 11:01 m-ad

By the way, this issue is not limited to "ols", so maybe the issue can be renamed to something along the lines of "broken trendlines with datetime x-axis". Here is how the second "lowess" example from the documentation looks on my system:

import plotly.express as px

df = px.data.stocks(datetimes=True)
fig = px.scatter(df, x="date", y="GOOG", trendline="lowess", trendline_options=dict(frac=0.1))
fig.show()

grafik

m-ad avatar Jan 09 '23 12:01 m-ad

Hm... it seems that @sdelu was on the right track. I changed this line from

x = x.astype(int) / 10**9  # convert to unix epoch seconds

to

x = x.astype(np.int64) / 10**9  # convert to unix epoch seconds

Now the examples all work just fine.
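
A rough sketch of why the change matters, using the data from the example earlier in this thread. The 32-bit behaviour is emulated with an explicit `np.int32` cast (an assumption, since what `int` means depends on the platform and NumPy version), and `np.polyfit` stands in for the fitting routine Plotly Express actually uses. The helper `trendline` is hypothetical.

```python
import numpy as np

# Dates and values from the example earlier in this thread.
dates = np.array(
    ["2018-01-01", "2018-02-01", "2018-03-01", "2018-04-01", "2018-05-01",
     "2018-06-01", "2018-07-01", "2018-08-01", "2018-09-01", "2018-10-01"],
    dtype="datetime64[ns]",
)
units = np.array([36.044379, 31.036306, 34.354977, 33.189577, 32.906101,
                  35.679296, 48.577445, 53.967781, 51.684226, 32.638374])

ns = dates.astype(np.int64)            # explicit 64-bit: same result on every platform
x_good = ns / 10**9                    # Unix epoch seconds, strictly increasing
x_bad = ns.astype(np.int32) / 10**9    # emulated 32-bit cast: wraps around, order is lost


def trendline(x, y):
    """Fitted values of a degree-1 least-squares fit (x is centered for conditioning)."""
    xc = x - x.mean()
    slope, intercept = np.polyfit(xc, y, 1)
    return slope * xc + intercept


trend_good = trendline(x_good, units)  # a straight line when drawn against the dates
trend_bad = trendline(x_bad, units)    # linear in x_bad only, so it zigzags against the dates
```

In both cases the fit is linear in the x values it was given. With the wrapped values, those x values no longer increase with the dates, so the same line looks bent when plotted on the date axis.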

m-ad avatar Jan 09 '23 12:01 m-ad

Fixed in 5.12!

nicolaskruchten avatar Jan 24 '23 16:01 nicolaskruchten