ydata-profiling
Wrong stationarity alert in time series
Current Behaviour
I made a report of a time series and then used the following code:
description = profile.get_description()
for col in df:
    var1 = description.variables.get(col)
    stat = var1.get('stationary')
    p = var1.get('addfuller')
    display("Column: " + col + " ; Stationary: " + str(stat) + " ; P: " + str(p))
I analysed a data set with several columns and got the following result:
Column: Column 1 ; Stationary: False ; P: 8.367848162951153e-15
Column: Column 2 ; Stationary: False ; P: 1.0170622187220445e-11
Column: Column 3 ; Stationary: False ; P: 2.555609761088582e-05
Column: Column 4 ; Stationary: False ; P: 7.172269761903138e-08
Column: Column 5 ; Stationary: False ; P: 9.321131415426812e-18
Column: Column 6 ; Stationary: False ; P: 9.027089348108759e-15
Column: Column 7 ; Stationary: False ; P: 0.02133819126759494
Column: Column 8 ; Stationary: False ; P: 4.406120572138344e-12
Column: Column 9 ; Stationary: False ; P: 0.0028888647417244155
Column: Column 10 ; Stationary: False ; P: 0.00044090523969600784
Column: Column 11 ; Stationary: False ; P: 0.00286260675205775
Column: Column 12 ; Stationary: False ; P: 0.0001708455587419074
Column: Column 13 ; Stationary: False ; P: 9.472249697294651e-30
Column: Column 14 ; Stationary: False ; P: 2.526552913384979e-12
Column: Column 15 ; Stationary: False ; P: 0.000455609981090904
Column: Column 16 ; Stationary: False ; P: 0.0004254554235795494
Column: Column 17 ; Stationary: None ; P: None
Column: Column 18 ; Stationary: None ; P: None
Column: Column 19 ; Stationary: False ; P: 1.2239118466383953e-16
Column: Column 20 ; Stationary: True ; P: 9.06748511005521e-29
Column: Column 21 ; Stationary: True ; P: 0.005396832629069178
Column: Column 22 ; Stationary: True ; P: 1.850847639853015e-11
Expected Behaviour
I would expect every column with a p-value < 0.05 to be marked as "stationary".
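The expectation above can be written down as a small check. This is only a sketch of the rule the report assumes (ADF p-value below 0.05 means stationary); the helper name `expected_stationary` and the `reported` dictionary are illustrative and not part of ydata-profiling. The values are copied from a few rows of the output above.

```python
from typing import Optional

ALPHA = 0.05  # significance level assumed in this report


def expected_stationary(p_value: Optional[float], alpha: float = ALPHA) -> Optional[bool]:
    """Verdict expected from the ADF p-value alone (hypothetical helper)."""
    if p_value is None:
        # No p-value was reported for the column, so no verdict is expected.
        return None
    return p_value < alpha


# (reported 'stationary' flag, reported 'addfuller' p-value) for a few columns
reported = {
    "Column 1": (False, 8.367848162951153e-15),
    "Column 7": (False, 0.02133819126759494),
    "Column 17": (None, None),
    "Column 20": (True, 9.06748511005521e-29),
}

# Columns whose reported flag disagrees with the p-value-only rule
mismatches = [
    name for name, (flag, p) in reported.items()
    if expected_stationary(p) != flag
]
print(mismatches)
```

Columns listed in `mismatches` are the ones where the alert does not match what the p-value alone would suggest.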
Data Description
I used a private dataset
Code that reproduces the bug
description = profile.get_description()
for col in df:
    var1 = description.variables.get(col)
    stat = var1.get('stationary')
    p = var1.get('addfuller')
    display("Column: " + col + " ; Stationary: " + str(stat) + " ; P: " + str(p))
pandas-profiling version
v4.9.0
Dependencies
Package Version
0 ydata_profiling v4.9.0
1 pandas 2.1.4
2 numpy 1.26.4
3 matplotlib 3.7.1
4 statsmodels 0.14.2
5 Python 3.10.12
6 OS Linux 6.1.85+
OS
Linux 6.1.85+
Checklist
- [X] There is not yet another bug report for this issue in the issue tracker
- [X] The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
- [X] The issue has not been resolved by the entries listed under Common Issues.
Here is some data in the attachment, if needed: ADF_Test.csv
# Load dataset to dataframe
import pandas as pd
filename = 'ADF_Test.csv'
data = pd.read_csv(filename, sep=',', decimal='.', parse_dates=["time"], index_col="time")
display(data)

import ydata_profiling

# Create a profile report
profile = ydata_profiling.ProfileReport(data, title="ADF Test", explorative=True, tsmode=True)
profile.to_notebook_iframe()
profile.to_file("ADF_Test.html")

description = profile.get_description()
for col in data:
    var1 = description.variables.get(col)
    stat = var1.get('stationary')
    p = var1.get('addfuller')
    display("Column: " + col + " ; Stationary: " + str(stat) + " ; P: " + str(p))
And then I get: Column: Col1 ; Stationary: False ; P: 8.367848162944989e-15
Hi @Blackandwhite23!
I looked at the file describe_timeseries_pandas.py, specifically the function pandas_describe_timeseries_1d, which returns the stationary flag and the p-value. The function also checks for seasonality; if the series is seasonal, stationary is set to False (line 214):
stats["stationary"] = is_stationary and not stats["seasonal"]
My knowledge of statistics is modest. From everything I've seen and read, I know that the trend has to be removed. Here it is written that stationary series have neither a trend nor seasonality. There is also a discussion here.
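To illustrate why a very small p-value can still produce Stationary: False, here is a minimal sketch of the combination quoted above. It assumes the ADF verdict is "p-value below 0.05"; the function name `combined_stationary` and the `seasonal` values passed in are hypothetical, since the seasonal flags for the reported columns are not shown in this issue.

```python
def combined_stationary(p_value: float, seasonal: bool, alpha: float = 0.05) -> bool:
    """Mimic `is_stationary and not seasonal` (illustrative, not library code)."""
    # Assumption: the ADF test alone calls the series stationary when p < alpha.
    adf_says_stationary = p_value < alpha
    # The quoted line additionally requires that the series is not seasonal.
    return adf_says_stationary and not seasonal


p = 8.367848162951153e-15  # p-value reported for Col1 above

# Hypothetical case: the seasonality check fires, so the flag is False
flag_if_seasonal = combined_stationary(p, seasonal=True)

# Hypothetical case: no seasonality detected, so the ADF verdict decides
flag_if_not_seasonal = combined_stationary(p, seasonal=False)

print(flag_if_seasonal, flag_if_not_seasonal)
```

If this reading is right, printing `var1.get('seasonal')` next to `stationary` and `addfuller` in the loop from the report should show which columns were flagged as seasonal.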