How to deal with outliers in time series?

Open • wj431364 opened this issue 1 year ago • 1 comment

Hi, I am using the very nice package rEDM (pyEDM) in my project. However, I find that the CCM function is very sensitive to outliers in the time series, which I think stems mainly from the Pearson correlation used inside the function.
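The sensitivity attributed to Pearson correlation here can be illustrated in isolation. As the poster notes, CCM scores cross-map skill with a Pearson correlation between observed and predicted values. The sketch below uses plain NumPy on synthetic data and makes no pyEDM calls. It shows how one extreme value can drag the Pearson coefficient down while a rank-based (Spearman) coefficient barely moves. The helpers `pearson` and `spearman` are hypothetical names for illustration. This is a general demonstration, not a reproduction of what happens inside CCM.

```python
import numpy as np

def pearson(a, b):
    # Pearson correlation coefficient of two 1-D arrays
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a, b):
    # Spearman correlation = Pearson correlation of the ranks
    # (argsort of argsort yields 0-based ranks; assumes no ties)
    return pearson(np.argsort(np.argsort(a)), np.argsort(np.argsort(b)))

rng = np.random.default_rng(0)
observed = rng.random(100)
predicted = observed + rng.normal(0.0, 0.1, 100)  # a good set of predictions

r_clean, rho_clean = pearson(observed, predicted), spearman(observed, predicted)

# Replace a single prediction with a value far outside the data range
predicted_bad = predicted.copy()
predicted_bad[-1] = 15.0
r_bad, rho_bad = pearson(observed, predicted_bad), spearman(observed, predicted_bad)

print(f"clean:       Pearson={r_clean:.2f}  Spearman={rho_clean:.2f}")
print(f"one outlier: Pearson={r_bad:.2f}  Spearman={rho_bad:.2f}")
```

The Pearson coefficient falls sharply because the single squared deviation at 15 dominates the variance. A rank-based measure caps how much any one value can contribute.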

In an extreme case, the causality result drops from 0.7 to 0.1 when only a single data point is changed. This seems counter-intuitive given the definition of causality. Is there any method I can use to deal with such outliers?
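One generic option is to tame extreme values before running CCM. This is not a pyEDM feature and was not suggested in this thread. For example, each series can be clipped to a robust band around its median, based on the median absolute deviation (MAD). The helper `clip_outliers_mad` and the threshold `k=3.0` below are hypothetical choices for illustration. The sketch only preprocesses a NumPy array, and the result could then be passed to CCM in place of the raw series.

```python
import numpy as np

def clip_outliers_mad(series, k=3.0):
    """Clip values lying more than k scaled MADs from the median.

    Hypothetical helper, not part of pyEDM. The 1.4826 factor makes the
    MAD comparable to a standard deviation for normally distributed data.
    """
    series = np.asarray(series, dtype=float)
    median = np.median(series)
    mad = 1.4826 * np.median(np.abs(series - median))
    if mad == 0:
        return series.copy()  # no spread to judge outliers against
    lower, upper = median - k * mad, median + k * mad
    return np.clip(series, lower, upper)

rng = np.random.default_rng(0)
x = rng.random(200)
x[-1] = 15.0  # a single outlier, as in the example in this issue
x_clipped = clip_outliers_mad(x)
print(x.max(), x_clipped.max())
```

Clipping changes the reconstructed state space. It is only defensible if the extreme value is believed to be a measurement error rather than genuine dynamics of the system.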

Many thanks!

Here is an example that reproduces the issue (sorry, I am using Python):

```python
import numpy as np
import pandas as pd
from pyEDM import CCM

x = np.random.random(200)
y = np.sin(x) + np.random.random(200) * 0.5  # y depends on x, plus noise

def run_ccm(x1, x2):
    # pyEDM treats the first column as time, so supply an explicit time index
    data = pd.DataFrame({'Time': np.arange(1, len(x1) + 1), 'x1': x1, 'x2': x2})
    # E was originally chosen with a helper findE() (not part of pyEDM, not shown),
    # but the call below fixes E=3 regardless
    return CCM(
        dataFrame=data,
        columns='x1',
        target='x2',
        libSizes='50 160 20',
        sample=100,
        E=3,
        showPlot=True,
    )

# Case 1: last value of x set to 0
x[-1] = 0
run_ccm(x, y)

# Case 2: last value of x set to 15, far outside the [0, 1) range of the rest
x[-1] = 15
run_ccm(x, y)
```

Results: [image: case 1] [image: case 2]

wj431364 • Jul 20 '23 06:07

I recommend you post this over in @SugiharaLab/rEDM, as that is the actively maintained version of the software package.

ha0ye • Jul 20 '23 13:07