pingouin
Partial Correlation gives unexpected output for toy example
Hi, thanks for this great library! I am getting a perfect correlation (r = 1.0) for the following toy example, whereas I expected a correlation of roughly zero, as obtained with the regression (residual) approach.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import pingouin as pg
from scipy.stats import pearsonr
n = 10000
y = list(range(1, n+1))
x = y + np.random.normal(size=n)*0.1
z = y
df = pd.DataFrame({'x': x, 'y': y, 'z': z})
print(pg.partial_corr(data=df, x='x', y='y', covar=['z']))
# Regress x on z (plus an intercept) and get the residuals
X_with_const = sm.add_constant(np.column_stack([z]))  # Design matrix: a constant column plus z
model_X = sm.OLS(x, X_with_const).fit()
residuals_X = model_X.resid
# Regress y on the same design matrix (constant + z) and get the residuals
model_Y = sm.OLS(y, X_with_const).fit()
residuals_Y = model_Y.resid
# Compute correlation of residuals
residual_corr, p = pearsonr(residuals_X, residuals_Y)
print(f'Partial correlation using statsmodels: {residual_corr}, {p}')
Output:
             n    r       CI95%  p-val
pearson  10000  1.0  [1.0, 1.0]    0.0
Partial correlation using statsmodels: 0.0012024407422241278, 0.9043016773480718
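One detail worth checking when comparing the two outputs: in this toy example `z` is identical to `y`, so regressing `y` on `z` fits exactly and the residuals of `y` are zero up to floating-point error. The minimal sketch below reproduces that step with plain NumPy; using `np.linalg.lstsq` instead of statsmodels and a seeded `default_rng` are choices made here for a self-contained check, not part of the original report.

```python
import numpy as np

n = 10_000
y = np.arange(1, n + 1, dtype=float)
z = y.copy()  # same setup as the toy example: z is identical to y
rng = np.random.default_rng(0)  # seeded here only for reproducibility
x = y + rng.normal(size=n) * 0.1

# Design matrix with an intercept column and z (like sm.add_constant above)
design = np.column_stack([np.ones(n), z])

# Least-squares fits of y and x on [1, z]
beta_y, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_x, *_ = np.linalg.lstsq(design, x, rcond=None)
res_y = y - design @ beta_y
res_x = x - design @ beta_x

# res_y contains only rounding noise, while res_x keeps the injected noise
print("max |residual of y|:", np.abs(res_y).max())
print("std of residual of x:", res_x.std())
```

Since `res_y` is pure rounding noise, the residual correlation printed by the statsmodels snippet is a correlation against numerical noise rather than against a meaningful residual.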
pingouin.__version__
'0.5.4'
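For reference, a common alternative to the residual approach computes the partial correlation from the inverse of the covariance matrix (the precision matrix) as `-P[x, y] / sqrt(P[x, x] * P[y, y])`. Whether pingouin uses exactly this route is an assumption here, not something stated in the issue. The sketch below compares both computations on hypothetical non-degenerate data (`z` is not a copy of `y`), where they should agree. With the toy example's `z == y`, the covariance matrix of `(x, y, z)` is singular, so the plain inverse used here would not be well defined.

```python
import numpy as np


def partial_corr_residuals(x, y, z):
    """Correlate the residuals of x and y after an OLS fit on [1, z]."""
    design = np.column_stack([np.ones_like(z), z])
    beta_x, *_ = np.linalg.lstsq(design, x, rcond=None)
    beta_y, *_ = np.linalg.lstsq(design, y, rcond=None)
    res_x = x - design @ beta_x
    res_y = y - design @ beta_y
    return np.corrcoef(res_x, res_y)[0, 1]


def partial_corr_precision(x, y, z):
    """Partial correlation of x and y given z from the inverse sample covariance."""
    cov = np.cov(np.vstack([x, y, z]))  # 3x3; rows are variables
    prec = np.linalg.inv(cov)  # requires a non-singular covariance matrix
    return -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])


# Hypothetical data: y depends on x beyond what z explains
rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z + 0.5 * x + rng.normal(size=n)

r_res = partial_corr_residuals(x, y, z)
r_prec = partial_corr_precision(x, y, z)
print(r_res, r_prec)  # the two values agree up to rounding
```

Both functions are hypothetical helpers written for this comparison. On this data the theoretical partial correlation is 0.5 / sqrt(1.25) ≈ 0.447, and the two estimates match to many decimal places because the identity between them is exact for sample covariances.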