pg.qqplot() out of bounds

Open FlorinAndrei opened this issue 2 years ago • 3 comments

Data set:

HealthExam.csv

Code:

import numpy as np
import pandas as pd
import statsmodels.api as sm
import pingouin as pg
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme()

he = pd.read_csv('HealthExam.csv')
fs = he['Cholesterol'].loc[he['Sex'] == "F"]

sm.qqplot(fs, line='s');

[Image: sm-qq, the Statsmodels Q-Q plot]

pg.qqplot(fs);

[Image: pg-qq, the Pingouin Q-Q plot]

The output from Statsmodels QQ matches what I get from R. The output from PG extends too far to the right.

FlorinAndrei, Jun 21 '22 01:06

Pingouin forces the x-axis and y-axis to have the same units and the same limits. In your example, because there is an outlier point (y=3.5, x=2), the shared limits force the x-axis to extend beyond 3.5 as well. Pingouin also draws the diagonal line for easy comparison against a normal distribution.

https://github.com/raphaelvallat/pingouin/blob/b1c334d93f8f7f8b13c39199c08a7d8b619afd95/pingouin/plotting.py#L381-L387
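To make the effect concrete, here is a minimal sketch of the behaviour described above. It is not Pingouin's actual code: the helper `shared_limits` and the synthetic sample are hypothetical, and the sample is simply built so that one outlier sits near y=3.5 while the theoretical quantiles stop a little above 2.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from statistics import NormalDist


def shared_limits(theor, observed):
    """Hypothetical helper: one (low, high) pair that covers both axes."""
    low = min(theor.min(), observed.min())
    high = max(theor.max(), observed.max())
    return low, high


# Synthetic, already-standardized sample (an assumption, not HealthExam.csv).
n = 40
probs = (np.arange(1, n + 1) - 0.5) / n
theor = np.array([NormalDist().inv_cdf(p) for p in probs])
observed = theor.copy()
observed[-1] = 3.5  # a single outlier in the ordered sample

low, high = shared_limits(theor, observed)

fig, ax = plt.subplots()
ax.scatter(theor, observed)
ax.plot([low, high], [low, high], color="grey")  # corner-to-corner diagonal
ax.set_xlim(low, high)  # the same limits on both axes...
ax.set_ylim(low, high)  # ...so the outlier in y stretches x too
```

Because `high` is driven by the outlier in `observed`, the x-axis runs to 3.5 even though no theoretical quantile gets close to it, which is the empty right-hand region seen in the Pingouin plot.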

raphaelvallat, Jun 21 '22 03:06

I see your point, but after seeing the second image everyone goes "you need to trim the right-hand edge".

I think it would be fine if the diagonal didn't actually go corner-to-corner. It's enough if it's just the y=x line, wherever that may fall on the image.
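As a rough illustration of this suggestion, the sketch below draws the y=x line with Matplotlib's `ax.axline` and leaves the axis limits to autoscale from the data. It uses plain Matplotlib rather than Pingouin, and the synthetic sample is an assumption made only for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from statistics import NormalDist

# Synthetic, already-standardized sample with one outlier (an assumption).
n = 40
probs = (np.arange(1, n + 1) - 0.5) / n
theor = np.array([NormalDist().inv_cdf(p) for p in probs])
observed = theor.copy()
observed[-1] = 3.5

fig, ax = plt.subplots()
ax.scatter(theor, observed)

# Infinite y = x reference line through the origin. Only the anchor point
# (0, 0) takes part in autoscaling, so the limits still follow the data.
ax.axline((0, 0), slope=1, color="grey")

# Each axis now ends just past its own data range; the line simply exits
# the plot wherever it happens to cross the frame.
x_upper = ax.get_xlim()[1]
y_upper = ax.get_ylim()[1]
```

The trade-off is that the two axes no longer share the same scale, so the reference line is not at 45 degrees on screen unless the aspect is also fixed, for example with `ax.set_aspect("equal")`.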

FlorinAndrei, Jun 21 '22 17:06

> I see your point, but after seeing the second image everyone goes "you need to trim the right-hand edge".

@FlorinAndrei I think you could just "trim the right-hand edge" yourself after making the plot with pingouin.

ax = pg.qqplot(fs)
ax.set_xbound(upper=2.5)

If you are concerned with the units being rescaled on the x-axis after this, you could resize your figure to accommodate that, with something like fig.set_figwidth (after retrieving the figure with fig = plt.gcf()). Alternatively, you could mess with setting the aspect before you trim.
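A sketch of the trim-then-resize idea, using a plain Matplotlib axes as a stand-in for the one returned by `pg.qqplot` (so it runs without Pingouin or the data set). The synthetic sample and the proportional-width rule are assumptions; the width calculation ignores the space taken by tick labels and margins, so it only roughly restores equal units.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from statistics import NormalDist

# Synthetic, already-standardized sample with one outlier (an assumption).
n = 40
probs = (np.arange(1, n + 1) - 0.5) / n
theor = np.array([NormalDist().inv_cdf(p) for p in probs])
observed = theor.copy()
observed[-1] = 3.5

# Stand-in for `ax = pg.qqplot(fs)`: a square plot with shared limits.
fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(theor, observed)
low, high = min(theor.min(), observed.min()), max(theor.max(), observed.max())
ax.set_xlim(low, high)
ax.set_ylim(low, high)

# Trim the right-hand edge, as in the snippet above.
ax.set_xbound(upper=2.5)

# Narrow the figure in proportion to the trimmed x-range so that one unit
# on x stays roughly as long on screen as one unit on y.
x_low, x_high = ax.get_xbound()
y_low, y_high = ax.get_ybound()
fig.set_figwidth(fig.get_figheight() * (x_high - x_low) / (y_high - y_low))
```

With a real Pingouin plot, `fig = ax.get_figure()` or `fig = plt.gcf()` would give the figure to resize.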

remrama, Jun 21 '22 18:06