pingouin
pg.qqplot() out of bounds
Data set:
Code:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import pingouin as pg
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme()
he = pd.read_csv('HealthExam.csv')
fs = he['Cholesterol'].loc[he['Sex'] == "F"]
sm.qqplot(fs, line='s');
pg.qqplot(fs);
The output from Statsmodels QQ matches what I get from R. The output from PG extends too far to the right.
Pingouin forces the x-axis and y-axis to have the same units and the same limits. In your example, because there is an outlier (y=3.5, x=2), the x-axis is forced to extend beyond 3.5. Pingouin also adds the diagonal line for easy comparison against a normal distribution:
https://github.com/raphaelvallat/pingouin/blob/b1c334d93f8f7f8b13c39199c08a7d8b619afd95/pingouin/plotting.py#L381-L387
I see your point, but after seeing the second image everyone goes "you need to trim the right-hand edge".
I think it would be fine if the diagonal didn't actually go corner-to-corner. It's enough if it's just the y=x line, wherever that may fall on the image.
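To illustrate the idea in the comment above, here is a minimal sketch in plain matplotlib, not pingouin's implementation. It draws a y=x reference line with `ax.axline`, which adds no endpoints at the plot corners, so each axis keeps its own autoscaled limits. The synthetic sample, the `(i - 0.5) / n` plotting positions, and all variable names are assumptions made for illustration.

```python
from statistics import NormalDist

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the cholesterol data: 40 normal draws plus one outlier.
rng = np.random.default_rng(0)
sample = np.append(rng.normal(size=40), 6.0)

# Standardized, ordered sample values against theoretical normal quantiles.
# The (i - 0.5) / n plotting positions are one common choice (an assumption here).
n = sample.size
observed = np.sort((sample - sample.mean()) / sample.std(ddof=1))
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])

fig, ax = plt.subplots()
ax.scatter(theoretical, observed)

# Infinite y = x line through the origin; it is clipped to whatever limits
# the data produce instead of stretching the axes corner to corner.
ax.axline((0, 0), slope=1, color="grey")

ax.set_xlabel("Theoretical quantiles")
ax.set_ylabel("Ordered quantiles")
fig.savefig("qq_sketch.png")
```

With this approach the outlier only stretches the y-axis; the x-axis stops near the largest theoretical quantile.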
@FlorinAndrei I think you could just "trim the right-hand edge" yourself after making the plot with pingouin.
ax = pg.qqplot(fs)
ax.set_xbound(upper=2.5)
If you are concerned with the units being rescaled on the x-axis after this, you could resize your figure to accommodate that, with something like fig.set_figwidth (after retrieving the figure with fig = plt.gcf()). Alternatively, you could mess with setting the aspect before you trim.
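The trim-then-resize suggestion can be sketched roughly as follows. To stay self-contained, the example builds a stand-in matplotlib axes with identical x and y limits rather than calling pingouin. The helper name `trim_x_keep_scale` is hypothetical, and scaling the figure width in proportion to the x-range is only an approximation, since it ignores the margins around the axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def trim_x_keep_scale(ax, upper):
    """Trim the upper x bound, then narrow the figure by the same ratio.

    Hypothetical helper: the width adjustment is approximate because it
    ignores the space taken by margins, tick labels, and axis labels.
    """
    fig = ax.figure
    old_lo, old_hi = ax.get_xbound()
    ax.set_xbound(upper=upper)
    new_lo, new_hi = ax.get_xbound()
    fig.set_figwidth(fig.get_figwidth() * (new_hi - new_lo) / (old_hi - old_lo))
    return ax


# Stand-in for a Q-Q plot whose axes were forced to share the same limits.
fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter([-2.0, -1.0, 0.0, 1.0, 2.0], [-1.8, -0.9, 0.1, 1.1, 3.5])
ax.set_xlim(-3, 4)
ax.set_ylim(-3, 4)

trim_x_keep_scale(ax, upper=2.5)
fig.savefig("qq_trimmed.png")
```

With a real plot, the axes returned by `ax = pg.qqplot(fs)` would take the place of the stand-in axes.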