qqnorm not showing the highest datapoint & qqplot not showing 2nd-to-highest datapoint
Version information
julia version 1.6.2
StatsPlots v0.14.26
Plots v1.19.3
Hey everyone. I was messing around with qqnorm and noticed that it was neglecting to plot the highest point in the dataset I sent it. The dataset I'm using is located here.
- I loaded in the data.
using CSV, DataFrames, StatsPlots
data = CSV.File(filePath) |> DataFrame
- Filtered the data into two datasets by gender.
HHSgirls = subset(data, :Gender => ByRow(==("Female")), skipmissing=true)
HHSboys = subset(data, :Gender => ByRow(==("Male")), skipmissing=true)
- Ensured that no values in the Reaction_time column were missing.
HHSgirlsClean = dropmissing(HHSgirls, :Reaction_time);
HHSboysClean = dropmissing(HHSboys, :Reaction_time);
- Then I put the data into qqnorm.
qqnorm(HHSgirlsClean[!,:Reaction_time], yaxis="Female Reaction Time", qqline = :fit)
I get the following plot.
The issue is that the plot omits the highest value of the column I passed to qqnorm(), which is 46.
maximum(HHSgirlsClean[!,:Reaction_time])
46.0
If I extend the y-axis (and x-axis to be safe) limits to include where the point should be, the point is still missing.
qqnorm(HHSgirlsClean[!,:Reaction_time], yaxis="Female Reaction Time", ylims=(-5,50), xlims=(0,15), qqline = :fit)
Plot from the code directly above.

Same thing happens with the boy dataset.
qqnorm(HHSboysClean[!,:Reaction_time], label="Male Reaction Time",qqline = :fit)

maximum(HHSboysClean[!,:Reaction_time])
1000.0
Graph with the extended axes.

A note about the plots above: ignore the titles on the graphs. I forgot to remove them.
Interestingly, if I try the same process with the qqplot function, it is the second-highest point that is missing from the plot.
sort(HHSgirlsClean[!,:Reaction_time])
Output in Julia REPL
0.0489 0.139 0.142 0.148 0.23 0.25 0.261 0.27 ⋮ 3.0 3.0 4.2 5.0 7.129 10.0 30.0 46.0
Just to clarify, 30 is the 2nd to highest point.
Setting up a sample of 100 draws from a standard normal distribution to pass to qqplot.
normDist = rand(Normal(), 100)
Plotting
qqplot(normDist, HHSgirlsClean[!,:Reaction_time], qqline = :fit)
Result

Note that 30 is not included in the graph.
Same with the boy dataset...
sort(HHSboysClean[!,:Reaction_time])
Output in Julia REPL
0.0417 0.06 0.084 0.1 0.1999 0.202 0.212 0.223 ⋮ 1.2 3.0 5.0 6.0 6.0 6.7 404.0 1000.0
To clarify, 404 is the 2nd to highest point.
Plotting
qqplot(normDist, HHSboysClean[!,:Reaction_time], qqline = :fit)
Result

Notice that point 404 is missing from the graph.
I'm a bit worried that I may be doing something wrong. So, please let me know if this is a user-error on my end. Also, I am using Visual Studio Code. Not sure if that would cause issues.
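For the qqnorm case, one rough cross-check is to compute the normal Q-Q coordinates by hand and confirm that the maximum is among them. The sketch below is only an illustration: `manual_qqnorm_points` is a hypothetical helper (not part of StatsPlots or Distributions), the plotting positions `(i - 0.5) / n` are one common convention and may differ from what the library uses, and `sample_data` is made-up data rather than the reaction-time column.

```julia
using Distributions

# Hypothetical helper: returns (theoretical quantiles, sorted sample),
# one pair per data point, so no observation is dropped.
function manual_qqnorm_points(x::AbstractVector{<:Real})
    n = length(x)
    # Plotting positions (i - 0.5) / n; an assumed convention, others exist.
    probs = ((1:n) .- 0.5) ./ n
    theoretical = [quantile(Normal(), p) for p in probs]
    return theoretical, sort(x)
end

# Made-up values for illustration only.
sample_data = [0.25, 3.0, 0.139, 46.0, 10.0]
qx, qy = manual_qqnorm_points(sample_data)
```

Passing `qx` and `qy` to `scatter` should then show every observation, including the largest one, which gives something to compare the qqnorm output against.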
Thanks for the issue. A minimal working example that shows this behavior is qqnorm([1,2,3]):

However, note that this example is equivalent to calling
using Distributions
qqpair = Distributions.qqbuild(Normal(), [1,2,3])
plot(qqpair)
Note that qqpair just wraps the x and y values of the desired points, which we plot. So I would advise opening an issue on Distributions.jl.
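To see which points actually end up in the pair, one can inspect it directly instead of plotting. This is a small sketch that assumes the pair exposes its coordinates as the fields `qx` and `qy` (that is how I recall the `QQPair` struct in Distributions.jl; treat the field names as an assumption if your version differs).

```julia
using Distributions

x = [1, 2, 3]
qqpair = Distributions.qqbuild(Normal(), x)

# Assumed field names: qx holds the x coordinates, qy the y coordinates.
println("x coordinates: ", qqpair.qx)
println("y coordinates: ", qqpair.qy)

# If fewer points come back than were passed in, data points were dropped.
println("points returned: ", length(qqpair.qy), " of ", length(x))
```

If the count printed on the last line is smaller than the input length, the missing points are lost before anything reaches the plotting code.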
This is now solved with Distributions v0.25.89. Not sure if it is worth increasing the lower bound.
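For anyone still seeing the missing points, a quick way to check whether the installed Distributions already includes the fix is to compare its version against v0.25.89. The sketch below uses `Pkg.dependencies()`; `has_qq_fix` and `installed_distributions_version` are hypothetical helper names introduced here for illustration.

```julia
using Pkg

# Hypothetical helper: true if the given version is at or above
# v0.25.89, the release mentioned above as containing the fix.
has_qq_fix(v::VersionNumber) = v >= v"0.25.89"

# Hypothetical helper: look up Distributions in the active environment.
# Returns `nothing` if it is not found or has no recorded version.
function installed_distributions_version()
    for (_, info) in Pkg.dependencies()
        if info.name == "Distributions"
            return info.version
        end
    end
    return nothing
end

v = installed_distributions_version()
if v === nothing
    println("Distributions not found in the active environment")
else
    println("Distributions ", v, " has the fix: ", has_qq_fix(v))
end
```

If the check reports an older version, `Pkg.update("Distributions")` should pull in a newer release, provided no other compat bound holds it back.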