extremevalues
Cutoff for the rho
Hi Mark,
I like the "extremevalues" package, it is very cool!
I have several questions:
The "limit" shown on the y axis of the output is derived from rho, right? Could you let me know the formula that turns rho into the limit used as the cutoff for outliers? And could you suggest how to choose the cutoff for rho?
Thank you and look forward to your reply, Peng
Sorry, one additional question: what is rho? It looks like I get more outliers when I increase its value.
Best, Peng
Hi Peng, from ?getOutliers:
rho: (Method I) A value y_i is an outlier if it is below (above)
the limit where less then rho[2] (rho[1]) observations are
expected. Must be >0.
So rho is a vector with two values that determine sensitivity for outliers on the left (small numbers) and right (large numbers) of the distribution.
getOutliers does the following:
- estimate parameters for the univariate data distribution, using only the 'bulk' of the data.
- determine, based on the estimated distribution, the limits beyond which we expect fewer than rho observations.
A lower rho will therefore give fewer outliers (in general) and fewer false positives (but possibly more false negatives).
Does that help? See also this paper.
-Mark
Regarding your first question: yes, the limits are those computed by extremevalues, and they are ultimately determined by rho and the parameters estimated from your data.
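As a rough way to write this down (the notation here is for illustration and is not taken from the package documentation): with N observations and a fitted distribution function F-hat, the model expects N times F-hat(l) observations below a value l. Setting that expected count equal to rho on each side gives the limits, assuming the limit is defined by equality:

```latex
\ell_{\text{left}} = \hat{F}^{-1}\!\left(\frac{\rho_{\text{left}}}{N}\right),
\qquad
\ell_{\text{right}} = \hat{F}^{-1}\!\left(1 - \frac{\rho_{\text{right}}}{N}\right)
```

Read this way, rho divided by N is the tail probability of the fitted model beyond each limit.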
-Mark
Hi Mark,
Thank you very much for your reply. I have a better understanding now. I still have a question: I get 5 outliers only when I use rho=70. Do you think that rho value is too high? Is there a p value that could be used to set a threshold for rho, for example the rho value corresponding to p<0.05? The choice of rho feels arbitrary to me, so how should I describe the cutoff I use in my paper to convince the reviewers?
Look forward to your reply, Peng