extremevalues

Cutoff for the rho

Open pwang16 opened this issue 4 years ago • 4 comments

Hi Mark,

I like the "extremevalues" package; it is very cool!

I have several questions:

The "limit" on the y axis of the output is derived from rho, right? Could you let me know the formula that uses rho to calculate the limit used as the cutoff for determining outliers? Could you also suggest how to set the cutoff for rho?

Thank you and look forward to your reply, Peng

pwang16, Sep 22 '19 23:09

Sorry, one additional question: what is rho? It looks like I get more outliers when I increase its value.

Best, Peng

pwang16, Sep 23 '19 01:09

Hi Peng, from ?getOutliers:

rho: (Method I) A value y_i is an outlier if it is below (above)
          the limit where less then rho[2] (rho[1]) observations are
          expected. Must be >0.

So rho is a vector with two values that determine sensitivity for outliers on the left (small numbers) and right (large numbers) of the distribution.

getOutliers does the following:

  1. estimate parameters for the univariate data distribution, using only the 'bulk' of the data.
  2. determine, based on the estimated distribution, the limits beyond which we expect fewer than rho observations. A lower rho will therefore give fewer outliers (in general) and fewer false positives (but possibly more false negatives)

does that help? See also this paper

-Mark

markvanderloo, Sep 23 '19 09:09

Regarding your first question: yes, the limits are those computed by extremevalues, and they are ultimately determined by rho and the parameters estimated from your data. -Mark

markvanderloo, Sep 23 '19 09:09

Hi Mark,

Thank you very much for your reply. I have a better understanding now. I still have a question: I can get 5 outliers only when I use rho=70. In this case, do you think the rho value is too high? Is there a p-value that can be used to set a threshold for rho, for example the rho value corresponding to p<0.05? The choice of rho feels arbitrary to me. How should I describe the cutoff for rho in my paper to convince the reviewers?

Look forward to your reply, Peng

pwang16, Sep 23 '19 23:09