dirichletprocess

prior choice for beta2

Open csetraynor opened this issue 6 years ago • 4 comments

Hi,

I am trying to apply the DirichletProcessBeta function to real data, but I get a strange result. My goal is to cluster a one-dimensional vector of percentages, which is therefore bounded to [0,1]. When I fit the Dirichlet process Beta model, I get a strange spike near the boundary at 0. Would you have any advice on how to fit it in another way? I am currently using:

dpobj = DirichletProcessBeta(y, maxY = 1, g0Priors = c(2,150), mhStep = c(0.25, 0.25), hyperPriorParameters = c(1, 1/150))
dpFit = Fit(dpObj = dpobj, 2000, updatePrior = TRUE)
plot(dpFit)

image

csetraynor, Feb 11 '19 14:02

This happens because the prior allows parameter values for which the Beta density is infinite at zero. I've added a new mixture model with a different prior that ensures the density at the boundaries is always zero.
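As a rough illustration of the boundary behaviour, here is a minimal base R sketch. It uses the standard shape parameterisation of `dbeta`, which is an assumption made for illustration and not necessarily how the package parameterises its clusters. When the first shape parameter is below 1 the density is unbounded at 0, whereas with both shape parameters above 1 it is zero at both boundaries.

```r
# Illustration only: standard shape parameters, not the package's internals.
x <- c(0, 0.001, 0.01, 0.1)

# First shape parameter below 1: the density blows up as x approaches 0.
spiky <- dbeta(x, shape1 = 0.5, shape2 = 2)

# Both shape parameters above 1: the density is 0 at both boundaries.
bounded <- dbeta(c(0, 1), shape1 = 2, shape2 = 3)

print(spiky)
print(bounded)
```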

Install the dev version from GitHub and create a dp object for the new mixture model using the code below.

devtools::install_github("dm13450/dirichletprocess")  
dp <- DirichletProcessBeta2(testData, 1)  
dp <- Fit(dp, 1000)  
plot(dp)

dm13450, Feb 12 '19 10:02

Hi, thanks a lot for the quick reply and the solution to this issue. DirichletProcessBeta2 looks like it works as wanted! Reading the code in the new version, I see that you are using a Pareto distribution for the prior. Is that correct? I was also wondering whether it would be possible to update the prior for DirichletProcessBeta2 as well. I had a look at how to do this and tried something like the following, but I'm not sure it is right. Could you double-check whether it is correct for the parameterisation of the Pareto?


PriorParametersUpdate.beta2 <- function(mdObj, clusterParameters, n = 1) {

  hyperPriorParameters <- mdObj$hyperPriorParameters
  priorParameters <- mdObj$priorParameters

  numClusters <- dim(clusterParameters[[1]])[3]

  posteriorXm <- hyperPriorParameters[1] + priorParameters[1] * numClusters
  posteriorAlpha <- hyperPriorParameters[2] + priorParameters[2] * numClusters
  newNu <- rpareto(n, posteriorXm, posteriorAlpha)

  newPriorParameters <- matrix(c(priorParameters[1], newNu), ncol = 2)
  mdObj$priorParameters <- newPriorParameters

  return(mdObj)
} 

csetraynor, Feb 12 '19 11:02

That's not correct.

We are able to perform a hyperparameter update in the original Beta model because of the conjugacy between the prior and the hyperprior.

By using the Pareto distribution we lose that conjugacy, so updating the prior parameter (gamma) of the Pareto distribution is more complicated.

In my opinion, you don't need to worry about updating the prior parameter. Your data is between 0 and 1, so I think it should find the clusters sufficiently fast.

Instead, check the sensitivity to the gamma parameter by fitting several dp objects with different gamma values.

dp1 <- DirichletProcessBeta2(testData, 1, g0Priors = 2)
dp2 <- DirichletProcessBeta2(testData, 1, g0Priors = 4)

If the results don't change with different values of the prior parameter then you don't need to worry about updating the value.
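A sensitivity check along these lines could be sketched as follows. This is only a sketch under assumptions: the simulated `testData`, the gamma values, and the short run length are all hypothetical choices, and the number of clusters at the final iteration is a crude summary that you may want to replace with a comparison of the fitted densities.

```r
# Sketch only: assumes the dirichletprocess package is installed.
library(dirichletprocess)

set.seed(1)
# Hypothetical stand-in for the real percentages, bounded to (0, 1).
testData <- c(rbeta(50, 2, 20), rbeta(50, 20, 5))

gammas <- c(2, 4, 8)  # candidate prior parameters; the values are arbitrary

# Fit one model per gamma value and record the final number of clusters.
clusterCounts <- sapply(gammas, function(g) {
  dp <- DirichletProcessBeta2(testData, 1, g0Priors = g)
  dp <- Fit(dp, 200)  # short run to keep the sketch quick
  dp$numberClusters
})

print(data.frame(gamma = gammas, clusters = clusterCounts))
```

If the cluster counts (and the plotted fits) look similar across the gamma values, the analysis is probably not very sensitive to the prior parameter.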

For completeness, I will add in a PriorParametersUpdate.beta2. But for now you can perform your analysis without it.

dm13450, Feb 12 '19 12:02

Hi that is great,

Thanks a lot again. I see that other relevant functionality works so thanks!

csetraynor, Feb 12 '19 15:02