dirichletprocess
prior choice for beta2
Hi,
I am trying to apply DirichletProcessBeta to real data, but I get a strange result. My goal is to cluster a one-dimensional vector of percentages, which is therefore bounded to [0, 1]. When I fit the Dirichlet process Beta mixture, I get a strange spike near the boundary at 0. Would you have any advice on how to fit it in another way? I am currently using:
dpobj = DirichletProcessBeta(y, maxY = 1, g0Priors = c(2, 150), mhStep = c(0.25, 0.25), hyperPriorParameters = c(1, 1/150))
dpFit = Fit(dpObj = dpobj, 2000, updatePrior = TRUE)
plot(dpFit)
This happens because the prior choice allows parameter values for which the Beta density is infinite at zero. I've added a new mixture model, with a different prior that makes sure that the density at the boundaries is always zero.
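To illustrate the boundary behaviour described above, here is a minimal base R sketch. It uses the standard shape parameterisation of `dbeta`, not the package's own parameterisation of the Beta mixture, so the specific shape values are assumptions chosen only for illustration.

```r
# Beta density evaluated exactly at the lower boundary, x = 0.
# The shape values are illustrative; they are not the package's parameters.

# First shape parameter below 1: the density diverges at 0.
at_zero_small_shape <- dbeta(0, shape1 = 0.5, shape2 = 2)

# First shape parameter above 1: the density is 0 at the boundary.
at_zero_large_shape <- dbeta(0, shape1 = 2, shape2 = 2)

print(at_zero_small_shape)  # Inf
print(at_zero_large_shape)  # 0
```

A prior that only admits the second kind of component cannot produce a spike at the boundary, which appears to be the idea behind the new mixture model.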
Install the dev version from Github and create a dp object of the new mixture using the code below.
devtools::install_github("dm13450/dirichletprocess")
dp <- DirichletProcessBeta2(testData, 1)
dp <- Fit(dp, 1000)
plot(dp)
Hi, thanks a lot for the quick reply and the solution to this issue. DirichletProcessBeta2 looks like it works as wanted! Reading the code in the new version, I see that you are using a Pareto distribution for the prior. Is that correct? I was wondering whether it would be possible to update the prior for DirichletProcessBeta2 as well. I had a look at how to do this and tried something like the following, but I'm not sure it is right. Could you double-check whether it is correct for the parameterisation of the Pareto?
PriorParametersUpdate.beta2 <- function(mdObj, clusterParameters, n = 1) {
  hyperPriorParameters <- mdObj$hyperPriorParameters
  priorParameters <- mdObj$priorParameters
  numClusters <- dim(clusterParameters[[1]])[3]
  posteriorXm <- hyperPriorParameters[1] + priorParameters[1] * numClusters
  posteriorAlpha <- hyperPriorParameters[2] + priorParameters[2] * numClusters
  newNu <- rpareto(n, posteriorXm, posteriorAlpha)
  newPriorParameters <- matrix(c(priorParameters[1], newNu), ncol = 2)
  mdObj$priorParameters <- newPriorParameters
  return(mdObj)
}
That's not correct.
We are able to perform a hyperparameter update in the original Beta model because of conjugacy between the prior and the hyperprior.
By using the Pareto distribution we lose the conjugacy, so updating the prior parameter (gamma) of the Pareto distribution is more complicated.
In my opinion, you don't need to worry about updating the prior parameter. Your data is between 0 and 1, so I think it should find the clusters sufficiently fast.
Instead, check the sensitivity to the gamma parameter by fitting several dp objects with different gamma values.
dp1 <- DirichletProcessBeta2(testData, 1, g0Priors = 2)
dp2 <- DirichletProcessBeta2(testData, 1, g0Priors = 4)
If the results don't change with different values of the prior parameter then you don't need to worry about updating the value.
For completeness, I will add in a PriorParametersUpdate.beta2. But for now you can perform your analysis without it.
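The sensitivity check suggested above could be sketched roughly as follows. This is a hedged sketch, not the maintainer's code: the simulated `testData`, the choice of gamma values, the iteration count, and the use of the final cluster count as the comparison are all assumptions made for illustration.

```r
library(dirichletprocess)

set.seed(1)
# Hypothetical stand-in for the real percentages: two Beta-shaped groups in (0, 1).
testData <- c(rbeta(150, 2, 8), rbeta(150, 8, 2))

# Candidate values for the prior parameter (gamma); chosen arbitrarily.
gammas <- c(2, 4)

# Fit one Dirichlet process per gamma value.
fits <- lapply(gammas, function(g) {
  dp <- DirichletProcessBeta2(testData, maxY = 1, g0Priors = g)
  Fit(dp, 200, progressBar = FALSE)
})

# Compare the number of clusters in the final iteration of each fit.
clusterCounts <- sapply(fits, function(dp) dp$numberClusters)
print(data.frame(gamma = gammas, clusters = clusterCounts))
```

If the cluster counts and the fitted densities (for example via `plot`) look similar across gamma values, that supports leaving the prior parameter fixed. A real analysis would likely use more iterations and compare more than the final cluster count.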
Hi that is great,
Thanks a lot again. I see that other relevant functionality works so thanks!