
Feature Request: Nested priors for common terms

Open · kpalin opened this issue on Nov 04 '22 · 7 comments

Hi,

I'm trying to play around with sparse models and would like to place a NormalMixture prior with Dirichlet weights on the common terms. The code would look like this:

import bambi as bmb

model = bmb.Model("response ~ 1 + d", data)
w_prior = bmb.Prior("Dirichlet", a=[1.0, 1.0])
subject_prior = bmb.Prior("NormalMixture", w=w_prior, mu=[0, 0], sigma=[0.1, 4.5])
model.set_priors({"d": subject_prior})
model.build()

It crashes, with Aesara raising NotImplementedError: Cannot convert Dirichlet(a: [1.0 1.0]) to a tensor variable. I think it should be a fairly simple fix to build the priors recursively for CommonTerms as well, as is already done for GroupSpecificTerms. (Apologies for this not being a pull request.)
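
For reference, this is roughly the nested-prior pattern that already works for group-specific terms (a sketch; the subject grouping column and the hyperprior values are only illustrative):

import bambi as bmb

# Group-specific terms already accept priors whose parameters are
# themselves Prior objects, built recursively into hyperpriors.
sigma_hyper = bmb.Prior("HalfNormal", sigma=1)
model = bmb.Model("response ~ 1 + (1|subject)", data)
model.set_priors({"1|subject": bmb.Prior("Normal", mu=0, sigma=sigma_hyper)})
model.build()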

kpalin avatar Nov 04 '22 07:11 kpalin

@kpalin could you provide the data you're using so I can reproduce the error? I would like to know what response and d are.

tomicapretto avatar Nov 04 '22 11:11 tomicapretto

Do you want to reproduce the following PyMC model?

import pymc as pm

with pm.Model():
    w_prior = pm.Dirichlet("w", a=[1, 1])
    y = pm.NormalMixture("y", w=w_prior, mu=[0, 0], sigma=[0.1, 4.5], observed=response)

or do you also want to include a predictor? What would be the predictor?

tomicapretto avatar Nov 04 '22 11:11 tomicapretto

Yes I'd like to have a predictor, e.g.

import pymc as pm

with pm.Model():
    w_prior = pm.Dirichlet("w", a=[1, 1])
    # D coefficients, each drawn from the same two-component normal mixture
    beta = pm.NormalMixture("beta", w=w_prior, mu=[0, 0], sigma=[0.1, 4.5], shape=D)

    sigma = pm.HalfStudentT("sigma", nu=4, sigma=2.0322)
    pm.Normal("y", mu=pm.math.dot(X, beta), sigma=sigma, observed=y)
    tr = pm.sample()

The full code is in https://gist.github.com/kpalin/5069b6de8eb0f9fb7efb2d5861ade53c and the output from a run (in the VS Code interactive Python window) is in random_effect_bambi.pdf (https://github.com/bambinos/bambi/files/9939632/random_effect_bambi.pdf).

kpalin avatar Nov 04 '22 15:11 kpalin

Kimmo, this is some cool stuff. Thanks for sharing.

zwelitunyiswa avatar Nov 04 '22 15:11 zwelitunyiswa

@kpalin have a look at this branch in my fork https://github.com/tomicapretto/bambi/tree/experiment_prior_common

See the notebook play.ipynb.

It seems to work, but I can't promise this will make it into Bambi (at least in this form) since I'm not entirely sure what's going on... To me, if the weights are shared among the priors for all of "d0", ..., "d9", it makes sense to see them as hierarchical. But I may be confused here.
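
To make the shared-versus-unshared distinction concrete, here is a rough PyMC sketch (the shape handling in the unshared case is my assumption):

import pymc as pm

D = 10  # number of predictors, as in the snippet above

# Shared weights: one Dirichlet vector governs the mixture for every
# coefficient, which is what makes it read as hierarchical.
with pm.Model():
    w = pm.Dirichlet("w", a=[1.0, 1.0])
    beta = pm.NormalMixture("beta", w=w, mu=[0, 0], sigma=[0.1, 4.5], shape=D)

# Unshared weights: each coefficient gets its own Dirichlet vector.
with pm.Model():
    w = pm.Dirichlet("w", a=[1.0, 1.0], shape=(D, 2))
    beta = pm.NormalMixture("beta", w=w, mu=[0, 0], sigma=[0.1, 4.5], shape=D)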

Anyway, have a look at that and let me know if that works for you. Also, if you have references about this, I would love to have a read.

tomicapretto avatar Nov 04 '22 20:11 tomicapretto

That seems to work, also with a formula like response ~ 0 + d0 + d1 + d2 + d3 + d4 + d5 + d6 + d7 + d8 + d9. (I had never seen the c(d0, d1, ...) syntax before.)

I don't have proper references for this apart from the blog post I already linked, https://betanalpha.github.io/assets/case_studies/modeling_sparsity.html. I guess sparsity is not really a thing in the Bayesian setting, where you can just sum over probability-zero events.

kpalin avatar Nov 07 '22 10:11 kpalin

I'm leaving this open because I don't have time right now to finish off this PR (and to understand how it works, too!). But it is definitely very interesting.

Edit

For the record, c(d0, d1, ..., d9) is not the same as d0 + d1 + ... + d9. The first creates a single ten-dimensional model term, while the second creates ten one-dimensional model terms.
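
A sketch of the two formulas, reusing data and the column names from above:

import bambi as bmb

# One ten-dimensional common term: a single prior applies to the whole
# coefficient vector, so mixture weights can be shared across it.
bmb.Model("response ~ 0 + c(d0, d1, d2, d3, d4, d5, d6, d7, d8, d9)", data)

# Ten one-dimensional common terms: each coefficient is its own term
# and can receive its own prior.
bmb.Model("response ~ 0 + d0 + d1 + d2 + d3 + d4 + d5 + d6 + d7 + d8 + d9", data)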

tomicapretto avatar Nov 07 '22 12:11 tomicapretto