
Posterior methodologies with Random Forests

fradav opened this issue • 2 comments

Summary:

I am currently testing a Python module wrapping https://github.com/diyabc/abcranger : posterior methodologies (model choice and parameter estimation) with Random Forests trained on a reference table (see the references below).

Description:

I would like to know the best way to integrate these posterior methodologies into the ELFI pipeline. It seems every inference method in ELFI is expected to have an "iterate" method that is called for each new sample, but neither methodology has one: both need the whole reference table at once.

See the demos at: https://github.com/diyabc/abcranger/blob/master/testpy/Model%20Choice%20Demo.ipynb and https://github.com/diyabc/abcranger/blob/master/testpy/Parameter%20Estimation%20Demo.ipynb
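To make the "iterate" question concrete, here is a minimal sketch of what a batch-accumulating wrapper could look like. The class and method names are hypothetical (not real ELFI or abcranger API): the per-batch step only appends rows, and the forest is trained once on the full reference table at the end.

```python
import numpy as np

class RFPosteriorSketch:
    """Hypothetical batch-accumulating wrapper: `iterate` only appends
    simulated (parameters, summaries) batches; the random forest is
    trained once on the complete reference table in `extract_result`."""

    def __init__(self, fit_forest):
        # `fit_forest` is a callable trained on the whole table at once.
        self.fit_forest = fit_forest
        self._params, self._summaries = [], []

    def iterate(self, params_batch, summaries_batch):
        # No per-batch inference happens here -- we just accumulate rows.
        self._params.append(np.asarray(params_batch))
        self._summaries.append(np.asarray(summaries_batch))

    def extract_result(self):
        # Train on the full reference table, as the RF methodologies require.
        theta = np.concatenate(self._params)
        summaries = np.concatenate(self._summaries)
        return self.fit_forest(theta, summaries)
```

This only illustrates the control flow: the expensive forest training is deferred until all batches have been produced.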

Note that the basic rejection sampler is more than enough for these methodologies (and the threshold parameter barely matters).

Regards,

fradav avatar Jan 23 '20 09:01 fradav

Hi! Could you clarify a bit what you mean by posterior methodologies and their integration into the ELFI pipeline? E.g., would you like to implement RF-ABC within ELFI? In that case iterate could still be used when producing the table in batches.

Note that if you don't care about the threshold for rejection ABC and only want to generate a reference table from the ELFI model, you can also set quantile = 1.0 in the sample method.
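For illustration, a self-contained numpy sketch of the quantile-threshold rejection step (plain rejection ABC with a toy uniform prior; a schematic, not ELFI's actual implementation):

```python
import numpy as np

def reference_table(n, simulate, distance, quantile=1.0, rng=None):
    """Draw n (theta, summary) pairs and keep the fraction `quantile`
    closest to the observed data. With quantile=1.0 the whole table is
    kept, which is what the RF methodologies consume."""
    rng = np.random.default_rng(rng)
    theta = rng.uniform(0.0, 1.0, size=n)                # toy prior
    summaries = np.array([simulate(t, rng) for t in theta])
    d = np.array([distance(s) for s in summaries])
    keep = d <= np.quantile(d, quantile)                 # rejection step
    return theta[keep], summaries[keep]
```

With quantile = 1.0 nothing is rejected, so the full reference table is returned unchanged.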

hpesonen avatar Jan 24 '20 09:01 hpesonen

Hi,

I'm working with J-M. Marin, and the posterior RF methodologies (model choice and parameter estimation) work directly on ABC reference tables, as described in:

  • ABC model choice (Pudlo et al. 2015)
  • ABC Bayesian parameter inference (Raynal et al. 2018)

By integration into ELFI, I originally meant implementing a new inference method as documented there.

I am not sure about batch processing. RF-ABC prediction performance degrades a lot if you train on only a small subset of the data, and I don't know how to "accumulate" posterior results from successive batches other than by retraining a forest on all past batches, which of course defeats the purpose of batching. This is perhaps a use case for Mondrian forests (Lakshminarayanan, Roy, and Teh 2014), rather than the classical Breiman forests we use, but Mondrian forests are a totally different beast, with many caveats compared to Breiman's (sensitivity to noise is one of them). Anyway, this is an interesting track for future work.
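The retraining cost described above can be illustrated with scikit-learn's RandomForestRegressor standing in for the actual abcranger forest (an assumption for illustration only; abcranger's API differs): every new batch triggers a full refit on all data accumulated so far.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def refit_on_batches(batches, n_estimators=50, seed=0):
    """Naive accumulation: after every batch, retrain a Breiman forest on
    all data seen so far. Training cost grows with the total table size --
    exactly the problem Mondrian forests would avoid via online updates."""
    n_features = batches[0][0].shape[1]
    X_all, y_all = np.empty((0, n_features)), np.empty(0)
    forests = []
    for X, y in batches:
        X_all = np.vstack([X_all, X])          # accumulate the full table
        y_all = np.concatenate([y_all, y])
        rf = RandomForestRegressor(n_estimators=n_estimators,
                                   random_state=seed)
        forests.append(rf.fit(X_all, y_all))   # full refit every batch
    return forests
```

Each element of the returned list was trained from scratch on all preceding batches, so the work done over k batches is quadratic in the table size rather than linear.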

The threshold doesn't matter "much" with RF-ABC, but that doesn't mean we shouldn't have one, so I don't think quantile = 1.0 is recommended either (I'll double-check this with J-M. Marin).

References

Pudlo, Pierre, Jean-Michel Marin, Arnaud Estoup, Jean-Marie Cornuet, Mathieu Gautier, and Christian P. Robert. 2015. “Reliable ABC Model Choice via Random Forests.” Bioinformatics 32 (6): 859–66.

Raynal, Louis, Jean-Michel Marin, Pierre Pudlo, Mathieu Ribatet, Christian P. Robert, and Arnaud Estoup. 2018. “ABC Random Forests for Bayesian Parameter Inference.” Bioinformatics 35 (10): 1720–8. https://doi.org/10.1093/bioinformatics/bty867.

Lakshminarayanan, Balaji, Daniel M Roy, and Yee Whye Teh. 2014. “Mondrian Forests: Efficient Online Random Forests.” In Advances in Neural Information Processing Systems, 3140–8.

fradav avatar Jan 24 '20 13:01 fradav