Can't compute the PDF of a marginal of a BayesDistribution: it takes ages!
The following script causes a segmentation fault with OT 1.17 and takes more than 96 s to evaluate the PDF at two points with OT 1.19:
from openturns import *
beta = 2.3
# Distribution of Sigma
loi_sigma = Normal(0.5, 0.01)
# Distribution of Mv
loi_Mv = TruncatedDistribution(Exponential(beta), 7.5, TruncatedDistribution.UPPER)
# Distribution of Mest | (Mv, Sigma)
# the parameter values do not matter here
mu = 0.0
ecartType = 1.0
loi_Mest_givenMv_Sigma = TruncatedNormal(mu, ecartType, 0, 50)
# Joint distribution of (Mv, Sigma), independent components
loi_Mv_Sigma = ComposedDistribution([loi_Mv, loi_sigma])
# Link function for the parameters of loi_Mest_givenMv_Sigma: R^2 --> R^4
link_func = SymbolicFunction(['mu', 'ecarttype'], ['mu', 'ecarttype', '0', '50'])
# Distributions of (Mest, Mv, Sigma) and (Mest, Sigma)
loi_Mest_Mv_Sigma = BayesDistribution(loi_Mest_givenMv_Sigma, loi_Mv_Sigma, link_func)
loi_Mest_Sigma = loi_Mest_Mv_Sigma.getMarginal([0,2])
loi_Mest_Sigma.computePDF([3.7, 0.7])
loi_Mest_Sigma.computePDF([4.0, 0.5])
I cannot reproduce this. Which version is it? Installed from pip or conda? Which OS?
I use OT 1.17 (for penstock), installed with pip. The script works with OT 1.19 (I created a virtualenv and installed openturns in it with pip), but it is extremely slow. I am using Linux (kernel 5.15.43) with Python 3.8.12. Unfortunately, penstock produces wrong results with the latest OT, so I have to switch between the two environments depending on the part of the study... This issue is more of an enhancement request, but I can't see how to specify that.
It should probably be updated/fixed in penstock; you should probably contact @adumasphi. You may also want to fiddle with the ConditionalDistribution ResourceMap keys to speed up the PDF evaluation.
The actual problems are, after a quick inspection:
- The BayesDistribution creates a copy of the conditioned distribution on each call to computePDF()
- The MarginalDistribution uses the default implementation of the computePDF() method, based on finite differences of the CDF, which is in turn computed using a d-dimensional integration, where d is the dimension of the distribution. This is reasonable in the general case, where the CDF is the mandatory method to implement, but in some specific cases it is much more efficient to integrate only with respect to the marginalized indices, leading to a single integration in dimension p<d
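To illustrate the second point, here is a pure-Python sketch that does not use OpenTURNS and does not reproduce its internals. The toy Gaussian model, the Simpson rule, the truncation bounds and every function name are assumptions chosen for the example. It computes the PDF of (X0, X2) from a 3-d distribution in two ways, integrating the joint PDF over X1 only and taking finite differences of a CDF obtained by 3-d integration, and counts the joint PDF evaluations each route needs.

```python
import math


def normal_pdf(x, mean=0.0, std=1.0):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


class CountingPDF:
    """Toy joint PDF of (X0, X1, X2) that counts its own evaluations."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x0, x1, x2):
        self.calls += 1
        # X1 ~ N(0, 1), X0 | X1 ~ N(X1, 1), X2 ~ N(0, 1) independent of both
        return normal_pdf(x1) * normal_pdf(x0, mean=x1) * normal_pdf(x2)


def simpson(f, a, b, n):
    """Composite Simpson rule with n (even) sub-intervals, n + 1 evaluations."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


LOW, HIGH = -8.0, 8.0  # numerical truncation of the real line (assumption)


def marginal_pdf_direct(pdf, x0, x2, n=64):
    """PDF of (X0, X2): a single 1-d integral over the marginalized X1."""
    return simpson(lambda x1: pdf(x0, x1, x2), LOW, HIGH, n)


def marginal_cdf(pdf, x0, x2, n=24):
    """CDF of (X0, X2): a full 3-d integral of the joint PDF."""
    return simpson(
        lambda u0: simpson(
            lambda u2: simpson(lambda u1: pdf(u0, u1, u2), LOW, HIGH, n),
            LOW, x2, n),
        LOW, x0, n)


def marginal_pdf_from_cdf(pdf, x0, x2, eps=1e-2, n=24):
    """PDF of (X0, X2) as a centered mixed finite difference of the CDF."""
    def cdf(a, b):
        return marginal_cdf(pdf, a, b, n)

    return (cdf(x0 + eps, x2 + eps) - cdf(x0 - eps, x2 + eps)
            - cdf(x0 + eps, x2 - eps) + cdf(x0 - eps, x2 - eps)) / (4.0 * eps * eps)


if __name__ == "__main__":
    pdf = CountingPDF()
    value = marginal_pdf_direct(pdf, 0.5, 0.3)
    print("direct 1-d integration:", value, "with", pdf.calls, "PDF evaluations")
    pdf.calls = 0
    value = marginal_pdf_from_cdf(pdf, 0.5, 0.3)
    print("finite differences of the CDF:", value, "with", pdf.calls, "PDF evaluations")
```

With these settings the direct route needs 65 evaluations of the joint PDF, against 4 x 25^3 = 62500 for the finite-difference route, and the latter is also the less accurate of the two. The gap widens with the dimension d and with the number of quadrature nodes per dimension.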
This issue is linked to #2524 (ordering of the marginals).