dynesty Bounding issues/diagnostics

Bounding issues/diagnostics

Open segasai opened this issue 3 years ago • 15 comments

Hi,

I am looking at the results of the sampling of my complicated problem (with results different from multinest gives me) and I was looking at diagnosing what's going on, so I looked at a few tests that surprised me (below). First my problem: dimensionality 11, 2 periodic parameters, 2500 live points, I run with sampler 'unif' and dlogz_init=0.01

I looked at the number of ellipsoids as a function of progression of the run. And to my surprise. I saw oscillations between 1 ellipsoid and 70 ellipsoids in. See It doesn't mean it's necessarily wrong, but it is certainly worrisome.
I looked at the volume (logvol_tot) as a function of progression of the run. Here you can see again oscillations in volume.
I decided to check for all the points in the run (base_u), compute in how many of different boundaries they are in. I.e. if everything is perfect the id of the point in the run and the number of boundaries they are in should be monotonically related.I.e. later points in the run should be basically in all the boundaries up to a given point. Here you can clearly see the lack of monotonicity. And that even some of the latest live-points are not in previous bounds.
Finally The 2d plot showing point id in the base_run vs boundary id showing fraction of points in the given boundary Here again you see two problems: Many latest points are not in first boundaries. And it looks like only those boundaries consisting of one ellipsoid are good (vertical stripes).

My thoughts on all this:

I think either there is some over-agressive ellipsoidal shredding ( by design) or there is a bug somewhere in dealing with large number of ellipsoids
I find the lack of diagnostics of these issues very concerning. I.e. basically there is no indication of these issues in the output. I think doing the containment checks for all boundaries is certainly doable and maybe a way forward. (I think it still may be better to do that with all the likelihood evaluations rather than live points, but it's better than nothing). Also huge oscillations in number of ellipsoids is certainly something that may need to be flagged/prevented.
It is pretty useless to do add_batch() after a run like this, as it won't fix these issues.
It may be useful to have a way to 'improve boundaries' after the run before doing add_batch (that's connected to the topic raised before)
An alternative viewpoint may be -- the likelihood function is too complex, so tough luck, it'll be hard to sample no matter what.
I obviously cannot 100% exclude some kind of non-deterministic bug in my likelihood function leading to observed behaviour, but I am reasonably certain that's not the case.

Apr 09 '21 11:04 segasai

These are an instructive set of plots and a useful set of tests. Quick comments (in order of points):

The oscillations in the number of ellipsoids and the corresponding log-volume definitely highlight that sometimes the decomposition is failing and defaulting to a very large single ellipsoid. It would be good to track down why that's happening or if it's just a problem with the default options (vol_dec and vol_check) needing adjustment. My guess is the large ellipsoids only are around for an iteration or two before they get redone so they don't affect the overall results, but I'm sure they hammer the sampling efficiency.
The ellipsoidal shredding problem is a perpetual problem, especially early on. I've tried a bunch of ways to reduce the severity of this issue, but it's quite endemic. There might be some ways around it that I missed a few years ago though when I was doing most of the fundamental development. If you have any ideas, I'd be happy to hear them and see what we can do!
Adding in new diagnostics (or at least tools to generate these diagnostics/plots from completed runs) would be extremely helpful. What do you think would be the best place/format for them? (Just having a function to make a similar plot is a great start and could be added to the docs as part of the recommended plots users generate after each run.)
add_batch() isn't meant to fix these issues (it only can improve on the baseline run), so I agree. If I should make this clearer in the documentation somewhere, let me know.
As mentioned in #232, I agree and think there might be some paths forward there.
It's totally possible the likelihood function is actually fine and it's entirely dynesty at fault here for poor heuristics -- I personally have been impressed by other approaches implemented in, e.g., UltraNest, and wouldn't be surprised if something like that turns out to work better for this problem.

Apr 09 '21 15:04 joshspeagle

Regarding how to include those checks. I'd say we need a function belonging to Results or to Sampler. Something like validate() with a few options. Maybe it needs to be even called by default by run_nested() What I did is just like y.contains(x) for x in base_u for y in bounds
Regarding add_batch() the reason I made the comment is that dynamic nested sampling's advantage is presented as the ability to improve the posterior by adding more points to it, but it only really works if the posterior is already well traced essentially.
And regarding the oscillations in number of ellipsoids I am actually more concerned by large number of ellipsoids rather than having one and low efficiency, as it seems that it's when the number of ellipsoids is large the high logl points are missed.
And yes, I am aware of ultranest, It seems interesting, but personally I haven't tested it enough in complex cases.

Apr 09 '21 16:04 segasai

I'd say we need a function belonging to Results or to Sampler. Something like validate() with a few options.

Would you envision something like that being run at various checkpoints during the sampling or just at the end? I'm trying to think of exactly how the results could be communicated to the user in an informative way, especially since the feedback I've gotten is the current set of outputs for the progressbar (and the save quantities in the output results) is already pretty large. If the results could inform the bounding distributions too, that would be a dream end goal.

On a sidenote: there is a bootstrap procedure enabled to try and enlarge the ellipsoids, but it's disabled by default because I found it to be unstable in many cases. I could easily see validate() being used to estimate enlargement factors in a very similar way throughout the course of the run, as an example of how that could interface with improvements to the construction of bounding distributions.

dynamic nested sampling's advantage is presented as the ability to improve the posterior by adding more points to it, but it only really works if the posterior is already well traced essentially

Yes, exactly. That's at least how most standard implementations work (including the one here), anyways. I'm happy to add larger disclaimers throughout the docs to try and make this point more explicit. Would that help?

I am actually more concerned by large number of ellipsoids...it's when the number of ellipsoids is large the high logl points are missed

Ah yes, gotcha. Maybe this just means the defaults should be even more conservative than what they already are to try and further mitigate this problem.

Apr 09 '21 17:04 joshspeagle

Would you envision something like that being run at various checkpoints during the sampling or just at the end? I'm trying to think of exactly how the results could be communicated to the user in an informative way, especially since the feedback I've gotten is the current set of outputs for the progressbar (and the save quantities in the output results) is already pretty large. If the results could inform the bounding distributions too, that would be a dream end goal.

I think I would do it at the end. Sure it's nice to report things during sampling, but I don't think it's necessary. (TBH I don't think I know what every number in dynesty's status bar is ; and I think the only numbers that I check there are time, dlogz and logl ) IMO the goal is to just to warn if is something looks clearly bad.

On a sidenote: there is a bootstrap procedure enabled to try and enlarge the ellipsoids, but it's disabled by default because I found it to be unstable in many cases. I could easily see validate() being used to estimate enlargement factors in a very similar way throughout the course of the run, as an example of how that could interface with improvements to the construction of bounding distributions.

Yes, I saw those parameters, but never experimented with them. TBH I think the right way to deal with that is come up with a few dedicated tests. AFAIU the ellipsoid decomposition is completely decoupled from the rest of the code. So I think testing it on

ball in n-D
highly elongated 'sausage'
curved manifold (i.e. n-D sphere or curved line)
From what I understand the key factors in bounding are, what fraction of the actual distribution is covered (ideally all), what fraction of volume is empty (efficiency) and those numbers as a function of number of points/dimensionality. I think it's not too hard to write those test. And at least there is a set of plots to see if things are better or worse.

Apr 09 '21 19:04 segasai

Okay -- here's the test.

I've sampled the points from 4 different shapes in 3d/10d: ball, pin (long cylinder with 1/0.01 axis ratio), shell with 1% thickness and 1-D torus (S^1) with 1% thickness and checked the missing fraction of the volume by MultiEllipsoid or Ellipsoid as a function of nlive. I.e. I simulate points from the distribution. Use nlive of them for Multiellipsoid/Ellipsoid and see what fraction of the volume is left out. If everything is correct it should be zero or at least go to zero when nlive->inf The reality is pretty grim, see below multi_10d multi_3d ell_10d ell_3d

I include the code as well

import dynesty.bounding as db
import numpy as np
import scipy.special
import matplotlib.pyplot as plt


def genball(npt, ndim):
    # use Barthe2005
    x = np.random.normal(size=(npt, ndim))
    y = np.random.exponential(0.5, size=npt)
    x1 = x / np.sqrt((y + (x**2).sum(axis=1)))[:, None]
    return x1


def genshell(r1, r2, npt, ndim):
    x = np.random.normal(size=(npt, ndim))
    xnorm = x / ((x**2).sum(axis=1)**.5)[:, None]
    # normed vector
    # radii are distributed like R^(ndim-1)
    # cumul (R^ndim-r1^ndim)/(r2^ndim-r1^ndim)=y
    rs = ((r2**ndim - r1**ndim) * np.random.uniform(size=npt) +
          r1**ndim)**(1. / ndim)
    return rs[:, None] * xnorm


def gen_data(npt, typ, ndim):
    mid = .5  # i'm placing in unit cube
    if typ == 'ball':
        r0 = 0.5
        pts = genball(npt, ndim) * r0 + mid
        volume = (np.pi**(ndim / 2) / scipy.special.gamma(ndim / 2 + 1) *
                  r0**ndim)
    elif typ == 'pin':
        w = 0.01
        a = 1
        pts = np.zeros((npt, ndim))
        pts[:, 1:] = genball(npt, ndim - 1) * w + mid
        pts[:, 0] = (np.random.uniform(size=npt) - .5) * a + mid
        volume = (np.pi**((ndim - 1) / 2) /
                  scipy.special.gamma((ndim - 1) / 2 + 1) * w**(ndim - 1) * a)
    elif typ == 'torus':
        w = 0.01
        r0 = 0.45
        pts = np.zeros((npt, ndim))
        pts[:, :2] = genshell(r0 - w / 2, r0 + w / 2, npt, 2) + mid
        pts[:, 2:] = (np.random.uniform(size=(npt, ndim - 2)) * 2 -
                      1) * w / 2 + mid
        volume = w**(ndim - 2) * np.pi * ((r0 + w / 2)**2 - (r0 - w / 2)**2)
    elif typ == 'shell':
        r1 = 0.45
        r2 = 0.46
        pts = genshell(r1, r2, npt, ndim) + mid
        volume = (np.pi**(ndim / 2) / scipy.special.gamma(ndim / 2 + 1) *
                  (r2**ndim - r1**ndim))
    else:
        raise RuntimeError('unknown', typ)
    return pts, volume


def plotter(ndim, bound):
    ids = (10**np.linspace(np.log10(2 * ndim + 1), 3, 40)).astype(int)
    plt.clf()
    objs = ['ball', 'pin', 'shell', 'torus']
    for i in range(4):
        plt.subplot(2, 2, 1 + i)
        curo = objs[i]
        fracs = np.array([1 - doit(_, curo, ndim, bound)[1] for _ in ids])
        plt.semilogx(ids, fracs)
        plt.xlabel('Nlive')
        if i == 0:
            postf = '%d-D bound %s' % (ndim, bound)
        else:
            postf = ''
        plt.title(curo + ' ' + postf)
        plt.ylabel('missing fraction')
        plt.ylim(0, 1)
        plt.xlim(10, 1000)
    plt.tight_layout()


def doit(nlive, typ, ndim, bound='ell'):
    # return volume fraction, point fraction
    # for nlive points with topology typ in ndim dimension
    #
    totpt = 10 * nlive  # simulate more points
    pts, volume = gen_data(totpt, typ, ndim)
    assert ((pts.min() > 0) and (pts.max() < 1))  # inside cube
    fitpts = pts[:nlive]
    testpts = pts[nlive:]
    logvol_ell, fracin = computer(fitpts, testpts)
    return np.exp(logvol_ell) / volume, fracin


def computer(fitpts, testpts, bound='multi'):
    """ Compute logvolume and fraction of points covered
    given actual live points (fitpts) and test points (testpts)"""
    # ndim = fitpts.shape[-1]
    cent = fitpts.mean(axis=0)
    cov = np.cov(fitpts.T)  # ndim)

    curb = db.Ellipsoid(cent, cov)  # pts)
    curb.update(fitpts, mc_integrate=True)
    if bound == 'multi':
        curb = db.MultiEllipsoid([curb])
        curb.update(fitpts, mc_integrate=True)
    if bound not in ['ell', 'multi']:
        raise RuntimeError('unknown bound', bound)
    # if bound == 'multi':
    #    logvol = curb.monte_carlo_logvol()[0]
    # elif bound == 'ell':
    # logvol = curb.logvol
    # fraction of test points in the boundary
    frac = np.array([curb.contains(_) for _ in testpts]).sum() / len(testpts)
    return curb.logvol, frac

My understanding (unless I have a bug in my code) is that there are massive issues with bounds. Specifically the bound will miss large fraction in low dimensions (because less of the volume is next to edges). And the torus case is extremely problematic (this is mimicking a curved 1d degeneracy). I think that means that the volumes need to be inflated.

Apr 09 '21 23:04 segasai

And after enabling bootstrap=10 option

multi_10d_boot multi_3d_boot

The problems go away. My takeaway is that the default value of bootstrap must not be set to zero.

Apr 09 '21 23:04 segasai

Trying bootstrapping with all the changes from my tynifix branch, I see the following problems.

With bootstrap factors ranging 1.3-1.8 in 11 dimensions the main problem becomes the propose_point function, because after the expansion, many ellipsoid volumes start to have log(V)=3 or even 5. Which means that substantial volume there is beyond the unit cube. So a significant time (on a single thread), is spent repeated resampling and seeing that the point outside the cube.
Another factor which I think can lead to overshredding of ellipsoids, is that multiple ellipsoids have larger bootstrap factor than a single one, that's why I think ellipsoidal splitting is less of a good idea comparing to a naive calculation
Despite slowdowns with bootstrap>0, it's clear to me that running bootstrap=0 for anything other than gaussian posterior will produce incorrect results, as the volumes shrink way too fast.

Apr 11 '21 13:04 segasai

The results from 1 and 2 here are why they ended up being disabled by default, since they require a huge number of live points to avoid becoming overly enlarged in higher-dimensions. As for 3, that's true in the sense that you're guaranteed to miss parameter space but not necessarily true in terms of parameter/evidence estimation. As I outline in the Appendix of the dynesty paper, it turns out that "prior bounds must encompass the entire distribution" doesn't quite need to hold true for evidence estimation (or even posterior marginals) to be ~correctly estimated; the "actual" condition is that the local sampling is able to follow the correct likelihood shrinkage pattern of the global distribution. This turns out to be easier to match than you'd think in many applications, and is why correlated proposals/shredding can often give reasonable results even though they might violate this condition pretty severely at times. Of course, this only holds true until it doesn't hold true, which always comes back to bite you at the worst possible time...

Anyways, what I ultimately mean to say is I think these tests are fantastic and really highlight some of the problems with this particular proposal strategy (with uniform sampling), but that it's not guaranteed to be a total loss. It's also one of the main struggles I had when setting the defaults for most users. Do you have additional recommendations for changes outside of the new set of tests you've submitted and possible utility functions/plots to add in?

Apr 15 '21 04:04 joshspeagle

On the first I don't think I quite agree. I agree that the shrinking will be correct if your logl>X shape is constant as a function of logl. I.e Gaussian/eggshell, But if the topology/shape changes as logl changes that's where the problem will arise I suspect (don't have an easy example to show, but is clearly true for my science problem I was solving)

IMO I'd rather have defaults that are protective, I.e. they are maybe an overkill for simple problems, but are the right thing for difficult ones. Especially since as opposed to say standard MCMC approaches there is not a set of standard convergence tests for nested sampling (a-la Gelman-Rubin/Geweke etc tests) that can give you some confidence.

IMO I think bootstrap/sampling can be improved. For example, Currently the scaling of the region is done in all directions, but maybe we can get scaling that is different along different directions to avoid scaling beyond the cube. I also noticed that the sampling within the ellipsoid can be probably parallized as well, which would avoid it being a bottleneck in the case of parallel running. Alternatively I was thinking if the ellipse is that much larger than the unit-cube, maybe we can just sample in a bounding box of the ellipse that fits inside unit cube ? I haven't tested whether that can be benefitial. Anyway I think having a test problem where bootstrap is somewhat stuck is good IMO, to try to improve.

In terms of immediate recommendations, for sure validating function evaluations/boundaries after the run is important and alarming about big discrepancies. In my opinion, having the boundary algorithm return the correct boundary with the default settings is important as well (i.e. the tests that I give above). I'd put those in.

Apr 15 '21 22:04 segasai

Hi @segasai, would it be possible for you to include the code you used to generate all these diagnostic plots? It might be useful for myself and other users with similar issues in the meantime. Thanks!

May 26 '21 08:05 a-lhn

Hi @a-lhn the volume check code is given above. If you want to reproduce the plots like shown above you can use the results of the sampler:

sampler.run_nested(dlogz=0.01)
mask=np.array([[bb.contains(_) for _ in sampler.saved_run.D['u']] for bb in sampler.bound])

that will give you the boolean mask for all boundaries and all the points in the run

May 26 '21 12:05 segasai

Just a note to myself that it might be cool to try and add in some of these utilities as part of the code for users to check as part of the new release (see #254).

Sep 15 '21 09:09 joshspeagle

i think a problem that I currently see is that the bound set is not currently properly tracked, i.e. we don't track which logl level each bound correspond to. I.e. the code just does self.bound.append(bound) But ideally one would want some self-contained object with (logl, bound) pairs. That would somewhat help #232 as well.

But otherwise we already now can reproduce the plot from the top by this

res=sampler.results
plt.clf();plt.imshow(np.array([[_.contains(__) for __ in res['samples_u']] for _ in res['bound']]),aspect='auto')

Part of the reason why I'm interested in quantifying boundaries is that I'm interested in approaches where the first sampling can dictate some kind of transform of data that makes the sampling much easier. I.e. use first dynesty run as exploration, and then sample with much higher sampling efficiency. (I know some developers of zeus-mcmc were thinking/working on such approaches using normalizing flows).

Sep 15 '21 19:09 segasai

I think explicitly tracking the (logl -> bounding) object (separate from the actual samples) would be also potentially useful, if one wants to sample the modes separately or deciding on merging of independent runs. I.e. if their Boundary(logl) don't overlap enough they can't be merged.

Sep 15 '21 19:09 segasai

I will link the issue #232 here and close it to avoid having two similar issues.

Sep 08 '22 16:09 segasai

dynesty dynesty copied to clipboard

Bounding issues/diagnostics

dynesty
dynesty copied to clipboard