nestle icon indicating copy to clipboard operation
nestle copied to clipboard

Crash in 9 dimensional likelihood

Open wkerzendorf opened this issue 7 years ago • 6 comments

@kbarbary I hope this crash report is somewhat helpful - after a number of iterations for this particular likelihood it always crashes. It is unclear to me why - do you have any pointers?

res_full = nestle.sample(stel_model.logprob, stel_model.transform_priors,
                         9, method='multi', npoints=100,
                         verbose=True, callback=nestle.print_progress, dlogz=1.)

it=  3981 logz=-11465.9874566656779450

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-26aa44a81021> in <module>()
      1 res_full = nestle.sample(stel_model.logprob, stel_model.transform_priors,
      2                          9, method='multi', npoints=100,
----> 3                          verbose=True, callback=nestle.print_progress, dlogz=1.)

/lustre/home/wkerzend/miniconda3/envs/starkit/lib/python2.7/site-packages/nestle.pyc in sample(loglikelihood, prior_transform, ndim, npoints, method, update_interval, npdim, maxiter, maxcall, dlogz, decline_factor, rstate, callback, queue_size, pool, **options)
   1019         # Update the sampler based on the current active points.
   1020         if since_update >= update_interval:
-> 1021             sampler.update(pointvol)
   1022             since_update = 0
   1023 

/lustre/home/wkerzend/miniconda3/envs/starkit/lib/python2.7/site-packages/nestle.pyc in update(self, pointvol)
    733     def update(self, pointvol):
    734         self.empty_queue()
--> 735         self.ells = bounding_ellipsoids(self.points, pointvol=pointvol)
    736         for ell in self.ells:
    737             ell.scale_to_vol(ell.vol * self.enlarge)

/lustre/home/wkerzend/miniconda3/envs/starkit/lib/python2.7/site-packages/nestle.pyc in bounding_ellipsoids(x, pointvol)
    509     ell = bounding_ellipsoid(x, pointvol=pointvol, minvol=True)
    510 
--> 511     return _bounding_ellipsoids(x, ell, pointvol=pointvol)
    512 
    513 

/lustre/home/wkerzend/miniconda3/envs/starkit/lib/python2.7/site-packages/nestle.pyc in _bounding_ellipsoids(x, ell, pointvol)
    450     # centroid = (2, ndim) ; label = (npoints,)
    451     # [Each entry in `label` is 0 or 1, corresponding to cluster number]
--> 452     centroid, label = kmeans2(x, k=start_ctrs, iter=10, minit='matrix')
    453 
    454     # Get points in each cluster.

/lustre/home/wkerzend/miniconda3/envs/starkit/lib/python2.7/site-packages/scipy/cluster/vq.pyc in kmeans2(data, k, iter, thresh, minit, missing, check_finite)
    638     for i in xrange(iter):
    639         # Compute the nearest neighbor for each obs using the current code book
--> 640         label = vq(data, code_book)[0]
    641         # Update the code book by computing centroids
    642         new_code_book, has_members = _vq.update_cluster_means(data, label, nc)

/lustre/home/wkerzend/miniconda3/envs/starkit/lib/python2.7/site-packages/scipy/cluster/vq.pyc in vq(obs, code_book, check_finite)
    202     """
    203     obs = _asarray_validated(obs, check_finite=check_finite)
--> 204     code_book = _asarray_validated(code_book, check_finite=check_finite)
    205     ct = np.common_type(obs, code_book)
    206 

/lustre/home/wkerzend/miniconda3/envs/starkit/lib/python2.7/site-packages/scipy/_lib/_util.pyc in _asarray_validated(a, check_finite, sparse_ok, objects_ok, mask_ok, as_inexact)
    236             raise ValueError('masked arrays are not supported')
    237     toarray = np.asarray_chkfinite if check_finite else np.asarray
--> 238     a = toarray(a)
    239     if not objects_ok:
    240         if a.dtype is np.dtype('O'):

/lustre/home/wkerzend/miniconda3/envs/starkit/lib/python2.7/site-packages/numpy/lib/function_base.pyc in asarray_chkfinite(a, dtype, order)
   1213     if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
   1214         raise ValueError(
-> 1215             "array must not contain infs or NaNs")
   1216     return a
   1217 

ValueError: array must not contain infs or NaNs```

Happy to help fix. 

wkerzendorf avatar Feb 19 '18 18:02 wkerzendorf

Hi! Check this quastion on SO. This occurs in scipy.cluster.kmeans2. Thus it is not nestle-specific

ipashchenko avatar Feb 20 '18 08:02 ipashchenko

Sounds like nestle is choosing points in the prior volume (unit hypercube) that end up being NaN or Inf. This may be because your transform_priors function is returning NaN or Inf.

kbarbary avatar Feb 20 '18 16:02 kbarbary

Actually, it shouldn't have anything to do with the prior transform. Nestle just chooses these points from within the bounding ellipsoids. Let me think about how best to debug this.

kbarbary avatar Feb 20 '18 16:02 kbarbary

My suspicion is that the bounding ellipsoids nestle is coming up with are ill-formed in some way such that selecting a point from within them results in NaN or inf. It would be good to check exactly when this happens and look at the ellipsoids at this point.

The callback function passed to nestle get a dictionary that contains a reference to the active points and the sampler itself. For example, if you pass this function as the callback, you will print the active points, and the bounding ellipsoids on each iteration:

def mycallback(info):
    nestle.print_progress(info)
    print(info['active_u'])
    print(info['sampler'].ells)

I would create a callback function like this that checks if any values in active_u are NaN or inf. If so, inspect the ellipsoids.

If it takes a long time to run and you want interactive access to the ellipsoids and points in order to inspect them, you could do something where, upon encountering a inf or NaN, save the points and ellipsoids to a global variable, raise an exception (in order to stop nestle.sample), and wrap the nestle.sample call in a try... except block.

kbarbary avatar Feb 21 '18 16:02 kbarbary

@kbarbary it went away again. I will test it next time I encounter it. I think for this it would be great to catch nans or infs and then stop and give back the samples but print out that it exited in a bad way. What do you think?

wkerzendorf avatar Feb 21 '18 17:02 wkerzendorf

@wkerzendorf You might want to start setting a fixed random state via the rstate keyword, so that you can reproduce it when you encounter it.

I think for this it would be great to catch nans or infs and then stop and give back the samples but print out that it exited in a bad way. What do you think?

Without knowing a whole lot more, I consider this a bug that should just be fixed. If there turns out to be some good reason for NaNs or infs, maybe exiting gracefully makes sense. Or one could replace the offending point with another draw and continue on. But it depends on what the root cause is.

kbarbary avatar Feb 22 '18 04:02 kbarbary