
NPE sampling gets stuck without warning

Open michaeldeistler opened this issue 8 months ago • 1 comments

Discussed in https://github.com/sbi-dev/sbi/discussions/1566

Originally posted by ali-akhavan89 April 16, 2025

Hi all,

If the simulator generates NaN/Inf values (I've only confirmed this for NaN so far), I think the sampling gets stuck without any warning or error being raised. I've seen these situations before, but I was never able to figure out what was happening until very recently. Please see the attached code and my comments before the lines that generate theta_true and x_o. As I've described there, you can change the 2nd dimension of theta_true from 5.7019e-03 to -5.7019e-03 to force the simulator to generate NaN values. The posterior sampling then gets stuck. Note that the true parameter would be outside of the prior boundaries, but I highly doubt that all posterior samples are being rejected, because I don't even get the "slow sampling warning." I've attached the simulator on its own so that you can run the model and see the outputs for the problematic true parameter values (by the way, this is a simplified SEIRb model, and you can see the full paper here).

Does my explanation make sense? If so, wouldn't it be better to have a mechanism in SBI that raises a warning or error in these situations?

I also have another question that might be related to this. I've been testing FMPE, and I see a similar situation where the sampling seems to get stuck even when I'm using a true theta that is "okay" and doesn't cause any error in the simulation. It only happens if I use FMPE combined with flowmatching_nn; FMPE with maf goes through without any issues. I've added comments in the code before the inference object so that you can easily test the settings yourself. I'm not entirely sure what the root cause of this issue is, so I would appreciate any thoughts or feedback.

Thank you! Ali

Archive.zip

michaeldeistler avatar Apr 17 '25 12:04 michaeldeistler

I have also observed the "sampling gets stuck" behaviour without any warning when running this tutorial. For me, whether it happens or not felt completely stochastic. I will convert this to an issue, but I unfortunately do not know the root cause either.

michaeldeistler avatar Apr 17 '25 12:04 michaeldeistler

I was not able to reproduce this error for NPE, either in the example raised by @ali-akhavan89 or in the tutorial mentioned by @michaeldeistler. However, for FMPE, the reason the low acceptance rate warning never gets raised is that we never produce any samples, because odeint inside ZukoODE never runs.

We currently don't explicitly check for NaNs or Infs when setting an x_o, i.e. in the user input checks. I think the easiest solution would be to explicitly check for NaNs or Infs in the condition here. If we really want to support such conditions (e.g. if NaNs or Infs are handled in the embedding net), then we could instead evaluate the forward pass of the vector field with the given condition before trying to sample, and check whether this returns a NaN. I would be in favour of the first option. What do you think @michaeldeistler?
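A minimal sketch of what the first option could look like, written with the standard library only so that it runs anywhere. The helper name `assert_finite_condition` and its error message are hypothetical and not part of sbi; in the actual input checks, x_o would be a torch tensor and the same test could presumably be expressed as `torch.isfinite(x_o).all()`.

```python
import math
from collections.abc import Iterable


def _flatten(values):
    """Yield the scalar entries of an arbitrarily nested sequence."""
    for v in values:
        if isinstance(v, Iterable) and not isinstance(v, (str, bytes)):
            yield from _flatten(v)
        else:
            yield v


def assert_finite_condition(x_o):
    """Raise ValueError if x_o contains NaN or Inf.

    Hypothetical helper for illustration, not an sbi function.
    """
    # Tensors and arrays expose .tolist(); plain lists pass through unchanged.
    values = x_o.tolist() if hasattr(x_o, "tolist") else x_o
    if not isinstance(values, Iterable):
        values = [values]  # a single scalar observation
    bad = [i for i, v in enumerate(_flatten(values)) if not math.isfinite(v)]
    if bad:
        raise ValueError(
            f"x_o contains NaN or Inf at flattened indices {bad}; "
            "sampling conditioned on it may hang instead of failing."
        )
    return x_o
```

Calling such a check once when the condition is set would turn the silent hang into an immediate error, at the cost of rejecting conditions that an embedding net might otherwise have been able to handle.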

gmoss13 avatar Nov 11 '25 13:11 gmoss13

I am also in favour of the first option.

janfb avatar Nov 11 '25 19:11 janfb