
Numerical error messages: real problem or just terminal noise?

Open fredcallaway opened this issue 3 years ago • 6 comments

I'm running a very simple model inferring a Gamma distribution over response times in a memory task.

When I run HMC (or NUTS) I get hundreds, maybe thousands of `Warning: The current proposal will be rejected due to numerical error(s).` messages. As a new user, coming from pymc, this was quite disconcerting. Is something terribly wrong with my model?

But upon looking further, it seems that these show up all over the place, even in one of the official tutorials (without any explanation). So this makes me think that some amount of numerical error is really not something to worry about. But then, maybe we shouldn't be filling up the user's scrollback history with these messages? For example, maybe a running tally could be kept and reported once along with the other diagnostics? Or maybe this should be a `@debug`-level message?

fredcallaway avatar May 20 '21 01:05 fredcallaway

The messages come from AdvancedHMC, which unfortunately does not respect Turing's verbosity levels. The errors can be real and indicate a problem with the model and/or the sampler settings, but they can be ignored in the initial phase when the step size is tuned: depending on the model and the initialization, a too-large step size can result in a non-finite gradient of the log density.
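To illustrate the mechanism (a toy sketch only, not AdvancedHMC's actual code; the `logdensity` function and numbers are made up here): an overly large leapfrog step can push the position into a region where the log density is non-finite, and the sampler handles this by rejecting the proposal and emitting the warning.

```julia
# Toy Gamma(2, 1)-like log density, only defined for x > 0
logdensity(x) = x > 0 ? log(x) - x : -Inf

x = 2.0                     # current position
ϵ = 10.0                    # step size chosen far too large
grad = 1 / x - 1            # d/dx logdensity at x

# Simplified leapfrog position update overshoots into x ≤ 0
x_new = x + ϵ^2 / 2 * grad

logdensity(x_new)           # -Inf: the proposal is rejected as a "numerical error"
```

Once the step-size adaptation settles on a smaller ϵ, these rejections typically stop, which is why warnings confined to the tuning phase are usually harmless.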

Related: https://github.com/TuringLang/Turing.jl/issues/1398, https://github.com/TuringLang/AdvancedHMC.jl/issues/217, https://github.com/TuringLang/Turing.jl/issues/1493

devmotion avatar May 20 '21 01:05 devmotion

Thanks! It sounds like the AdvancedHMC devs don't want to fix this, so maybe the solution is (as @xukai92 suggested in TuringLang/AdvancedHMC.jl#217) to wrap the logging messages in Turing? Ideally Turing would do some diagnostics to figure out whether the errors are safe to ignore (I have no idea whether they're happening in the initial tuning phase or not) and print a warning message only if they aren't.

If one of the first things a user encounters when using a library is a huge wall of cryptic warnings, they are less likely to continue using that library. So there is a very real cost here, beyond just the inconvenience of having to wrap your sample calls in a Logging.with_logger call.
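For reference, the workaround mentioned above looks roughly like this (a sketch assuming some `model` is already defined; note that `NullLogger` drops all log output from the block, not only these warnings):

```julia
using Logging
using Turing

# Blunt option: silence everything logged during sampling.
chain = with_logger(NullLogger()) do
    sample(model, NUTS(), 1000)
end

# Less blunt option: raise the minimum level so only errors get through.
chain = with_logger(SimpleLogger(stderr, Logging.Error)) do
    sample(model, NUTS(), 1000)
end
```

Both `NullLogger` and `SimpleLogger` are part of Julia's stdlib Logging module, so no extra dependency is needed.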

fredcallaway avatar May 20 '21 02:05 fredcallaway

If one of the first things a user encounters when using a library is a huge wall of cryptic warnings, they are less likely to continue using that library.

In defense of Turing and AdvancedHMC, in many cases these warnings do not show up, and they are not always noise: they can help to identify problems in your model, initialization, or sampling hyperparameters. So I'm not sure that the correct approach is to remove these warnings completely by default; I'm worried it might be equally annoying for users if they do not realize that there are problems, or why. Instead, maybe it would be helpful to

  • explain more clearly in the message what these warnings indicate, what causes them, when they can be ignored, and how to disable them (maybe the longer explanation only shown once, similar to deprecation warnings)
  • by default ignore the warnings when the step size is tuned

I assume it would be useful to address both points in AdvancedHMC since I imagine this could be helpful also when one uses AdvancedHMC directly, without Turing. What do you think, @xukai92?

devmotion avatar May 20 '21 07:05 devmotion

explain more clearly in the message what these warnings indicate, what causes them, when they can be ignored, and how to disable them (maybe the longer explanation only shown once, similar to deprecation warnings)

I'm worried that explaining it thoroughly in the warning itself could be hard. How about we add a page in our docs to explain this and link to it from the warning message? That way we can give more detailed guidance.

by default ignore the warnings when the step size is tuned

That's a good idea.

xukai92 avatar May 23 '21 23:05 xukai92

Silencing warnings by default during tuning is a good idea. The running tally and summary message is also a good idea; R prints out There were 50 or more warnings (use warnings() to see the first 50) in similar situations.

In practice, I totally ignore these warnings when there are just a few of them, but if there's a cascading wall of text while sampling, I take it to mean there's something wrong with my model. Maybe capture them during sampling, then on completion print out a message like

Warning: 666 proposed samples were rejected due to numerical errors (80% of total). 
Too many numerical errors may reflect problems in the model and/or sampler (increasing the number of tuning samples
or reducing step size may help).  Type xxxxx for details, or yyyyy to disable future warnings.
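A tally like this can be sketched with the stdlib Logging machinery alone (a rough illustration, not an AdvancedHMC implementation; the `TallyLogger` type is made up here):

```julia
using Logging

# Hypothetical logger that swallows matching warnings and counts them,
# forwarding everything else to the parent logger.
mutable struct TallyLogger <: AbstractLogger
    parent::AbstractLogger
    count::Int
end
Logging.min_enabled_level(l::TallyLogger) = Logging.min_enabled_level(l.parent)
Logging.shouldlog(l::TallyLogger, args...) = true
Logging.catch_exceptions(l::TallyLogger) = Logging.catch_exceptions(l.parent)
function Logging.handle_message(l::TallyLogger, level, msg, _mod, group, id,
                                file, line; kwargs...)
    if level == Logging.Warn && occursin("rejected due to numerical error", string(msg))
        l.count += 1   # suppress the individual message, keep a tally
    else
        Logging.handle_message(l.parent, level, msg, _mod, group, id, file, line; kwargs...)
    end
end

tally = TallyLogger(current_logger(), 0)
with_logger(tally) do
    for _ in 1:1000   # stand-in for the sampling loop
        @warn "The current proposal will be rejected due to numerical error(s)."
    end
end
tally.count > 0 &&
    @warn "$(tally.count) proposed samples were rejected due to numerical error(s)."
```

Matching on the message string is fragile, of course; doing this properly inside AdvancedHMC, where the rejection count is directly available, would be cleaner.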

ElOceanografo avatar May 28 '21 18:05 ElOceanografo

Warning: 666 proposed samples were rejected due to numerical errors (80% of total). 
Too many numerical errors may reflect problems in the model and/or sampler (increasing the number of tuning samples
or reducing step size may help).  Type xxxxx for details, or yyyyy to disable future warnings.

I think this is a great message. It would also be helpful to break down the errors by whether they occurred during tuning or sampling.

ParadaCarleton avatar Jul 17 '22 16:07 ParadaCarleton

Duplicate of https://github.com/TuringLang/Turing.jl/issues/1891

yebai avatar Nov 13 '22 12:11 yebai