vb crash on Windows

Open bgoodri opened this issue 9 years ago • 11 comments

I can get a segfault on Windows from cmdstan (or rstan) with the following model / data / syntax

 ./weibull.exe variational data file=weibull.data.R random seed=1913258051

weibull.txt weibull.data.txt

This seems to not be reproducible on Linux / Mac, although it is difficult to get meanfield to converge.

bgoodri avatar Jan 22 '16 19:01 bgoodri

@akucukelbir or @dustinvtran, do you have access to a Windows machine to reproduce this? Change weibull.txt to weibull.stan and weibull.data.txt to weibull.data.R.

bgoodri avatar Jan 22 '16 19:01 bgoodri

Unfortunately no. :( I can see if it's possible to get it to converge, and maybe that would indirectly point us to where the problem is.

dustinvtran avatar Jan 22 '16 19:01 dustinvtran

negative on my end as well. no more windows in my life.

— Reply to this email directly or view it on GitHub https://github.com/stan-dev/stan/issues/1758#issuecomment-174028066.

akucukelbir avatar Jan 22 '16 20:01 akucukelbir

The backtrace is related to the streaming

#2  0x00514a70 in std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_float<double>(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, char, double) const ()

bgoodri avatar Jan 22 '16 21:01 bgoodri

that, sadly, says nothing to me. :(

akucukelbir avatar Jan 22 '16 22:01 akucukelbir

Sadly, that's all it says. But I think it means that the crash happened while it was trying to print something to the screen, rather than the variational approximation itself terminating unexpectedly. Possibly a string was too long or had an illegal character in it.

bgoodri avatar Jan 22 '16 22:01 bgoodri

Not a C++ whisperer yet? You just need to be motivated to parse everything out and type what you find into Google.

This:

#2  0x00514a70 in std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_float<double>(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, char, double) const ()

is a function signature, which is revealed by reorganizing it a bit (it helps to drop all the "std::", while you're at it):

RETURN TYPE:

ostreambuf_iterator<char, char_traits<char> >

FUNCTION:

num_put<char, ostreambuf_iterator<char, char_traits<char> > >::_M_insert_float<double>

ARGUMENT TYPES:

(ostreambuf_iterator<char, char_traits<char> >, ios_base&, char, char, double)

CONST DECLARATION:

const ()

So we know to look up "std::num_put" (I couldn't live without cplusplus.com):

http://www.cplusplus.com/reference/locale/num_put/

So it's crashing at some point where it's trying to insert a double into an output stream. And it looks like it's inheriting from a locale somewhere, which is something that could easily vary across platforms.
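To make the link between the backtrace and ordinary stream output concrete, here is a minimal sketch (not taken from the Stan source) showing that `os << x` for a double goes through the `std::num_put` facet of the stream's locale. The helper names `put_via_facet` and `put_via_stream` are made up for illustration. As far as I can tell, `_M_insert_float` is the libstdc++ internal that `num_put::do_put` forwards to, but treat that as an assumption.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: format a double by calling the locale's num_put
// facet directly, which is what operator<<(double) does under the hood.
std::string put_via_facet(double x) {
  std::ostringstream os;
  const std::num_put<char>& facet =
      std::use_facet<std::num_put<char> >(os.getloc());
  // put() writes the formatted value through an ostreambuf_iterator,
  // matching the argument types seen in the backtrace frame.
  facet.put(std::ostreambuf_iterator<char>(os), os, os.fill(), x);
  return os.str();
}

// Hypothetical helper: the ordinary way of streaming the same value.
std::string put_via_stream(double x) {
  std::ostringstream os;
  os << x;
  return os.str();
}
```

Both helpers should produce the same string for the same input, since they share the same locale and format flags. That is why a locale or runtime difference on one platform can show up in a frame like the one above.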

The memory location's not too helpful (to me, at least). I usually just try to bisect the code manually using print statements at this point, but you need to be able to recreate the error for that. You could also review all the I/O if you have any other hint as to when the error occurs.

We have a Windows box in the office for just this kind of spelunking.

- Bob

bob-carpenter avatar Jan 23 '16 04:01 bob-carpenter

What Bob said. I'm continuing to guess that it tried to stream a double that was too big. If that is correct, could we use scientific notation or something for the delta_ELBO_mean? It already looks weird when it uses different numbers of digits to the left of the decimal point.
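A minimal sketch of the scientific-notation idea, assuming the value is an ordinary double that gets streamed somewhere. The helper name `format_scientific` and the default precision are hypothetical and not part of Stan. This only bounds the printed width; whether it avoids the crash is not established in this thread.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: render a diagnostic such as delta_ELBO_mean in
// scientific notation, so the number of digits left of the decimal
// point is always one, however large the value is.
std::string format_scientific(double value, int precision = 3) {
  std::ostringstream os;
  os << std::scientific << std::setprecision(precision) << value;
  return os.str();
}
```

With fixed notation a value near 1e300 would print about 300 digits before the decimal point. In scientific notation it stays around ten characters, although the number of exponent digits can differ between C runtimes.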

bgoodri avatar Jan 23 '16 05:01 bgoodri

i'm happy to look into using scientific notation for delta_ELBO_mean. i don't know how to test for this though... do either of you know how to test for these sorts of windows-only bugs?

akucukelbir avatar Feb 02 '16 12:02 akucukelbir

We do have a Windows machine. You might be able to write a unit test that just streams a million random characters in the same way that ADVI does.
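One way such a test could look, as a rough sketch: stream many random doubles across a wide range of magnitudes, plus a few edge cases, and check that the stream stays in a good state. The function name `stream_stress` and the format flags (`std::setw(16)`, `std::fixed`, `std::setprecision(3)`) are assumptions for illustration, not copied from the ADVI code. A segfault would show up as the test process dying, not as a `false` return.

```cpp
#include <cmath>
#include <cstddef>
#include <iomanip>
#include <limits>
#include <random>
#include <sstream>
#include <string>

// Hypothetical stress check: stream n random doubles spanning roughly
// 1e-300 to 1e300 in fixed notation, then a few edge cases.
bool stream_stress(std::size_t n, unsigned seed) {
  std::mt19937 rng(seed);
  std::uniform_real_distribution<double> exponent(-300.0, 300.0);
  std::ostringstream os;
  for (std::size_t i = 0; i < n; ++i) {
    const double x = std::pow(10.0, exponent(rng));
    // Assumed formatting; fixed notation makes large values very wide.
    os << std::setw(16) << std::fixed << std::setprecision(3) << x << '\n';
    if (!os.good()) return false;
    os.str(std::string());  // discard output so memory use stays flat
  }
  // Edge cases that often behave differently across C runtimes.
  typedef std::numeric_limits<double> lim;
  os << lim::infinity() << ' ' << lim::quiet_NaN() << ' '
     << lim::max() << ' ' << lim::denorm_min() << '\n';
  return os.good();
}
```

Calling it as `stream_stress(1000000, 1913258051u)` would reuse the seed from the original report, though that seed drives a different random number generator here and will not reproduce the same values.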

bgoodri avatar Feb 02 '16 13:02 bgoodri

@dustinvtran, @akucukelbir. Bump.

syclik avatar May 27 '16 17:05 syclik