
Enable STAN_THREADS by default

Open WardBrian opened this issue 2 years ago • 9 comments

Summary:

Following #1176 and with the addition of Pathfinder, a lot of the primary usages of cmdstan models are able to use threading, but only if STAN_THREADS is True.

I believe this has some downsides for single-threaded performance (though I have never seen hard numbers), but it may be worth discussing moving this from an opt-in to an opt-out.

Current Version:

v2.32.2

WardBrian avatar Aug 10 '23 15:08 WardBrian

Last time we checked, Linux and macOS had a negligible slowdown (~2%), while Windows was at ~10%. This was the case for certain models that dereference the thread-local pointer to the AD tape very often. That was a few years back, and newer compilers may have improved things. From my view (as a Linux and macOS user), we should have switched to always-on threading long ago, as the advantages outweigh the downsides.

wds15 avatar Aug 10 '23 16:08 wds15

I would hope the Windows slowdown is much less nowadays, given that we no longer support Rtools 3.5 and the 4.9.3 compiler it used.

rok-cesnovar avatar Aug 10 '23 18:08 rok-cesnovar

What would be the least intrusive way to enable this in a way that still lets users disable it? Most of our makefile logic just checks whether STAN_THREADS is defined, not whether it is set to a "truthy" value.
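To illustrate the concern, here is a minimal standalone sketch (the file name demo.mk is hypothetical and not part of CmdStan). GNU make's `ifdef` is true for any non-empty value, so a user writing `STAN_THREADS=false` would still get threading:

```make
# demo.mk: illustration only, not taken from the CmdStan makefiles.
# `ifdef` is true whenever the variable has a non-empty value,
# so STAN_THREADS=false and STAN_THREADS=0 both count as "enabled".
ifdef STAN_THREADS
  $(info threading enabled: STAN_THREADS='$(STAN_THREADS)')
else
  $(info threading disabled)
endif

# Empty default target so make has something to do.
all: ;
```

Running `make -f demo.mk STAN_THREADS=false` takes the "enabled" branch, while `make -f demo.mk` and `make -f demo.mk STAN_THREADS=` both take the "disabled" branch.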

WardBrian avatar Aug 18 '23 13:08 WardBrian

One possible way I thought of is giving the opt-out a new name, and then we can do something like

ifndef STAN_SINGLETHREADED
STAN_THREADS = 1
endif

near the top of the CmdStan makefile
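A slightly fuller sketch of the same idea, assuming we also want "falsy" spellings of STAN_THREADS to count as opting out. STAN_SINGLETHREADED is only the name proposed here, not an existing flag, and the list of falsy spellings is an arbitrary choice for illustration:

```make
# Hypothetical sketch; none of this exists in CmdStan today.
# Treat common "falsy" spellings of STAN_THREADS as a request to opt out.
ifneq ($(filter 0 false FALSE off OFF,$(STAN_THREADS)),)
  STAN_SINGLETHREADED = 1
endif

ifdef STAN_SINGLETHREADED
  # Clear the variable so later `ifdef STAN_THREADS` checks are false,
  # even if STAN_THREADS was given on the command line.
  override STAN_THREADS :=
else
  # Default to threading unless the user already set a value.
  STAN_THREADS ?= 1
endif
```

This would have to run before any other makefile logic that tests STAN_THREADS, since the existing checks only look at whether the variable is defined.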

WardBrian avatar Aug 18 '23 13:08 WardBrian

If my memory serves me right, ODE models using the rk45 integrator are performance-sensitive to threading on Windows (because the AD tape is used heavily relative to everything else that is going on).

An opt-out option is a good start, but going all-in on threading would make things simpler going forward.

wds15 avatar Aug 18 '23 16:08 wds15

@wds15 do you know where I can find the original testing that showed the 30% hit? Hopefully it wouldn't need too much updating to check against the current version

andrjohns avatar Aug 30 '23 14:08 andrjohns

Sorry, I no longer know where that is. I would use the ODE example from the performance test repo. It uses the rk45 integrator, which is sensitive to this as I recall.

wds15 avatar Aug 30 '23 21:08 wds15

I noticed RStan appears to unconditionally enable threading on Windows: https://github.com/stan-dev/rstan/blob/develop/StanHeaders/src/Makevars.win#L3

WardBrian avatar Apr 10 '24 14:04 WardBrian

Here is a summary of what I know on Windows:

Naively enabling STAN_THREADS while building with mingw's GCC leads to about a 2.5x slowdown on the schools-4 model (which has been known for quite some time), even when running on a single thread. There are known issues with the winpthread library used by mingw: https://github.com/msys2/MINGW-packages/discussions/13259, which we suspect are the cause.

Alternatives:

  • There is no slowdown observed when using WSL
  • There is little (~10%) to no slowdown observed when using clang-cl (supporting this requires significant work on our build system)
  • I'm having some difficulty using a GCC built to use mcfgthread instead of winpthread, but it does appear that this has a lesser, but still pretty significant, slowdown (the model gets ~50% slower, not 100%+)

WardBrian avatar Apr 10 '24 16:04 WardBrian