SciMLBenchmarks.jl

update parameter_estimation

Open ChrisRackauckas opened this issue 2 years ago • 21 comments

ChrisRackauckas avatar Dec 11 '21 17:12 ChrisRackauckas

There are some precompilation issues, looking into it.

Vaibhavdixit02 avatar Dec 12 '21 07:12 Vaibhavdixit02

I need some help reviewing https://github.com/Vaibhavdixit02/stan-install-buildkite-plugin. The goal is to install cmdstan with Buildkite; we did this successfully with GitHub Actions in DiffEqBayes. @ChrisRackauckas, are you familiar with this?

Vaibhavdixit02 avatar Dec 13 '21 16:12 Vaibhavdixit02

I am not. @staticfloat ?

ChrisRackauckas avatar Dec 13 '21 17:12 ChrisRackauckas

@goedman is there a reason CmdStan isn't using BinaryBuilder? It seems weird for it to not be serving binaries in the normal way.

ChrisRackauckas avatar Dec 14 '21 12:12 ChrisRackauckas

Hi Chris (@ChrisRackauckas),

There have been a few attempts over the years but none successful.

Two of the more recent threads are Stan forum and Julia forum.

The Conda route (Stan forum) works and I think particularly on Windows that is a great option.

Tamas's conclusion (Julia forum) is also very significant: a cmdstan binary + tools artifact is really, really big.

The fact that Stan's cmdstan binary now uses C++-level multithreading makes comparisons even trickier.

I was not aware of this PR thread. I'll take a look at what the objectives are.

goedman avatar Dec 14 '21 15:12 goedman

https://github.com/JuliaPackaging/Yggdrasil/issues/1023 is what we are really looking for, but it seems to have failed in https://github.com/JuliaPackaging/Yggdrasil/pull/2830. Maybe @wimmerer has a solution.

ChrisRackauckas avatar Dec 14 '21 15:12 ChrisRackauckas

I mentioned this somewhere on Ygg, but building Stan was not feasible at that time; upstream just didn't plan to support cross-compilation (I don't recall the exact issues now, though). I believe there were some plans to build certain difficult binaries on the target platform at some point? That would enable us to provide Stan, I think.

On the other hand someone more knowledgeable may be able to force the issue and build Stan, but it's not simple.

rayegun avatar Dec 14 '21 16:12 rayegun

That buildkite plugin is very simple; it literally just downloads and unpacks a tarball. No need for this to be a buildkite plugin; just insert those lines as part of the build command. Or even better, make that .tar.gz an artifact and download it as part of your package.
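As a rough sketch of what inserting those lines into the build command could look like, the shell functions below download and unpack a cmdstan release tarball. The names `cmdstan_url` and `fetch_cmdstan` and the destination argument are made up for illustration; the URL pattern is assumed from the v2.26.1 release link that appears later in this thread, and `curl`, `tar` and `mktemp` are assumed to be available on the agent.

```shell
#!/usr/bin/env bash
# Sketch only: cmdstan_url and fetch_cmdstan are hypothetical helper names.

# Build the release tarball URL for a given cmdstan version
# (pattern assumed from the v2.26.1 release link).
cmdstan_url() {
    printf 'https://github.com/stan-dev/cmdstan/releases/download/v%s/cmdstan-%s.tar.gz\n' "$1" "$1"
}

# Download the tarball to a temporary file and unpack it into the destination.
fetch_cmdstan() {
    local version="$1" dest="$2" tarball status
    tarball="$(mktemp)" || return 1
    mkdir -p "$dest" &&
        curl -fsSL -o "$tarball" "$(cmdstan_url "$version")" &&
        tar -xzf "$tarball" -C "$dest"
    status=$?
    # Always clean up the temporary tarball, then report the original status.
    rm -f "$tarball"
    return "$status"
}
```

In a build command this might be called as `fetch_cmdstan 2.26.1 "$HOME"`, followed by whatever build step the benchmarks need.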

staticfloat avatar Dec 14 '21 16:12 staticfloat

We should do "if the test being run is the one from the ParameterEstimation folder, then..." since the binary download isn't trivial time-wise.
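One way to express that condition in a build command is a small shell guard like the one below. It is only a sketch: the helper names `needs_cmdstan` and `run_if_needs_cmdstan` are hypothetical, and it assumes the benchmark folder being run is available as a path whose last component is `ParameterEstimation`.

```shell
#!/usr/bin/env bash
# Sketch only: both helper names are hypothetical.

# Succeed only when the given benchmark folder is the ParameterEstimation one.
needs_cmdstan() {
    case "$(basename -- "${1:-}")" in
        ParameterEstimation) return 0 ;;
        *) return 1 ;;
    esac
}

# Run the given command (e.g. a cmdstan install step) only for that folder.
run_if_needs_cmdstan() {
    local folder="$1"
    shift
    if needs_cmdstan "$folder"; then
        "$@"
    else
        echo "Skipping cmdstan setup for ${folder}"
    fi
}
```

The install command is passed in as arguments, so the guard stays independent of how cmdstan ends up being provided.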

ChrisRackauckas avatar Dec 14 '21 16:12 ChrisRackauckas

> We should do "if the test being run is the one from the ParameterEstimation folder, then..." since the binary download isn't trivial time-wise.

That was my motivation for making it a plugin: to keep pipeline.yml a bit less messy. I guess that doesn't really matter, though.

Vaibhavdixit02 avatar Dec 14 '21 16:12 Vaibhavdixit02

You could also make it a lazy artifact so it's only downloaded upon first use.

staticfloat avatar Dec 14 '21 16:12 staticfloat

Yeah if anyone knows how to do that then that sounds like the perfect solution.

ChrisRackauckas avatar Dec 14 '21 18:12 ChrisRackauckas

First, add the following to your Artifacts.toml:

[[cmdstan]]
arch = "x86_64"
git-tree-sha1 = "d62fd175524efc55e3258b7fbd290c640bd52c89"
libc = "glibc"
os = "linux"
lazy = true

    [[cmdstan.download]]
    sha256 = "d944d23ac7ed5ebf924d859f3a1f3052891161e2c70313503f21257ea13f0b0c"
    url = "https://github.com/stan-dev/cmdstan/releases/download/v2.26.1/cmdstan-2.26.1.tar.gz"

Next, wherever you want to compute the path to cmdstan, use artifact"cmdstan" instead. You'll need to load LazyArtifacts first (using LazyArtifacts), though.

That being said, I tried using it a bit, and it looks to me like the tarball contains lots of stuff we don't need (e.g. it contains linux, mac and windows builds). The files within it are also not set to executable (it looks like you need to chmod +x the binaries after extracting). So it would be nice if someone were to rebundle what we need from that upstream tarball and make OS-specific tarballs. If you need help doing that, ping me or Mosé on Yggdrasil, and we can help you make a redistributing recipe for these static builds.
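To illustrate the chmod step, here is a hedged sketch that marks the stanc binaries executable after extraction. The function name `make_stanc_executable` is hypothetical, and the assumption that the binaries live under `bin/` with names containing `stanc` (such as `linux-stanc`) is based on how the tarball is described in this thread, not on a verified listing.

```shell
#!/usr/bin/env bash
# Sketch only: make_stanc_executable is a hypothetical helper, and the bin/
# layout with *stanc* file names is an assumption.

make_stanc_executable() {
    local cmdstan_dir="$1"
    # Add the executable bit to every regular file under bin/ whose name
    # contains "stanc"; other files are left untouched.
    find "$cmdstan_dir/bin" -type f -name '*stanc*' -exec chmod +x {} +
}
```

A rebundled, OS-specific tarball would make this step unnecessary, since the permissions could be fixed once when the tarball is created.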

staticfloat avatar Dec 14 '21 18:12 staticfloat

Assume that I am already very lost on how to do what you're describing. @giordano

ChrisRackauckas avatar Dec 14 '21 18:12 ChrisRackauckas

Clearly I don't understand what stanc is. I thought that the only thing you needed from that tarball was the linux-stanc binary, but it looks like you're running a build as well? What does the make build step do for you? What binaries/libraries do you actually need from cmdstan?

staticfloat avatar Dec 14 '21 18:12 staticfloat

Vaibhav's buildkite solution works fine (I just tried it on my macOS Monterey M1 machine to make sure) and is very similar to what is being used in the GitHub CI workflows (and before that on Travis for testing).

I would stay away from trying to redo the make process. It might be doable, but who will maintain that with quarterly updates?

Some details:

stanc is an OCaml program that translates a Stan Language Program into a C++ source program, which is subsequently compiled to a binary specific to the Stan model.

The cmdstan build step creates the stanc program and also processes parts of other libraries (e.g. boost) to tailor it to what is needed in Stan (e.g. mapping of variables to proper support ranges) and produces C++ header files (to cut down the C++ compilation time, it's optional). The build step also builds the stansummary binary (we could drop that, I use that quite a bit though).

Since cmdstan-2.28.1 you have options to optimize:

# Enable threading
#STAN_THREADS=true

# Enable the MPI backend (requires also setting CXX and TBB_CXX_TYPE; replace gcc with clang on Mac)
# STAN_MPI=true
# CXX=mpicxx
# TBB_CXX_TYPE=gcc

# Enable the OpenCL backend
# STAN_OPENCL=true

# Add flags that are forwarded to the Stan-to-C++ compiler (stanc3).
# This example enables pedantic mode
# STANCFLAGS+= --warn-pedantic

# Enable C++ compiler and linker optimization recommended by Stan developers.
# Can significantly slow down compilation.
#STAN_CPP_OPTIMS=true

# Remove range checks from the model for faster runtime. Use this flag with caution
# and only once the indexing has been validated. In case of any unexpected behavior
# remove the flag for easier debugging.
#STAN_NO_RANGE_CHECKS=true

On my machine it takes less than a few minutes to install cmdstan, and I do it maybe once a quarter unless the header files get outdated (e.g. an Apple tool update). It allows me to tailor the above flags to my liking in make/local based on my specific machine and how I run Julia w.r.t. threads.

Until the conda cmdstan install route became an option, Windows was a more difficult platform. The conda install route also works on Unix and macOS, but for the above reasons I prefer running my own install.

On Travis and GitHub we've been building cmdstan for at least 8 years (I probably almost daily) with few difficulties.
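As a small illustration of tailoring those flags, the sketch below writes a make/local file that enables two of the options listed above. The helper name `write_make_local` is hypothetical and the choice of flags is only an example; it assumes the path passed in is the cmdstan directory.

```shell
#!/usr/bin/env bash
# Sketch only: write_make_local is a hypothetical helper; the flag choice is
# just an example taken from the options listed above.

write_make_local() {
    local cmdstan_dir="$1"
    mkdir -p "$cmdstan_dir/make"
    # Overwrite make/local with the chosen options.
    cat > "$cmdstan_dir/make/local" <<'EOF'
# Enable threading
STAN_THREADS=true

# Enable C++ compiler and linker optimization recommended by Stan developers.
STAN_CPP_OPTIMS=true
EOF
}
```

Keeping the flags in a generated file like this makes it easy to use different settings per machine without editing the cmdstan checkout by hand.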

Not sure if this helps, but maybe it does.

goedman avatar Dec 14 '21 19:12 goedman

So okay, I don't know how to do this, but the real constraint is that only 3 tests out of about 100 need Stan, so I hope we don't have all benchmarks downloading a big binary to support these few tests. But if we have to, 🤷

ChrisRackauckas avatar Dec 15 '21 21:12 ChrisRackauckas

Looking into this a bit more, I see that even if we pre-build stan, the way you use stan it still needs a compiler and linker and stuff, because it's going to take your stan code and turn it into an executable by transpiling through C? So there's not actually much point in precompiling stan because it needs a compiler to run user-provided stan code anyway. Unless those user programs are static, and we can precompile those completely, of course.

staticfloat avatar Dec 16 '21 01:12 staticfloat

Yes, you still need a compiler. That's part of what made me give up on providing it in Yggdrasil: you needed a whole toolchain anyway.

rayegun avatar Dec 16 '21 03:12 rayegun

I would maybe rephrase Elliot's first sentence as: "if we pre-build stanc and stuff, we still need a compiler and linker and some more stuff, because it's going to take the stanc-generated C++ code and build the model-specific executable".

For performance comparisons we currently need 2 "Stan Language" models, the 1 and 4 parameter LV models and corresponding sho functions. I believe everything else could be passed in. For a specific platform these could be turned into binaries as Elliot suggests.

Unfortunately, this drops a far more important objective behind the work Chris and Vaibhav put into DiffEqBayes.jl: taking a single ModelingToolkit model and transforming it into a Turing model and into a Stan model.

On CI (if ~5 minutes is significant in the overall GitHub workflow time), I don't have a really good solution other than variations on the theme below.

Folks who use Stan and are interested in running the test suite will have cmdstan installed. For (the majority of) folks not interested in using Stan, maybe bypassing all Stan-related steps if CMDSTAN and JULIA_CMDSTAN_HOME are not defined is an option?
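A guard along those lines might look like the following shell sketch. The helper name `stan_configured` is hypothetical; it treats Stan as available when either `CMDSTAN` or `JULIA_CMDSTAN_HOME` is set to a non-empty value, which is one reading of the suggestion above.

```shell
#!/usr/bin/env bash
# Sketch only: stan_configured is a hypothetical helper name.

# Succeed when either environment variable is set to a non-empty value.
stan_configured() {
    [ -n "${CMDSTAN:-}" ] || [ -n "${JULIA_CMDSTAN_HOME:-}" ]
}

if stan_configured; then
    echo "cmdstan configured: running Stan-related steps"
else
    echo "cmdstan not configured: skipping Stan-related steps"
fi
```

The same check could equally be done inside the Julia benchmark code; the shell form is shown only because it fits directly into a CI build command.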

goedman avatar Dec 16 '21 15:12 goedman

StanSample precompilation errors.

ChrisRackauckas avatar Jul 19 '22 14:07 ChrisRackauckas

I am merging so that the parameter estimation finally gets updated, but @Vaibhavdixit02 @ven-k please revive the Bayesian.

ChrisRackauckas avatar Aug 20 '22 05:08 ChrisRackauckas