stan [WIP] Allow NUTS to do eager evaluation on forward and backward trajectory in parallel

Submission Checklist

[ ] Run unit tests: ./runTests.py src/test/unit
[ ] Run cpplint: make cpplint
[ ] Declare copyright holder and open-source license: see below

Summary

This is a WIP to allow NUTS to compute the forward and backward trajectories in parallel.

Intended Effect

How to Verify

Side Effects

Documentation

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Sebastian Weber and Steve Bronder

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

Feb 17 '22 20:02 SteveBronder

I think I did pencil the graph structure down somewhere…let me search for it.

Feb 17 '22 21:02 wds15

I only saw this in the email preview (in the same file that @SteveBronder is commenting on), but I'd have coded this

fwd_idx == 0 ? true : valid_subtree_fwd[fwd_idx - 1];

as

fwd_idx == 0 || valid_subtree_fwd[fwd_idx - 1];

It's the same behavior because of the short-circuiting of ||.
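A minimal sketch of the equivalence, assuming `valid_subtree_fwd` behaves like a `std::vector<bool>` and `fwd_idx` is an unsigned index (the actual types in the PR are not shown here). The function names `prev_valid_ternary`, `prev_valid_or`, and `forms_agree` are hypothetical and only serve the illustration. In both forms the element access is skipped when `fwd_idx == 0`, so neither reads out of bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Form currently in the diff: explicit ternary.
bool prev_valid_ternary(const std::vector<bool>& valid_subtree_fwd,
                        std::size_t fwd_idx) {
  assert(fwd_idx <= valid_subtree_fwd.size());
  return fwd_idx == 0 ? true : valid_subtree_fwd[fwd_idx - 1];
}

// Suggested form: || short-circuits, so the right-hand side is only
// evaluated when fwd_idx != 0.
bool prev_valid_or(const std::vector<bool>& valid_subtree_fwd,
                   std::size_t fwd_idx) {
  assert(fwd_idx <= valid_subtree_fwd.size());
  return fwd_idx == 0 || valid_subtree_fwd[fwd_idx - 1];
}

// Compares both forms for every index from 0 to size (inclusive).
bool forms_agree(const std::vector<bool>& valid_subtree_fwd) {
  for (std::size_t i = 0; i <= valid_subtree_fwd.size(); ++i) {
    if (prev_valid_ternary(valid_subtree_fwd, i)
        != prev_valid_or(valid_subtree_fwd, i)) {
      return false;
    }
  }
  return true;
}
```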

Feb 17 '22 21:02 bob-carpenter

Here is what my iPad has in my notes app on this:

[Image: Screenshot 2022-02-18 at 14 33 36]

(Ignore the red arrows for now; I explain them below.)

You can see the left and right legs of the trajectory, drawn as yellow paths. Each blob is a point on the trajectory where something happens. Specifically, at these points the state is sent along the arrows to the middle process in black, which is the "abort checker process". The blue and green blobs trigger exploring 2^n steps (n being the current depth) further along the trajectory in that direction. The checker process in the middle receives the states that must be compared to evaluate the U-turn criterion.
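For reference, the comparison done by the checker process can be sketched as follows. This is only a sketch, assuming the classic no-U-turn test from the original NUTS formulation (the span between the two trajectory ends, dotted with the momentum at each end); Stan's own criterion differs in detail, and the names `span_dot` and `is_u_turn` are hypothetical rather than taken from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product of (a - b) with p; all three must have the same length.
double span_dot(const std::vector<double>& a, const std::vector<double>& b,
                const std::vector<double>& p) {
  assert(a.size() == b.size() && a.size() == p.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    sum += (a[i] - b[i]) * p[i];
  }
  return sum;
}

// Hypothetical checker: true once either end of the trajectory moves
// back toward the other end.
bool is_u_turn(const std::vector<double>& q_bwd,
               const std::vector<double>& p_bwd,
               const std::vector<double>& q_fwd,
               const std::vector<double>& p_fwd) {
  return span_dot(q_fwd, q_bwd, p_bwd) < 0.0
         || span_dot(q_fwd, q_bwd, p_fwd) < 0.0;
}
```

The checker only needs the states at the two ends, which matches the picture of blobs sending their state to the middle process.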

The idea is that you lay out the full tree to its maximum depth first (putting lambdas into the leaves that do the work when called) and then let the TBB run the tree for you fully automatically. Execution is canceled once the U-turn criterion is met: at that point the tree stops running and the identified state is returned.
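The "lay out first, then run, cancel on U-turn" pattern can be sketched with the standard library alone. This is a serial toy, not the PR's code: it does not use the TBB, the "state" is just a step counter per leg, and `Legs`, `Result`, and `run_tree` are hypothetical names. It assumes the directions for all depths are drawn up front, which is what allows the tree to be laid out before anything runs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Legs {
  long long fwd = 0;  // forward end of the trajectory, in leapfrog steps
  long long bwd = 0;  // backward end (zero or negative)
};

struct Result {
  std::size_t depth;  // number of doublings executed before stopping
  Legs legs;          // trajectory ends at the point of stopping
};

Result run_tree(const std::vector<bool>& go_forward,
                const std::function<bool(const Legs&)>& u_turn) {
  assert(go_forward.size() < 62);  // keep 2^n within long long
  Legs legs;
  // Lay out one leaf per depth up front; nothing is executed yet.
  std::vector<std::function<void()>> leaves;
  for (std::size_t n = 0; n < go_forward.size(); ++n) {
    const long long steps = 1LL << n;  // 2^n steps at depth n
    if (go_forward[n]) {
      leaves.emplace_back([&legs, steps] { legs.fwd += steps; });
    } else {
      leaves.emplace_back([&legs, steps] { legs.bwd -= steps; });
    }
  }
  // Run the leaves in order; once the checker fires, the remaining
  // leaves are never called.
  std::size_t depth = 0;
  for (const auto& leaf : leaves) {
    leaf();
    ++depth;
    if (u_turn(legs)) break;
  }
  return {depth, legs};
}
```

In the design described above, a task scheduler takes the place of the plain loop, so the two legs can be extended concurrently while the checker consumes their states in tree order.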

If I recall correctly, the red connections are only wired up if you want serial execution. So I intentionally wire up the graph differently when running sequentially. This makes it easier for the TBB to figure out what to do when running the graph. I recall that this was good for single-core performance.

Ok... now that's what is behind this. Does this help already or should I try to annotate the source with reference to the figure?

Feb 18 '22 13:02 wds15
