stan
[WIP] Allow NUTS to do eager evaluation on forward and backward trajectory in parallel
Submission Checklist
- [ ] Run unit tests: `./runTests.py src/test/unit`
- [ ] Run cpplint: `make cpplint`
- [ ] Declare copyright holder and open-source license: see below
Summary
This is a WIP to allow NUTS to compute the forward and backward trajectories in parallel.
Intended Effect
How to Verify
Side Effects
Documentation
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Sebastian Weber and Steve Bronder
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
I think I penciled the graph structure down somewhere... let me search for it.
I only saw this in the email preview (in the same file @SteveBronder is commenting on), but I'd have coded this
```cpp
fwd_idx == 0 ? true : valid_subtree_fwd[fwd_idx - 1];
```
as
```cpp
fwd_idx == 0 || valid_subtree_fwd[fwd_idx - 1];
```
The behavior is the same because `||` short-circuits: `valid_subtree_fwd[fwd_idx - 1]` is only evaluated when `fwd_idx != 0`.
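A minimal check of that equivalence, assuming `valid_subtree_fwd` is a `std::vector<bool>` and `fwd_idx` a `std::size_t` (the actual types in the PR may differ). The two helper names are hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Original form: the ternary avoids indexing at -1 when fwd_idx == 0.
bool prev_valid_ternary(const std::vector<bool>& valid_subtree_fwd,
                        std::size_t fwd_idx) {
  return fwd_idx == 0 ? true : valid_subtree_fwd[fwd_idx - 1];
}

// Suggested form: || short-circuits, so the vector is only indexed
// when fwd_idx != 0, which makes both versions behave identically.
bool prev_valid_or(const std::vector<bool>& valid_subtree_fwd,
                   std::size_t fwd_idx) {
  return fwd_idx == 0 || valid_subtree_fwd[fwd_idx - 1];
}
```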
Here is the sketch I have in the notes app on my iPad:

(Ignore the red arrows for now; I explain them below.)
So you see the left and right legs of the trajectory, drawn as yellow paths. Each blob is a point on the trajectory where something happens: at these points the state is sent along the arrows to the middle process in black, the "abort checker process". The blue and green blobs trigger exploring 2^n steps (n being the current depth) further along the trajectory in that direction. The checker process in the middle receives the states that must be compared to evaluate the U-turn criterion.
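For reference, a sketch of what the checker could compute, using the original U-turn criterion from Hoffman and Gelman (2014), which compares the span between the two trajectory endpoints with each endpoint's momentum. Stan's sampler uses a generalized criterion based on the summed momentum, so this is illustrative only; `EndpointState`, `dot`, and `is_uturn` are hypothetical names:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the state a leg sends to the checker.
struct EndpointState {
  std::vector<double> theta;  // position
  std::vector<double> r;      // momentum
};

double dot(const std::vector<double>& a, const std::vector<double>& b) {
  if (a.size() != b.size()) throw std::invalid_argument("size mismatch");
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
  return sum;
}

// Original NUTS criterion: report a U-turn once either endpoint's momentum
// points back against the span theta_plus - theta_minus.
bool is_uturn(const EndpointState& minus, const EndpointState& plus) {
  if (minus.theta.size() != plus.theta.size())
    throw std::invalid_argument("size mismatch");
  std::vector<double> span(plus.theta.size());
  for (std::size_t i = 0; i < span.size(); ++i)
    span[i] = plus.theta[i] - minus.theta[i];
  return dot(span, minus.r) < 0.0 || dot(span, plus.r) < 0.0;
}
```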
The idea is that you first lay out the full tree to its maximum depth (putting lambdas into the leaves that do the work when called) and then let the TBB run the tree for you, fully automatically. Execution is canceled as soon as the U-turn criterion is met; at that point the tree stops running and the identified state is returned.
If I recall correctly, the red connections are only wired up when you want serial execution. So I intentionally wire up the graph differently when running sequentially, which makes it easier for the TBB to figure out what to do when running the graph. As I recall, this was good for single-core performance.
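A rough sketch of that control flow under stated assumptions: it swaps the TBB flow graph for `std::async` tasks and a plain loop, so it illustrates the idea rather than the PR's implementation. All leaf lambdas are laid out up front to the maximum depth; at each depth the forward and backward leaves run concurrently, the checker then looks at both legs, and the run stops as soon as it reports a U-turn. The `serial` flag runs the backward leaf only after the forward leaf, loosely mirroring the extra red edges. `ExtendFn`, `CheckFn`, `RunResult`, and `run_both_legs` are hypothetical:

```cpp
#include <functional>
#include <future>
#include <vector>

// Hypothetical leaf work: extends the leg in `direction` (+1 forward,
// -1 backward) by 2^depth leapfrog steps and returns the steps taken.
using ExtendFn = std::function<int(int direction, int depth)>;
// Hypothetical checker: true once the U-turn criterion is met after `depth`.
using CheckFn = std::function<bool(int depth)>;

struct RunResult {
  int stop_depth;   // depth at which the checker stopped the run, or max_depth
  int total_steps;  // leapfrog steps taken over both legs
};

RunResult run_both_legs(int max_depth, const ExtendFn& extend,
                        const CheckFn& is_uturn, bool serial) {
  // Step 1: lay out every leaf up front as a lambda that only does its
  // work when called.
  std::vector<std::function<int()>> fwd_leaves;
  std::vector<std::function<int()>> bwd_leaves;
  for (int depth = 0; depth < max_depth; ++depth) {
    fwd_leaves.push_back([extend, depth] { return extend(+1, depth); });
    bwd_leaves.push_back([extend, depth] { return extend(-1, depth); });
  }

  // Step 2: run depth by depth and stop as soon as the checker fires.
  RunResult result{max_depth, 0};
  for (int depth = 0; depth < max_depth; ++depth) {
    int steps = 0;
    if (serial) {
      // The backward leaf only starts after the forward leaf has finished.
      steps += fwd_leaves[depth]();
      steps += bwd_leaves[depth]();
    } else {
      // Both legs are evaluated eagerly and concurrently.
      auto fwd = std::async(std::launch::async, fwd_leaves[depth]);
      auto bwd = std::async(std::launch::async, bwd_leaves[depth]);
      steps = fwd.get() + bwd.get();
    }
    result.total_steps += steps;
    if (is_uturn(depth)) {  // the checker sees both legs at this depth
      result.stop_depth = depth;
      break;  // remaining leaves are never run
    }
  }
  return result;
}
```

Unlike a TBB graph, this sketch waits for both legs at every depth before checking, so it gives up some of the overlap a graph scheduler could exploit; it only shows where the checker and the cancellation sit.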
OK, so that's the idea behind this. Does this already help, or should I try to annotate the source with references to the figure?