Expanding 5 macros `STAN_ADD_REQUIRE_*` directly into code.
## Summary
Using macros to generate the full set of requires template metaprograms is harder to read than having the metaprograms written out in the code.
This PR expands the macros into the minimal set of template metaprograms required by the Math and Stan libraries.
## Detail
There are 5 macros. I'll link to the latest released version (so the links are permanent), but they can also be found on current develop. They are all inside stan/math/prim/meta/require_helpers.hpp:
- STAN_ADD_REQUIRE_UNARY: https://github.com/stan-dev/math/blob/v4.7.0/stan/math/prim/meta/require_helpers.hpp#L72
- STAN_ADD_REQUIRE_UNARY_INNER: https://github.com/stan-dev/math/blob/v4.7.0/stan/math/prim/meta/require_helpers.hpp#L114
- STAN_ADD_REQUIRE_BINARY: https://github.com/stan-dev/math/blob/v4.7.0/stan/math/prim/meta/require_helpers.hpp#L186
- STAN_ADD_REQUIRE_BINARY_INNER: https://github.com/stan-dev/math/blob/v4.7.0/stan/math/prim/meta/require_helpers.hpp#L229
- STAN_ADD_REQUIRE_CONTAINER: https://github.com/stan-dev/math/blob/v4.7.0/stan/math/prim/meta/require_helpers.hpp#L332
Applied across every type, these macros expand into a very large number of definitions; in practice, we don't use most of them.
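To make that concrete: a single invocation in is_var.hpp,
STAN_ADD_REQUIRE_UNARY(var, is_var, require_stan_scalar_real);
generates six aliases. As a sketch (abridged; the full macro body is quoted later in this thread), the first two look like this after expansion:
template <typename T>
using require_var_t = require_t<is_var<std::decay_t<T>>>;

template <typename T>
using require_not_var_t = require_not_t<is_var<std::decay_t<T>>>;
The other four are require_all_var_t, require_any_var_t, require_all_not_var_t, and require_any_not_var_t. Repeat similar calls for every is_* trait and the library ends up with over a thousand require_* definitions.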
## Tests
None added.
## Side Effects
None.
## Release notes
Removes 5 macros and expands directly into code: STAN_ADD_REQUIRE_UNARY, STAN_ADD_REQUIRE_UNARY_INNER, STAN_ADD_REQUIRE_BINARY, STAN_ADD_REQUIRE_BINARY_INNER, STAN_ADD_REQUIRE_CONTAINER.
## Checklist

- [x] Copyright holder: Daniel Lee

  The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
  - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
  - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
- [x] the basic tests are passing
  - unit tests pass (to run, use: `./runTests.py test/unit`)
  - header checks pass (`make test-headers`)
  - dependencies checks pass (`make test-math-dependencies`)
  - docs build (`make doxygen`)
  - code passes the built-in C++ standards checks (`make cpplint`)
- [x] the code is written in idiomatic C++ and changes are documented in the doxygen
- [x] the new changes are tested
The way these were expanded deletes all the docs for the requires from the Stan Math website:
https://mc-stan.org/math/group__rev__row__vector__types_ga04400fc27cdcef7938e1a79d14068a7b.html#ga04400fc27cdcef7938e1a79d14068a7b
Yes. I know.
They can come back.
I'm looking at the next level of work right now, which is to reduce the footprint of these requires. There's no need to have the complete set if we're not using most of it.
We can close this PR after it goes through tests. I wanted to make sure it actually works as intended first.
IMO, if we want to go this route, I'd first see how many requires we can delete. Is this all the requires folded out? Adding ~5k lines is not the worst, but it is a lot of replicated code over and over, and I worry about the maintenance of it.
Yes, absolutely everything taken out of the macros! I’m on another branch seeing what can be removed.
I completely agree, but I think the actual footprint is going to be much less. Enough so that we can write real doc and add some tests for the things we actually use.
I submitted this as a PR to kick off testing. Since it’s so much stuff, I don’t trust “works on my machine.”
I’ll tag you once I knock out the unused requires. Let’s see how that looks.
@SteveBronder, I just walked through all 1108 require_* template metaprograms. Good news: we only use 201 of them. (I think some of those are only used by other definitions, so we may be able to reduce the footprint even more.)
@SteveBronder, review please?
I went through and also included the documentation from the macros, so the doxygen should be repopulated. I just merged develop back into the branch to kick off all tests again.
My concern is that the downside to manually defining exactly what we use today is that it raises the barrier for future contributions. If I know the require I need is require_all_not_arena_matrix (random example), today it exists. After this PR, I would need to first write it myself, and these things are far from obvious. The macros also make it very difficult to incorrectly implement something like require_all_not_..., since they let you focus on the simplest case.
This is partially offset to me if this has a significant impact on compile times - have you done any timing @syclik?
@WardBrian, I hear you! But I think the example you're bringing up highlights the opposite: I think the change lowers the barrier for future contributions. Btw, neither stance is unilaterally correct or incorrect; they're opinions, so in the end we can have different views on it. I'll just try to motivate why I think it's actually going to lower the barrier to entry.
First, the (random) example you picked is actually not one of the expanded metaprograms (assuming the macro was called correctly)! This stuff is really tricky, and not having the actual definitions sitting in code makes it very difficult to know what does and doesn't exist. (It should be require_all_not_arena_matrix_t; it was missing the _t at the end. I'm not mentioning this to be nitpicky, I'm agreeing with you that this is really exacting work.) In the current code, the way you'd have to realize that is by looking in two different places: the STAN_ADD_REQUIRE_*() macro call and the macro itself. Just to look at an existing example, let's take require_all_not_var_t.
How do you know which STAN_ADD_REQUIRE_*() macro expands into this if you're coming into the codebase to contribute? It doesn't show up when you use tools like git grep:
$ git grep require_all_not_var_t
stan/math/opencl/prim/multiply.hpp: require_all_not_var_t<T_a, T_b>* = nullptr>
stan/math/opencl/prim/multiply.hpp: require_all_not_var_t<T_a, T_b>* = nullptr>
stan/math/prim/fun/add.hpp: require_all_not_var_t<ScalarA, ScalarB>* = nullptr>
stan/math/prim/fun/dot_product.hpp: require_all_not_var_t<Scalar1, Scalar2> * = nullptr>
stan/math/prim/fun/squared_distance.hpp: require_all_not_var_t<Scal1, Scal2>* = nullptr>
stan/math/prim/fun/subtract.hpp: require_all_not_var_t<ScalarA, ScalarB>* = nullptr>
test/unit/math/rev/meta/require_generics_test.cpp: require_scal_checker<stan::require_all_not_var_t, var, var>::all_not();
For someone new, it'd be a lot of detective work to understand that it's here: https://github.com/stan-dev/math/blob/v4.8.0-rc1/stan/math/prim/meta/is_var.hpp#L16. Not impossible, but not straightforward.
STAN_ADD_REQUIRE_UNARY(var, is_var, require_stan_scalar_real);
What does this expand to? You need to look at this: https://github.com/stan-dev/math/blob/v4.8.0-rc1/stan/math/prim/meta/require_helpers.hpp#L72. And I don't know about you, but I had to look up the docs on C++ preprocessor expansion just to comprehend this chunk of code (I'm pasting it here for emphasis):
#define STAN_ADD_REQUIRE_UNARY(check_type, checker, doxygen_group) \
/*! \ingroup doxygen_group */ \
/*! \defgroup check_type##_types check_type */ \
/*! \addtogroup check_type##_types */ \
/*! @{ */ \
/*! \brief Require type satisfies checker */ \
template <typename T> \
using require_##check_type##_t = require_t<checker<std::decay_t<T>>>; \
\
/*! \brief Require type does not satisfy checker */ \
template <typename T> \
using require_not_##check_type##_t \
= require_not_t<checker<std::decay_t<T>>>; \
\
/*! \brief Require all of the types satisfy checker */ \
template <typename... Types> \
using require_all_##check_type##_t \
= require_all_t<checker<std::decay_t<Types>>...>; \
\
/*! \brief Require any of the types satisfy checker */ \
template <typename... Types> \
using require_any_##check_type##_t \
= require_any_t<checker<std::decay_t<Types>>...>; \
\
/*! \brief Require none of the types satisfy checker */ \
template <typename... Types> \
using require_all_not_##check_type##_t \
= require_all_not_t<checker<std::decay_t<Types>>...>; \
\
/*! \brief Require at least one of the types do not satisfy checker */ \
template <typename... Types> \
using require_any_not_##check_type##_t \
= require_any_not_t<checker<std::decay_t<Types>>...>; \
/*! @} */
Even if you read C++ preprocessor macros fluently, you'd still have to find the right definition inside the chunk above.
Please compare that to what's in the PR branch https://github.com/stan-dev/math/blob/feature/simplify-meta/stan/math/prim/meta/is_var.hpp#L37:
/*! \brief Require none of the types satisfy is_var */
template <typename... Types>
using require_all_not_var_t = require_all_not_t<is_var<std::decay_t<Types>>...>;
It also shows up in tools like git grep. I don't know about you, but for me it seems like it's a lot easier to understand what's going on and contribute when it's laid out this way.
Note: did you catch that the std::decay_t inside the definition is superfluous? I wouldn't have been able to catch that without the expansion.
After this PR, I would need to first write it myself, and these things are far from obvious.
Yes, I recognize this is a downside. But here are a few counterpoints:
- Expanding the macros actually clarifies what's happening. Even for C++-fluent people, having to trace how things expand through the preprocessor makes the traits even more opaque.
- Each of the expansions that remains is used in one place or another, so there are examples of how to spell out the requires traits. And besides, we can always go look at the macros in the older tagged version. (I honestly don't think they're all used correctly, but that's another issue.)
- Testing: expanding the macros allows us to actually write tests. We could have written tests with the macros in place, but about 80% of the macro expansions aren't used at all! We're down to ~200 traits in use, and we can actually write tests for those. I plan on adding them slowly, especially for the ones that are implemented incorrectly.
When I think about these traits, they should be as obvious as we can make them, and the expansion makes them more obvious. Using your example of require_all_not_arena_matrix_t, here's how it would be implemented:
/*! \brief Require none of the types satisfy is_arena_matrix */
template <typename... Types>
using require_all_not_arena_matrix_t = require_all_not_t<is_arena_matrix<std::decay_t<Types>>...>;
I don't know about you, but that looks a lot more friendly than looking at the macro.
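For completeness, here's a hedged sketch of how such a require is used at a call site. The function is hypothetical, but the `* = nullptr` default-template-argument pattern is the same one visible in the `git grep` output above:
// Hypothetical overload: participates in overload resolution only when
// none of the argument types is an arena matrix.
template <typename T1, typename T2,
          require_all_not_arena_matrix_t<T1, T2>* = nullptr>
auto my_add(T1&& a, T2&& b) {
  return a + b;
}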
The macros also make it very difficult to incorrectly implement something like require_all_not_..., since they let you focus on the simplest case.
I have a different opinion here. Looking at the macro expansions, I can now see that some of the is_* traits are implemented with decays and some without, while the macros always decay.
To me, this is inconsistent, and it suggests our traits aren't implemented correctly. (They may have the correct behavior, but only because we're decaying multiple times when it's unnecessary. Or maybe the extra decay is necessary; I don't know.)
All I'm trying to say is that the macros themselves may be incorrect, and it's hard to correct them the way things are, especially without testing. So I think expanding them is actually going to improve our chances of implementing something new correctly. (But this last part is conjecture. At the very least, I'd know how to review the traits in a PR.)
One last point: the subset of developers that will write new traits at this level is small. The subset of developers that use the traits is larger than those that create these traits. Almost everyone that develops in Math has to touch these traits these days. I think making it clearer what's going on is helpful to the larger group. My opinion is that it's easier to understand when the definitions of the template metaprograms are spelled out.
If a Math developer really needed a definition that didn't exist, one of us could write it. (@WardBrian, if I asked you to write require_any_not_vt_arena_matrix, would you be able to piece it together from the code on the branch? What if I asked you what the definition looks like in develop now? Which is easier?)
In the long run, it would help out the Math library if more developers could understand and contribute. So I think lowering the barrier to entry is helpful. (More code is bad, but templated code where the definitions don't exist directly seems harder to understand?)
This is partially offset to me if this has a significant impact on compile times - have you done any timing @syclik?
No, but we can look at Jenkins! I didn't really think about that too much. Figured it was negligible, but maybe I'm wrong.
How do you know which STAN_ADD_REQUIRE_*() macro expands into this if you're coming into the codebase to contribute? It doesn't show up when you use tools like git grep:
Reading your reply, I realized my differing opinion is likely the result of a different workflow. I use a pretty highly configured VSCode instance for C++, so I just type require_ and wait for a suggestion list of every possible require template to show up. I can then fuzzy-find on that suggestion list, so my actual keyboard inputs are probably more like require_ [wait] arena <arrow keys> <TAB>.
To me, the hardest part about this was actually knowing what the names of things mean: what is a "value type" versus a "scalar type", etc. I even wrote a program to help remind me: https://gist.github.com/WardBrian/85ca3a4bfb3a23f55ef40604987c04eb
I basically never have needed to dig into where these are defined, I was making my selection purely based off the name and the online doc's general guidance. But, when I did, VSCode's "go-to definition" feature actually does manage to point you to the specific macro instantiation.
I can see how a different workflow (say, emacs and git grep) would be much more painful with the current setup. And VSCode would still work just as well on this branch; there would just be fewer suggestions showing up.
I ultimately agree that this is reasonable territory to 'agree to disagree' if needed. I certainly wouldn't lose any sleep if this was merged as-is, I just wanted to have this back and forth to make sure we considered both sides
My workflow above of course assumes correct implementations of those options I'm picking off my menu! So if they're not actually correct, that is obviously a stronger argument than anything. Testing these is a huge pain in my experience, since it's hard to tell exactly what the compiler selected.
I'd be super interested in what the tests you have in mind look like, and also if you know of any examples where the current branch yields different(/better) behavior than master.
@WardBrian, if I asked you to write require_any_not_vt_arena_matrix, would you be able to piece it together from the code on the branch?
I think I could, and I agree it would probably be easier than reverse-engineering the macros. The issue is, if I (personally) do need to write one of these, the chance is very strong that I would actually need it for work either in stan-dev/stan or the compiler. So it would be an extra Math PR for work really happening elsewhere. Not the end of the world, of course
On compilation speed, it looks like the expression tests may be faster on this branch than on others? But most other things look similar, so it's hard to say.
@WardBrian, thanks for your comments! And recognizing that at least part of the difference in perceived cognitive load has to do with the tooling.
I am 100% with you on the scalar_type vs value_type. That was one of the first reasons I started digging into this. I wanted to know what one of the _vt traits was actually returning as a type, and I couldn't figure it out by looking directly at the code. Even with the code expanded it's a complicated thing, but for me it's a lot harder to piece together what's going on with it inside the macros. (A sketch of the distinction, as I understand it, is below.)
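This is worth double-checking against the real trait definitions before relying on it, but here's how I currently read the difference:
#include <stan/math/prim.hpp>
#include <type_traits>
#include <vector>

using mat = Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic>;
using vec_of_mat = std::vector<mat>;

// value_type peels off one container layer...
static_assert(std::is_same<stan::value_type_t<vec_of_mat>, mat>::value,
              "value_type of std::vector<Matrix> should be Matrix");
// ...while scalar_type recurses all the way down to the underlying scalar.
static_assert(std::is_same<stan::scalar_type_t<vec_of_mat>, double>::value,
              "scalar_type of std::vector<Matrix> should be double");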
If we look at this another way, with the PR we will have added one more dev who can understand these metaprograms and is willing to help make them better. Hopefully this means other devs can help too.
Worst case, it adds some code. (When it passes all tests, all functionality is identical pre and post.) If in a while we determine this is worse, we can bring all the macros back. It's not an irreversible change.
I'd be super interested in what the tests you have in mind look like, and also if you know of any examples where the current branch yields different(/better) behavior than master.
Yeah! I think we can just write unit tests for the metaprograms. We have some unit tests inside something like test/unit/math/prim/is_var.hpp, but only for is_var.
We'd test require_var_t using different types, with a std::enable_if or something like that, to have it resolve to a true type or a false type.
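Something like this sketch, for instance (not actual code from the repo; it assumes Google Test, and the helper name is made up):
#include <stan/math/rev.hpp>
#include <gtest/gtest.h>
#include <string>

// Two overloads that can only be told apart by which require_* matched.
template <typename T, stan::require_var_t<T>* = nullptr>
bool picked_var_overload(const T&) { return true; }

template <typename T, stan::require_not_var_t<T>* = nullptr>
bool picked_var_overload(const T&) { return false; }

TEST(requires, require_var_t_selects_correct_overload) {
  EXPECT_TRUE(picked_var_overload(stan::math::var(1.0)));
  EXPECT_FALSE(picked_var_overload(1.0));
  EXPECT_FALSE(picked_var_overload(std::string("not a var")));
}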
if you know of any examples where the current branch yields different(/better) behavior than master
I don't know any off the top of my head, but I do know that I'd implement things differently. One example is require_var_t:
template <typename T>
using require_var_t = require_t<is_var<std::decay_t<T>>>;
If you look at the template specialization of is_var for var, it looks like this:
template <typename T>
struct is_var<T,
std::enable_if_t<internal::is_var_impl<std::decay_t<T>>::value>>
: std::true_type {};
Notice there's a std::decay_t<T> inside there? So we've decayed the original type T twice. An update could be:
template <typename T>
using require_var_t = require_t<is_var<T>>;
Simpler and easier to understand. There's nothing wrong with the original implementation, but for implementational correctness, you wouldn't let this go, right?
template <typename T>
using require_var_t = require_t<is_var<std::decay_t<std::decay_t<T>>>>;
It does the same thing, but I would review it and say it's not implemented correctly. (Just as I probably wouldn't take double x = static_cast<double>(5.0);)
Now the tricky part: why can't we just update the macro? That would break things. There are some is_* template metaprograms that don't decay. Are those implemented correctly? I don't know; I want to find out. There aren't tests to back up the current behavior. We'd also have to figure out which ones are in use to test against.
Anyway, when I try to get to the bottom of whether we're implementing these correctly, I can't figure out how to do it through the macros. (That might just be me, but I don't think I'm unique in finding it hard to understand what the requires template metaprograms do without the definitions.)
Performing multiple decays seems pretty expected when you're composing things like this, and should also be harmless. Same result, and no extra runtime cost since this is all compile time anyway. Whether or not the is_* metaprograms that don't perform the decay themselves are correct is worth exploring, but seems orthogonal to the use of macros.
I think asking a macro to generate exactly the code you'd write yourself is probably too high a bar, as long as it is ultimately equivalent. I would probably also raise my eyebrows at a static cast to the same type in a PR, but I could also imagine it making total sense in a macro body, depending.
:)
I feel like we'll have different opinions here and that's ok.
At the end of the day, if you won't lose sleep over the change, I think it's worth a shot. If it turns out to be worse, I'll be the first to offer to revert.
I can agree to that.
We should also add to the page explaining these requires (or maybe a new page?) general guidelines for writing a new one: what to name it, general conventions, existing helpers you can piece together, etc.
Thanks. This PR is ready (pending tests; I just merged develop post release).
This is the page I'll be updating in future PRs: https://github.com/stan-dev/math/blob/feature/simplify-meta/doxygen/contributor_help_pages/require_meta.md
That file was modified in this PR to remove the wording about the macros. It was light on doc to begin with. As I keep working with stan/meta, I'll definitely document and catalog the traits. (I need to stare at the definitions to really know how to document it in a consistent way.)
Sorry for my delay.
Just to write down the historical context: the reason we had all these macros was that we were adding a lot of code for the new matrix type across the different Stan libraries. At first we were doing what we did here and added requires as we needed them. But when we started working up in the Stan library and needed a new require, that meant we would need to open a Math PR and wait for it to merge before we could merge things in the Stan repo.
That was not fun, so instead we wrote these macros that just generated all of the possible requires for each is_* in the meta folder. Our style was much more like @WardBrian's, where tab completion + docs were good for our workflow. But I can also understand how the macros are bad for coding styles that depend on viewing the actual code.
With that all being said, I think things are in a much more stable place where a PR like this makes sense. Though I have a few reservations, I wouldn't call them blocking.
My only main thing is: if we are going to limit the scope down, then I think we need to add docs to explain to folks the idioms better and how to add their own require. I think that should be fairly simple.
It was light on doc to begin with. As I keep working with stan/meta, I'll definitely document and catalog the traits. (I need to stare at the definitions to really know how to document it in a consistent way.)
What makes the documentation feel light to you? I tried being pretty verbose with how the system worked and how things were named, and I showed how to search for the requires you need in the Stan Math docs, including breaking up the requires into what I thought were meaningful categories for people to search through.
Now the tricky part: why can't we just update the macro? That would break things.
Did you do this and it broke things? My guess is that the things breaking are the is_* traits that don't decay, which is also probably the reason we had the decay there in the first place.
It does the same thing, but I would review it and say it's not implemented correctly. (Just as I probably wouldn't take double x = static_cast<double>(5.0);)
I think more accurate code for what's going on there would be something more complicated, like:
// f.hpp
template <typename T>
double f(T x) {
  return x;
}

// many/many/files/away/g.hpp
template <typename T>
double g(T x) {
  return static_cast<double>(f(x));
}
Composing several functions can often lead to duplicated checks like this. IMO, if I caught it and contextually understood that they don't need the cast, I'd ask them to remove it. But I don't think it's a big deal.
@SteveBronder, thanks for the comments!
Just to write down the historical context...
Thanks! A lot of that resonates. I looked back through the history of the codebase to see how it was before the macros were introduced. And I completely understand the convenience of having the macros in that development cycle. And I'm with you here: "With that all being said, I think things are in a much more stable place where a PR like this makes sense."
I'm sure that as it was being developed, we had thoughts of using all the generated requires; or at least, having them all was really convenient.
My only main thing is: if we are going to limit the scope down, then I think we need to add docs to explain to folks the idioms better and how to add their own require. I think that should be fairly simple.
Agreed. Right now, the doc is written more in line with this than the macros.
What makes the documentation feel light to you? I tried being pretty verbose with how the system worked and how things were named, and I showed how to search for the requires you need in the Stan Math docs, including breaking up the requires into what I thought were meaningful categories for people to search through.
Ah. Maybe I'm missing something. The main page I'm looking at is this: https://mc-stan.org/math/md_doxygen_2contributor__help__pages_2require__meta.html, specifically the "Adding a new requires" section.
In this doc, I don't see things like:
- When should I use the different macros? Two of the five macros are mentioned.
- What does each of the macros expand out to? My best guess is that you're expecting the dev to read the macro at this point. The section before explains what `STAN_ADD_REQUIRES` expands out to, but in that section it doesn't mention that macro at all.
- Doc for something like `scalar_type` makes sense when you already know what it means. (For me, I still need to get to the definitions to understand what scalar types and value types are -- that's on me. But the doc isn't helpful for me understanding that.) It could help to just have examples in the doc (for the future).
Please let me know if I've missed some doc about the macros! I didn't see it in the code itself and I couldn't find much more in the doxygen folder. Maybe I was searching for the wrong keywords.
Did you do this and it broke things? My guess is that the things breaking are the is_* traits that don't decay, which is also probably the reason we had the decay there in the first place.
No, I didn't. I'm still trying to limit the scope of this PR to a refactor as opposed to features. That said, what you said resonates with me.
It feels like the require_*_t should pass the type through without decaying and have the bool trait determine whether the condition is met based on how it deals with types. Right now, if we wanted a bool trait to make a decision on a non-decayed type, we couldn't do it using the macro (see the sketch below). So it seems a little off, but I'm not trying to fix anything right now; that sort of change should come later, where we review cases carefully.
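To illustrate with a hypothetical (this trait doesn't exist in Math; it's only here to show the corner the unconditional decay paints us into):
#include <type_traits>

// A hypothetical bool trait whose answer depends on reference-ness.
template <typename T>
struct is_lvalue_ref : std::is_lvalue_reference<T> {};

// What the macro would generate: std::decay_t strips the reference,
// so this require could never be satisfied.
template <typename T>
using require_lvalue_ref_bad_t
    = std::enable_if_t<is_lvalue_ref<std::decay_t<T>>::value>;

// What I'm describing above: pass T through untouched and let the
// bool trait decide how to handle references and cv-qualifiers.
template <typename T>
using require_lvalue_ref_t = std::enable_if_t<is_lvalue_ref<T>::value>;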
@SteveBronder, what's required in this PR to have it pass a review?
Separately, please do list any reservations! Happy to work through those and make this code base better.
I am still interested in establishing if there exists a program which has different behavior on this branch than on the current state of develop. It was implied earlier that these exist and that when they do, this branch has the "better" behavior, but it's all been abstract so far.
This could be as simple as a unit test added to this branch which, when cherry-picked into develop, fails
I am still interested in establishing if there exists a program which has different behavior on this branch than on the current state of develop.
Oh no! There's a misunderstanding.
There is no behavior change in this PR. All the template metaprograms added directly here are exactly what the C++ preprocessor would have expanded, with no modification!
Any potential modification would happen after this PR, with individual unit tests to make sure we understand the differences in behavior. That sort of change is not part of this PR.
Ah, I did misunderstand. While I am generally in favor of smaller PRs, because I am sort of dubious on the merits of this change for its own sake, I would be interested in seeing at least some of the improved doc or improved behavior be realized before merging. Happy to be overruled by @SteveBronder if he disagrees
In this doc, I don't see things like:
- When should I use the different macros? Two of the five macros are mentioned.
- What does each of the macros expand out to? My best guess is that you're expecting the dev to read the macro at this point. The section before explains what `STAN_ADD_REQUIRES` expands out to, but in that section it doesn't mention that macro at all.
- Doc for something like `scalar_type` makes sense when you already know what it means. (For me, I still need to get to the definitions to understand what scalar types and value types are -- that's on me. But the doc isn't helpful for me understanding that.) It could help to just have examples in the doc (for the future).
Hi @syclik, I agree that this lowers the barrier to entry; at least, it helps me understand what's going on. Are these doc updates part of this PR or a different one? I support adding this additional information ASAP, along with this PR.
@SteveBronder, mind if we get on a call and talk through this PR?
Thanks, @SteveBronder. I'll work on that and ping you when it's all addressed. Thanks for leaving the detailed comments; I think they're clear enough to propagate those changes. I'll let you know if I run into any issues with it!
| Name | Old Result | New Result | Ratio | Performance change (1 - new/old) |
|---|---|---|---|---|
| arma/arma.stan | 0.19 | 0.18 | 1.06 | 5.44% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 0.98 | -2.1% slower |
| gp_regr/gen_gp_data.stan | 0.02 | 0.02 | 1.07 | 6.17% faster |
| gp_regr/gp_regr.stan | 0.11 | 0.1 | 1.06 | 5.68% faster |
| sir/sir.stan | 78.06 | 75.75 | 1.03 | 2.96% faster |
| irt_2pl/irt_2pl.stan | 3.89 | 3.92 | 0.99 | -0.93% slower |
| eight_schools/eight_schools.stan | 0.05 | 0.06 | 0.97 | -3.62% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.25 | 0.25 | 1.02 | 2.01% faster |
| pkpd/one_comp_mm_elim_abs.stan | 17.78 | 18.16 | 0.98 | -2.14% slower |
| garch/garch.stan | 0.44 | 0.45 | 0.97 | -3.01% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.75 | 2.83 | 0.97 | -2.96% slower |
| arK/arK.stan | 1.6 | 1.63 | 0.98 | -2.02% slower |
| gp_pois_regr/gp_pois_regr.stan | 2.5 | 2.54 | 0.98 | -1.6% slower |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 9.01 | 9.16 | 0.98 | -1.73% slower |
| performance.compilation | 168.78 | 170.28 | 0.99 | -0.89% slower |
| Mean result: 1.001974611880138 |
Jenkins Console Log Blue Ocean Commit hash: d1c6b7c740d486cba81292100a0c469eb4b6b4cb
Machine information
Ubuntu 20.04.3 LTS (focal), x86_64; 2× Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (80 logical CPUs: 2 sockets × 20 cores × 2 threads; 2 NUMA nodes). [Full lscpu output, including cache sizes, vulnerability mitigations, and CPU flags, trimmed.]
G++: g++ (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0 Copyright (C) 2019 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Clang: clang version 10.0.0-4ubuntu1 Target: x86_64-pc-linux-gnu Thread model: posix InstalledDir: /usr/bin
@SteveBronder, I updated everything!
- Please check out the updated `doxygen/contributor_help_pages/require_meta.md`. I could use your eyes on it:
  - I kept the bulk of the original doc, but moved it towards the middle of the doc.
  - I added an intro that I hope explains what we're trying to accomplish, at least at a high level and correctly for most of the cases.
  - I enumerated all the macro expansions. I could really use your eyes there on the correct description of the `_vt` and `_st` traits with the logical qualifiers. I was using what was written in the code documentation to guide me, but I didn't dig enough to validate whether the doc was actually correct.
- I've gone through and added all the template parameter doc that was missing! Thanks for adding the doc in the PR comment. I used a version of that and it made it a whole lot easier.
- The comment about binary type traits -- we don't use any of them! Just FYI.
- Lastly, `require_vt`. I left a comment, but wanted to reiterate: I tried implementing both `require_vt` and `require_st`, and it got a bit messy with the template parameter packs. I'd rather add that as a separate PR with that change isolated. I think we should do it. If you're able to implement it, happy to include it with this PR. If you're tied up, I can try to tackle it in a different pass (where I can set up some tests and make sure I'm not going to mess that up somehow).
| Name | Old Result | New Result | Ratio | Performance change (1 - new/old) |
|---|---|---|---|---|
| eight_schools/eight_schools.stan | 0.06 | 0.06 | 1.01 | 1.33% faster |
| pkpd/one_comp_mm_elim_abs.stan | 19.87 | 19.91 | 1.0 | -0.22% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.37 | 0.27 | 1.37 | 27.11% faster |
| arK/arK.stan | 1.75 | 1.81 | 0.97 | -3.16% slower |
| gp_pois_regr/gp_pois_regr.stan | 2.82 | 2.84 | 0.99 | -0.97% slower |
| gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 1.01 | 0.82% faster |
| gp_regr/gp_regr.stan | 0.12 | 0.12 | 0.98 | -2.17% slower |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 9.82 | 10.15 | 0.97 | -3.34% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.98 | 3.04 | 0.98 | -2.25% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 1.02 | 1.52% faster |
| garch/garch.stan | 0.54 | 0.55 | 0.98 | -1.54% slower |
| arma/arma.stan | 0.23 | 0.23 | 1.01 | 0.59% faster |
| irt_2pl/irt_2pl.stan | 5.6 | 4.63 | 1.21 | 17.37% faster |
| sir/sir.stan | 83.92 | 84.02 | 1.0 | -0.12% slower |
| performance.compilation | 189.43 | 193.28 | 0.98 | -2.03% slower |
| Mean result: 1.0313868307022471 |
Jenkins Console Log Blue Ocean Commit hash: f70fb84db501edcbf822691c02664c43b76287b2
Machine information
Ubuntu 20.04.3 LTS (focal), x86_64; 2× Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (80 logical CPUs: 2 sockets × 20 cores × 2 threads; 2 NUMA nodes). [Full lscpu output, including cache sizes, vulnerability mitigations, and CPU flags, trimmed.]
G++: g++ (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0 Copyright (C) 2019 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Clang: clang version 10.0.0-4ubuntu1 Target: x86_64-pc-linux-gnu Thread model: posix InstalledDir: /usr/bin
Awesome! I will start looking at this on Friday. I'll take a crack at require_vt and require_st.
@SteveBronder, I think the doc in doxygen/contributor_help_pages/require_meta.md is much better now!! (I don't think it's perfect, but I think it documents the non-type template parameters much more clearly than they've been written about before.)
Mind taking a look? If you have any suggestions on how to make that doc better, happy to give it a try.