llvm icon indicating copy to clipboard operation
llvm copied to clipboard

Compilation with -DVL=... crashes in type_list.hpp

Open kbobrovs opened this issue 4 years ago • 26 comments
trafficstars

This code in CL/sycl/detail/type_list.hpp template <access::address_space AS, typename VL> causes compilation error if user specifies -DVL=<some number> option on the command line. VL is a popular acronym for "vector length", people may use it as a macro in their programs.

Names which can clash with user space should start with underscore(s). E.g. template <access::address_space __AS, typename __VL>.

kbobrovs avatar Dec 31 '20 20:12 kbobrovs

Realted: DIM and NDIMS (from sycl/include/CL/sycl/detail/common.hpp) are also moderately common as macro names.

al42and avatar Jan 13 '21 10:01 al42and

All of this library code is part of the implementation, so I think all of the identifiers (aside from parts of the public API other than parameter names) should be in the reserved namespace, correct?

AaronBallman avatar Jan 14 '21 17:01 AaronBallman

All of this library code is part of the implementation, so I think all of the identifiers (aside from parts of the public API other than parameter names) should be in the reserved namespace, correct?

I think this is already the case (sycl::detail namespace is used), bit we can't protect template formal argument names with namespace from being redefined with #define or -D compiler switch.

kbobrovs avatar Jan 14 '21 17:01 kbobrovs

All of this library code is part of the implementation, so I think all of the identifiers (aside from parts of the public API other than parameter names) should be in the reserved namespace, correct?

I think this is already the case (sycl::detail namespace is used), bit we can't protect template formal argument names with namespace from being redefined with #define or -D compiler switch.

Namespaces don't solve the issue with macros though. The user can still #define or -D other identifiers that cause problems. e.g., (based on https://github.com/intel/llvm/blob/sycl/sycl/include/CL/sycl/detail/type_list.hpp#L42) they could -Dremainder=const and this would still break the header.

AaronBallman avatar Jan 14 '21 18:01 AaronBallman

All of this library code is part of the implementation, so I think all of the identifiers (aside from parts of the public API other than parameter names) should be in the reserved namespace, correct?

I think this is already the case (sycl::detail namespace is used), bit we can't protect template formal argument names with namespace from being redefined with #define or -D compiler switch.

Namespaces don't solve the issue with macros though. The user can still #define or -D other identifiers that cause problems. e.g., (based on https://github.com/intel/llvm/blob/sycl/sycl/include/CL/sycl/detail/type_list.hpp#L42) they could -Dremainder=const and this would still break the header.

Yes, that's exactly what I meant. So the suggested solution is to use __ prefix for template argument names. I think that's what other compiler vendors do (e.g. Microsoft). We can't protect all the identifiers that way obviously, as this would be impractical, but all-capital ones which are acronyms with high probability of redefinition can be protected.

kbobrovs avatar Jan 14 '21 18:01 kbobrovs

My suggestion is to use __ prefix for all names that are not exposed to the user as part of the public API. e.g., go with regular identifiers for classes, public class members, enumeration constants, etc but go with ugly names for template parameters, function parameters, private members of a class, implementation details, etc. This is a common strategy for other library code: https://github.com/llvm/llvm-project/blob/main/libcxx/include/string#L722

AaronBallman avatar Jan 14 '21 18:01 AaronBallman

Adding @bader.

kbobrovs avatar Jan 14 '21 19:01 kbobrovs

I might be wrong, but heard that libcxx is required to use __ prefix by C++ specification, which applies to rule to standard library implementations. User libraries are not required to follow this rule, but obviously library developers should think about how their libraries might be used. You can notice that other LLVM libraries do not use __ prefix - https://github.com/intel/llvm/blob/sycl/llvm/include/llvm/ADT/APFloat.h. A C++ compiler won't be able to compile this header with -DT=.... Although DPC++ library headers are distributed with the compiler, I'm not sure if we want to apply this requirement to our code base. It really hurts code readability.

Alternative solution might be to use CamelCase avoid conflicts with macro names which are usually use caps. For the example from the issue description:

template <access::address_space AddrSpace, typename VecLen>
template <access::address_space AddressSpace, typename VectorLength>
// or any other variant with lower case characters

bader avatar Jan 14 '21 19:01 bader

STL libraries are not required to do this as a matter of specification, but they do it as a matter of Quality of Implementation because of just how often user-defined macros conflict with library code in practice.

User library code like APFloat.h doesn't go to these lengths because they basically can't. Identifiers that start with double underscores (or single underscore followed by a capital letter) are reserved to the implementation, so using such an identifier in APFloat.h (or any other user header) would be undefined behavior.

However, in this case, the SYCL headers are a part of the compiler implementation itself. These reservations are specifically for folks in our situation so that we don't have to risk conflicting with user code. Personally, I don't think a few extra underscores impacts readability to the point where we should be concerned about it. Going to PascalCase for the identifiers doesn't really solve the problem -- the only way to safely do it is to make use of the implementation's reserved identifier space, because any user who has a macro that starts with double underscores (or single underscore followed by a capital letter) has undefined behavior, so they get whatever mess they get from it.

AaronBallman avatar Jan 14 '21 20:01 AaronBallman

FYI, I created a PR which change rules of identifier naming https://github.com/intel/llvm/pull/5843 .

HabKaffee avatar Mar 23 '22 10:03 HabKaffee

For the sake of reviving this discussion, I agree with @AaronBallman that the only way to be fully out of the woods is to use reserved identifiers in our headers, otherwise we are just making assumptions about user code which could easily land us back in this very same discussion again when someone has -DVecLen=... in their compilation. I don't think we should rule out code-bases using different coding standards, but if they start using reserved names we can at least point them to the C++ Standard.

As for how we would enforce it, clang-tidy seems to have a bugprone-reserved-identifier we may be able to use in its inverted state to root out current and future identifiers not following these restrictions. One problem with it is that our public interface would need to be in the exclusion list (as well as identifiers used in symbols of our ABI until ABI break is possible) which is likely to make the exclusion list quite large and slightly annoying to maintain. I wonder if we could somehow extend it to ignore public identifiers within a certain namespace or something along those lines. If this is what we decide, I am open to ideas on this.

@kbobrovs @bader @AaronBallman - Thoughts?

steffenlarsen avatar May 25 '22 16:05 steffenlarsen

Sounds good to me.

https://clang.llvm.org/extra/clang-tidy/#suppressing-undesired-diagnostics

If a specific suppression mechanism is not available for a certain warning, or its use is not desired for some reason, clang-tidy has a generic mechanism to suppress diagnostics using NOLINT, NOLINTNEXTLINE, and NOLINTBEGIN … NOLINTEND comments.

bader avatar May 25 '22 16:05 bader

I think this makes sense -- using clang-tidy and integrating that into the CI pipeline will help keep the headers hygienic for macros. Given that it will be a custom clang-tidy check, I think the idea of making it ignore public integers within a certain namespace is a good idea to keep the exclusions list down.

AaronBallman avatar May 25 '22 16:05 AaronBallman

During internal discussion the following concerns were brought up:

  1. Do internal macros also need to follow the reserved naming requirements? If yes, how can we be sure it does not interfere with other libraries?
  2. How do we guarantee that standard library #define using reserved names do not collide with the reserved identifiers we use?
  3. Why do libraries like Boost not use reserved identifiers?

steffenlarsen avatar May 25 '22 18:05 steffenlarsen

Do internal macros also need to follow the reserved naming requirements? If yes, how can we be sure it does not interfere with other libraries?

Yes; praying might do something. More seriously, you basically have to react to one another's macros.

How do we guarantee that standard library #define using reserved names do not collide with the reserved identifiers we use?

You basically have to react to one another's macros.

Why do libraries like Boost not use reserved identifiers?

“¯_(ツ)_/¯“

The reason it's important for you to do it with the SYCL headers is because those are being shipped as part of the "implementation" (which isn't just limited to the compiler; the C standard library is part of the implementation, the STL is part of the implementation, the linker is part of the implementation, and so on.). So you have to worry about basically everyone's code colliding with your headers, which is why you get to use identifiers "reserved to the implementation". That gives you at least a tiny bit of help with not colliding with user code (spoiler: you'll still collide with user code, but now you can tell them it's their bug not yours), but you still have to coordinate with basically everything else that's part of the implementation.

Welcome to being a system library! :-)

AaronBallman avatar May 25 '22 21:05 AaronBallman

I don't think C++ has enough tools to conveniently and reliably solve our problem here, so we need some compromise, I believe. For example

  • contrary to the LLVM coding guide, forbid the CamelStyle identifiers within functions in the API headers, and use lower-case always. Most macro names are all-caps like VL, so they won't conflict with parameter names or local variable names.
  • require template parameters to start with some prefix (__?) or have them all lower-case too

It is a good idea to require that all our PP macros have specific prefix (like __SYCL)

kbobrovs avatar May 26 '22 04:05 kbobrovs

Why do libraries like Boost not use reserved identifiers?

“¯(ツ)/¯“

The reason it's important for you to do it with the SYCL headers is because those are being shipped as part of the "implementation" (which isn't just limited to the compiler; the C standard library is part of the implementation, the STL is part of the implementation, the linker is part of the implementation, and so on.). So you have to worry about basically everyone's code colliding with your headers, which is why you get to use identifiers "reserved to the implementation". That gives you at least a tiny bit of help with not colliding with user code (spoiler: you'll still collide with user code, but now you can tell them it's their bug not yours), but you still have to coordinate with basically everything else that's part of the implementation.

Welcome to being a system library! :-)

How do you define the "implementation" and why Boost is treated as being not part of the "implementation"? What makes SYCL library a system library, but doesn't apply to the Boost library?

NOTE: we use Boost library code in SYCL headers - https://github.com/intel/llvm/pull/5791.

bader avatar May 26 '22 11:05 bader

How do you define the "implementation"

Colloquially, I define it as "everything that ships with the compiler or is necessary to make hello world work." e.g., everything in clang\lib\Headers is part of the implementation, the C standard library is part of the implementation even though it doesn't ship as part of Clang (needed for a hosted implementation, not needed for a freestanding one), the STL used is part of the implementation even though Clang multiple STLs, and so on.

Boost is not part of the implementation because it doesn't ship as part of the compiler. That we're bringing it into our SYCL headers means we're now taking responsibility for ensuring that works in all the situations we have to care about as a system library.

Our SYCL headers are part of the implementation because they're shipped with our compiler.

AaronBallman avatar May 26 '22 11:05 AaronBallman

  • contrary to the LLVM coding guide, forbid the CamelStyle identifiers within functions in the API headers, and use lower-case always. Most macro names are all-caps like VL, so they won't conflict with parameter names or local variable names.

This is not a valid assumption. There are plenty of macros which are not all caps (such as errno, setjmp, etc), so while this might save you some hassle, it's not a solution.

  • require template parameters to start with some prefix (__?) or have them all lower-case too

Using __ as a prefix is precisely my suggestion in https://github.com/intel/llvm/issues/2981#issuecomment-760354906 and would solve the issue in a defensible way. Do this for all your private identifiers in the header.

AaronBallman avatar May 26 '22 11:05 AaronBallman

This is not a valid assumption. There are plenty of macros which are not all caps (such as errno, setjmp, etc), so while this might save you some hassle, it's not a solution.

As I mentioned, I don't think we can have a solution. We can only have a mitigation. Using lower-case will reduce the chance of conflict significantly.

Do this for all your private identifiers in the header.

This seems an overkill for me. But, on the other hand, that's how e.g. Microsoft std C++ headers are implemented. Maybe we need to pay this price for safety.

kbobrovs avatar Jun 02 '22 03:06 kbobrovs

Using __ is a solution in the sense that it guarantees that all code works independent of what weird macros are defined. But the ugly identifiers clearly impact readability of the STL libraries and would do the same for the SYCL headers. And readability also has a user value (useful for e.g. debugging or understanding the inner working). I'm not convinced that supporting people doing weird stuff (all code guidelines say macros should be all caps) is good enough of a reason to impact readability. We could say (in spec or our implementation notes) that defining any macros which aren't all caps isn't supported. Then we just need to make sure we don't use any all-caps private identifiers (VL, DIM, ...) or are identifiers used by other system libraries (like errno). They wouldn't need to be camel or all-lower-case, just not all upper case.

That the SYCL headers are shipped with the compilers only means that we have the option to use __. Boost isn't allowed to use the reserved identified. But that doesn't mean we have to use the option. I think we should decidet based on whether we think readability or supporting weird use cases is more important to our users.

rolandschulz avatar Jun 02 '22 04:06 rolandschulz

The code review during up-streaming will do the final bike-shedding and it will be the same as the system headers and standard library. @rolandschulz program your editor to rewrite the identifiers as you want. :-)

keryell avatar Jun 02 '22 06:06 keryell

As I mentioned, I don't think we can have a solution. We can only have a mitigation. Using lower-case will reduce the chance of conflict significantly.

Correct, there are no standard solutions or common extensions that will save you from this problem. The only thing you can do is mitigate. Based on experience, using lower-case identifiers will slightly reduce the chance of conflict. But when the conflict happens, it's a bug in your header that you need to fix. You don't get to steal already-defined identifiers from the user just because they included your header; those identifiers are in the user's identifier space. As part of the implementation, you have your own identifier space (using reserved identifiers). Using that as your mitigation is the best you can do. Then when conflicts happen, you can legitimately close the bug as WONTFIX because the user caused that problem (most of the time, there's still conflicts between system headers of course).

This seems an overkill for me. But, on the other hand, that's how e.g. Microsoft std C++ headers are implemented. Maybe we need to pay this price for safety.

I believe you do. There's a reason basically every implementation header uses this approach.

Using __ is a solution in the sense that it guarantees that all code works independent of what weird macros are defined. ... I think we should decidet based on whether we think readability or supporting weird use cases is more important to our users.

I think this trivializes the bugs in your headers by focusing on something that should basically never be a concern when thinking about users. If you are expecting your users to read your headers to understand how they work, you have a documentation bug; you don't open up the STL headers to understand what std::accumulate does, you go to an online reference. And that's to the benefit of your users -- if they're reading your source code, that means they're more likely to ossify your implementation details by relying on them (and potentially making their code less portable as a side effect).

What's more, these are not "weird use cases", these are any identifier a user can use. If you used the identifier yourself, there's a very real chance it's a reasonable identifier for someone else to want to use. Further, it's faulty logic to assume that uppercase macros are a universal convention; there's plenty of evidence of that being false and so relying on it is building on an unstable foundation. If the code CAN be written by a user, the code WILL be written by a user (this is Hyrum's law) and at scale you will run into the issue at some point.

This isn't to say that there's no value to readability -- we obviously need to keep things readable for our own developers, and there are times users need to debug implementation headers. However, I'd argue that using reserved identifiers doesn't actually harm readability in a meaningful way (the identifiers are all still as clearly identifiable as they were without the underscores, so it's a new convention to get used to but not a concern with being able to uniquely identify something) and if it really is that big of a concern, you should be able to write a post-processing tool that turns unreserved identifiers into reserved identifiers to automatically rewrite a significant percentage of the identifiers before doing a release (I'm presuming there will be one-off situations that tooling can't figure out on its own and will need to be handled specially).

We could say (in spec or our implementation notes) that defining any macros which aren't all caps isn't supported.

How does that help for SYCL 1.2.1 and SYCL 2020, both of which are already released standards we're trying to support? (How does writing anything in our implementation notes help users to avoid pitfalls with our headers?)

AaronBallman avatar Jun 02 '22 11:06 AaronBallman

You don't get to steal already-defined identifiers from the user just because they included your header;

I think that's the key question whether that's true or not. I think we agreed that for C and C++ language headers the consensus is that this is true and for almost everything else it isn't. And I don't think it is obvious which precedence we should follow. This project can use reserved identifiers. But other SYCL implementation (e.g. hipSYCL or triSYCL) cannot because they are implemented as library and therefore can't use them. Therefore I don't think we can avoid to clarify the SYCL spec to allow users to write portable programs. Or can you (@keryell @illuhad) solve the problem for your implementations without a clarification which identifiers can't be set as macro prior to including the sycl header?

How does that help for SYCL 1.2.1 and SYCL 2020, both of which are already released standards we're trying to support?

1.2.1 doesn't matter because we aren't implanting it. For 2020 we are still publishing revisions with clarifications.

rolandschulz avatar Jun 06 '22 16:06 rolandschulz

In triSYCL I have not done anything for defensive programming. If I had to do it, I would just add __ in front of anything internal, starting with the template arguments since they are often uppercase. At the end it is not visually worse than some other coding standards adding m_ in front of member names, etc. Yes it is forbidden by the standard but if it works with any existing compiler, well...

keryell avatar Jun 11 '22 00:06 keryell

https://github.com/intel/llvm/pull/6493 - Small fix for a handful of known cases of commonly defined macro names. This is not intended as a resolution to this issue, but as a temporary work-around while we reach a conclusion.

steffenlarsen avatar Jul 29 '22 14:07 steffenlarsen

Patch https://github.com/intel/llvm/pull/6815 by @AlexeySachkov currently uses reserved identifiers in a detail header implementation. This may change to make it more consistent with the rest of the subproject, but mentioning it here seems appropriate, if nothing else but to illustrate the potential approach.

steffenlarsen avatar Oct 06 '22 11:10 steffenlarsen