cudf
cudf copied to clipboard
Automate include grouping order in .clang-format
Replaces #14993
Description
This uses the IncludeCategories
settings in .clang-format to attempt to enforce our documented #include
order in libcudf. See https://docs.rapids.ai/api/libcudf/stable/developer_guide
I realize that there was a previous attempt at this by @bdice that met with some resistance. Reading it, I wouldn't say it was vetoed; rather, reviewers requested something much simpler. I have a few reasons to attempt this again.
- To make a separate task much easier. We are undertaking a refactoring of RMM that will replace
rmm::mr::device_memory_resource*
withrmm::device_async_resource-ref
everywhere in RAPIDS (not just cuDF). This requires adding an include to MANY files. Getting the location of the include right everywhere is very difficult without automatic grouping of headers. I started out writing a bash script to do this before realizing clang-format has the necessary feature. And I realized that my script would never properly handle files like this. - To increase velocity. Everywhere in RAPIDS that we have automated code standard/style/formatting/other, the benefits to velocity have outweighed the costs. To paraphrase @bdice, $auto \nearrow \rightarrow \mu \searrow \rightarrow v \nearrow$
- The previous PR #12760 had nearly 50 categories of headers. There was no way this could be applied universally across RAPIDS repos. My proposal has 10 categories. I tried to reduce it further but realized that it wouldn't be much less configuration to maintain, so I stopped at 10.
Note that one of the ways that having few categories can work while still maintaining clear groups is that this PR updates many files to use quotes ("") instead of angle brackets (<>) for local cuDF headers that do not live in cudf/cpp/include
. With our "near to far" include ordering policy, these are arguably the nearest files, and using quotes allows us to have our first category simply check for quotes. These files will be grouped and sorted without blank lines, but in practice this does not lose clarity because typically headers from more than two directories are not included from the same file. The downside of this change is I don't yet know how to automatically enforce it. I hope that when developers accidentally use <> for internal includes that don't start with (e.g.) "cudf", they will be grouped one of the lowest priority categories, and perhaps this will induce them to switch to "" to get the headers listed at the top. The rule is simple: if it's in libcudf but not in cpp/include/cudf
, then use quotes. For everything else, use angle brackets.
Other than headers from RAPIDS repos, I group all other headers that have a file extension in a single group, and all files that have no file extension in another group. Since the latter also matches includes some files from libcu++, I have an explicit category for <cuda/
includes to keep them separate from STL includes. A frequent effect of the single "." group is that cub and thrust headers get grouped without a blank line between them. I don't think this is a problem.
Below I'm listing the (fairly simple, in my opinion) .clang-format settings for this PR. Note that categories 2-5 will require tweaking for different RAPIDS repos.
Some may ask why I ordered cudf_test
headers before cudf
headers. I tried both orders, and putting cudf_test
first generated significantly fewer changes in the PR, meaning that it's already the more common ordering (I suppose cudf_test
is closer to the files that include it, since they are libcudf tests).
I've opened a similar PR for RMM with only 5 groups. https://github.com/rapidsai/rmm/pull/1463
CC @davidwendt @vyasr @wence- @GregoryKimball for feedback
@isVoid contributed to this PR via pair programming.
IncludeBlocks: Regroup
IncludeCategories:
- Regex: '^"' # quoted includes
Priority: 1
- Regex: '^<(benchmarks|tests)/' # benchmark includes
Priority: 2
- Regex: '^<cudf_test/' # cuDF includes
Priority: 3
- Regex: '^<cudf/' # cuDF includes
Priority: 4
- Regex: '^<(nvtext|cudf_kafka)' # other libcudf includes
Priority: 5
- Regex: '^<(cugraph|cuml|cuspatial|raft|kvikio)' # Other RAPIDS includes
Priority: 6
- Regex: '^<rmm/' # RMM includes
Priority: 7
- Regex: '^<(thrust|cub|cuda)/' # CCCL includes
Priority: 8
- Regex: '^<(cooperative_groups|cuco|cuda.h|cuda_runtime|device_types|math_constants|nvtx3)' # CUDA includes
Priority: 8
- Regex: '^<.*\..*' # other system includes (e.g. with a '.')
Priority: 9
- Regex: '^<[^.]+' # STL includes (no '.')
Priority: 10
Checklist
- [X] I am familiar with the Contributing Guidelines.
- [X] New or existing tests cover these changes.
- [X] The documentation is up to date with these changes.
As mentioned on the other PR, I think this is great, thanks for doing it! I just had one bikeshed colour policy question:
- Regex: '^<cudf/' # cuDF includes
Priority: 4
Is it desirable to have a separation of detail API headers from public ones? (so '^<cudf/detail'
with a separate priority from '^<cudf/'
) This would work because matching is done top down, so we would just put the detail regex before the public API regex in the list.
As mentioned on the other PR, I think this is great, thanks for doing it! I just had one bikeshed colour policy question:
- Regex: '^<cudf/' # cuDF includes Priority: 4
Is it desirable to have a separation of detail API headers from public ones? (so
'^<cudf/detail'
with a separate priority from'^<cudf/'
) This would work because matching is done top down, so we would just put the detail regex before the public API regex in the list.
I don't think we need this extra group.
Moving out of draft
Other than headers from RAPIDS repos, I group all other headers that have a file extension in a single group, and all files that have no file extension in another group. Since the latter also matches includes some files from libcu++, I have an explicit category for <cuda/ includes to keep them separate from STL includes. A frequent effect of the single "." group is that cub and thrust headers get grouped without a blank line between them. I don't think this is a problem.
PR description to be updated. It seems all CCCL headers now belong to one header group.
PR description to be updated. It seems all CCCL headers now belong to one header group.
Updated, thanks.
/merge