cudf icon indicating copy to clipboard operation
cudf copied to clipboard

Automate include grouping order in .clang-format

Open harrism opened this issue 1 year ago • 3 comments

Replaces #14993

Description

This uses the IncludeCategories settings in .clang-format to attempt to enforce our documented #include order in libcudf. See https://docs.rapids.ai/api/libcudf/stable/developer_guide

I realize that there was a previous attempt at this by @bdice that met with some resistance. Reading it, I wouldn't say it was vetoed; rather, reviewers requested something much simpler. I have a few reasons to attempt this again.

  1. To make a separate task much easier. We are undertaking a refactoring of RMM that will replace rmm::mr::device_memory_resource* with rmm::device_async_resource-ref everywhere in RAPIDS (not just cuDF). This requires adding an include to MANY files. Getting the location of the include right everywhere is very difficult without automatic grouping of headers. I started out writing a bash script to do this before realizing clang-format has the necessary feature. And I realized that my script would never properly handle files like this.
  2. To increase velocity. Everywhere in RAPIDS that we have automated code standard/style/formatting/other, the benefits to velocity have outweighed the costs. To paraphrase @bdice, $auto \nearrow \rightarrow \mu \searrow \rightarrow v \nearrow$
  3. The previous PR #12760 had nearly 50 categories of headers. There was no way this could be applied universally across RAPIDS repos. My proposal has 10 categories. I tried to reduce it further but realized that it wouldn't be much less configuration to maintain, so I stopped at 10.

Note that one of the ways that having few categories can work while still maintaining clear groups is that this PR updates many files to use quotes ("") instead of angle brackets (<>) for local cuDF headers that do not live in cudf/cpp/include. With our "near to far" include ordering policy, these are arguably the nearest files, and using quotes allows us to have our first category simply check for quotes. These files will be grouped and sorted without blank lines, but in practice this does not lose clarity because typically headers from more than two directories are not included from the same file. The downside of this change is I don't yet know how to automatically enforce it. I hope that when developers accidentally use <> for internal includes that don't start with (e.g.) "cudf", they will be grouped one of the lowest priority categories, and perhaps this will induce them to switch to "" to get the headers listed at the top. The rule is simple: if it's in libcudf but not in cpp/include/cudf, then use quotes. For everything else, use angle brackets.

Other than headers from RAPIDS repos, I group all other headers that have a file extension in a single group, and all files that have no file extension in another group. Since the latter also matches includes some files from libcu++, I have an explicit category for <cuda/ includes to keep them separate from STL includes. A frequent effect of the single "." group is that cub and thrust headers get grouped without a blank line between them. I don't think this is a problem.

Below I'm listing the (fairly simple, in my opinion) .clang-format settings for this PR. Note that categories 2-5 will require tweaking for different RAPIDS repos.

Some may ask why I ordered cudf_test headers before cudf headers. I tried both orders, and putting cudf_test first generated significantly fewer changes in the PR, meaning that it's already the more common ordering (I suppose cudf_test is closer to the files that include it, since they are libcudf tests).

I've opened a similar PR for RMM with only 5 groups. https://github.com/rapidsai/rmm/pull/1463

CC @davidwendt @vyasr @wence- @GregoryKimball for feedback

@isVoid contributed to this PR via pair programming.

IncludeBlocks: Regroup
IncludeCategories:
  - Regex:           '^"' # quoted includes
    Priority:        1
  - Regex:           '^<(benchmarks|tests)/' # benchmark includes
    Priority:        2
  - Regex:           '^<cudf_test/' # cuDF includes
    Priority:        3
  - Regex:           '^<cudf/' # cuDF includes
    Priority:        4
  - Regex:           '^<(nvtext|cudf_kafka)' # other libcudf includes
    Priority:        5
  - Regex:           '^<(cugraph|cuml|cuspatial|raft|kvikio)' # Other RAPIDS includes
    Priority:        6
  - Regex:           '^<rmm/' # RMM includes
    Priority:        7
  - Regex:           '^<(thrust|cub|cuda)/' # CCCL includes
    Priority:        8
  - Regex:           '^<(cooperative_groups|cuco|cuda.h|cuda_runtime|device_types|math_constants|nvtx3)' # CUDA includes
    Priority:        8
  - Regex:           '^<.*\..*' # other system includes (e.g. with a '.')
    Priority:        9
  - Regex:           '^<[^.]+' # STL includes (no '.')
    Priority:        10

Checklist

  • [X] I am familiar with the Contributing Guidelines.
  • [X] New or existing tests cover these changes.
  • [X] The documentation is up to date with these changes.

harrism avatar Feb 15 '24 01:02 harrism

As mentioned on the other PR, I think this is great, thanks for doing it! I just had one bikeshed colour policy question:

  - Regex:           '^<cudf/' # cuDF includes
    Priority:        4

Is it desirable to have a separation of detail API headers from public ones? (so '^<cudf/detail' with a separate priority from '^<cudf/') This would work because matching is done top down, so we would just put the detail regex before the public API regex in the list.

wence- avatar Feb 15 '24 09:02 wence-

As mentioned on the other PR, I think this is great, thanks for doing it! I just had one bikeshed colour policy question:

  - Regex:           '^<cudf/' # cuDF includes
    Priority:        4

Is it desirable to have a separation of detail API headers from public ones? (so '^<cudf/detail' with a separate priority from '^<cudf/') This would work because matching is done top down, so we would just put the detail regex before the public API regex in the list.

I don't think we need this extra group.

davidwendt avatar Feb 15 '24 13:02 davidwendt

Moving out of draft

harrism avatar Feb 16 '24 06:02 harrism

Other than headers from RAPIDS repos, I group all other headers that have a file extension in a single group, and all files that have no file extension in another group. Since the latter also matches includes some files from libcu++, I have an explicit category for <cuda/ includes to keep them separate from STL includes. A frequent effect of the single "." group is that cub and thrust headers get grouped without a blank line between them. I don't think this is a problem.

PR description to be updated. It seems all CCCL headers now belong to one header group.

PointKernel avatar Feb 20 '24 19:02 PointKernel

PR description to be updated. It seems all CCCL headers now belong to one header group.

Updated, thanks.

harrism avatar Feb 20 '24 20:02 harrism

/merge

harrism avatar Feb 21 '24 19:02 harrism