CppCoreGuidelines icon indicating copy to clipboard operation
CppCoreGuidelines copied to clipboard

SF.12: Rewrite to eliminate the notions of "project" and "component"; also copyedit

Open Quuxplusone opened this issue 5 years ago • 15 comments

The major change here is simply to delete parentheticals that muddy the waters by talking about files belonging to "projects" or "components". My experience has been that in industry we have "codebases," which are sometimes split up into "repositories" and/or "libraries" (which may be installed in e.g. /usr/local/include or may be used straight from their source tree), and everyone works on everything or at least knows someone who works on the other thing. Figuring out what's "in the same project" and what's "in a different project" is often fuzzy.

I would try to set up the build system so that its target_include_directories (or equivalent) could find every header via <>, and then use <> everywhere: even for files in the "same" component. (This does require that you follow SF.12's advice for "Library creators", though, which I admit I don't for just about all of my hobby projects.)

Strangely, the "Enforcement" section seems to have been written by a <> partisan like me, even though the rule itself is ""-endian. I copyedited the "Enforcement" section for grammar; did I unconsciously misinterpret what it was trying to say? What was it trying to say? Should it just be deleted?


Btw, I think there's an English style question here: what is a "header search path"? Is it the ordered set of directories that the compiler traverses when searching for headers (similar to $PATH on a POSIX system)? or is it one individual pathname, such as "/usr/include" or "/usr/local/include"? If the latter, then we need a different name for the ordered set of directories manipulated via g++ -I. In my copyedits, I assumed the former: there is one "header search path", searched by <>; and "" searches the same set plus ..

Quuxplusone avatar Aug 17 '20 15:08 Quuxplusone

I agree with this; there should not be mention of 'project' or other names for arbitrary structure we impose on our code, but note that @gdr-at-ms in #1596 wanted to see "structure" referenced in a way that would prevent the case where some_library/foo.h, installs as usr/include/foo.h and then the guidance is retro-actively interpreted as preferring foo.h to ""-include headers in usr/include.

Personally I don't think such an interpretation is reasonable and it seems obvious the guidance applies to the code where/as it is being authored and not where it is installed to later (the author applying the guidance cannot reason about the future location and relativity of where their code is installed).

what is a "header search path"

@Quuxplusone I thought you added this term in this PR? The original text is "the alternate search path" which is interpreted from 19.2.2:

"...searches a sequence of implementation-defined places for a header identified uniquely by the specified sequence between the < and > delimiters, and causes the replacement of that directive by the entire contents of the header. How the places are specified or the header identified is implementation-defined."

A more precise term then might be "implementation-defined places", reading instead as: "... or a header from an implementation-defined place"

the "Enforcement" section... What was it trying to say?

It enforces that headers under "" are relative, but only when those same headers are not also added to the <> search. So it is missing a case (e.g. the case you mentioned where a codebase may add all headers to the <> search) and it should probably instead read something like "a tool to check that files included via "" exist at a relative location to the including file, and those included via <> do not exist relatively to the including file".

emphasis on relative because the filesystem definition only may not be appropriate and it may need to take into consideration things like policy (e.g. a path like "../../foo/bar.h" is relative, but is ".." permitted?) and the implementation-defined places (e.g. <foo/bar.h> could be relative if it was installed under user code)

apenn-msft avatar Aug 19 '20 02:08 apenn-msft

I thought you added this term in this PR?

Well, yes :) but only as a replacement for the phrase "the alternate search path" which has the same question ("what does path mean in this context?") plus the additional question "alternative to, or alternating with, what?"

A more precise term then might be "implementation-defined places", reading instead as: "... or a header from an implementation-defined place"

That would at least be greppable back to the Standard's "a sequence of implementation-defined places," but I think anything as user-facing as the Core Guidelines should avoid phrases like "implementation-defined." What we mean is the thing manipulated with -I; we should look up what that's actually called and call it by name. GCC calls it the "search path" (or "search directory list"); Clang calls it the "include search path" (but also refers to a file being "found within a header search path" which sounds like the alternative definition of path I mentioned).

Quuxplusone avatar Aug 19 '20 13:08 Quuxplusone

I'm ok with standardese where the alternative means using terms partial to an implementation; we do already mention "implementation defined" in the guidelines, e.g.:

and be aware of constructs with implementation defined meaning (e.g., sizeof(int)).

We also mention the term "compiler" elsewhere; it's a more approachable term, but also less accurate. I think we can fairly expect any C++ reader to be familiar with what "implementation" means since it's critical to understanding common concepts like "implementation defined" and "undefined" behavior which are used quite often without explanation or euphemism)

To add a few more terms to the pile:

  • Apple calls it "header search path" and "user header search path"
  • MSVC refers to "Additional include directories" You've noted what GCC and clang call it.

If we do pick a term rooted in something quotable, I'd prefer it quote the standard than any specific compiler.

apenn-msft avatar Aug 19 '20 20:08 apenn-msft

@apenn-msft

[...] the author applying the guidance cannot reason about the future location and relativity of where their code is installed

In fact, reasoning about where the library/header is installed is not uncommon in the world outside of Windows platforms, and to some degree made very easy with build tools :-) We have to exercise some caution about the hypotheses we make in the formulation of the guidelines.

@Quuxplusone : I think include search path is a reasonable term to use in the Guidelines.

gdr-at-ms avatar Aug 19 '20 23:08 gdr-at-ms

@gdr-at-ms "include search path" is ambiguous; 19.2.2 and 19.2.3 both refer to a "search" being conducted for both <> and "" forms. I think the term could work if we qualify it as "angle-bracket include search path", which is wordier than "implementation-defined places" but probably more understandable (at the cost of being less 'grep'-able).

In terms of reasoning about install locations, I guess I can only provide my interpretation, but if it were not uncommon to interpret it as applying to the install location, is there a way to have the guidance distinguish the code as it exists in the install location versus the authored location, without introducing a term that refers to the code's structure (i.e. 'project' or 'library')?

apenn-msft avatar Aug 20 '20 00:08 apenn-msft

"include search path" is ambiguous; 19.2.2 and 19.2.3 both refer to a "search" being conducted for both <> and "" forms. I think the term could work if we qualify it as "angle-bracket include search path"

My mental model is that the "quoted include search path" is simply the "angle-bracket include search path" with . prepended. So, if <foo.h> is searched for in ../mylib/include, /usr/local/include and /usr/include, then "foo.h" is searched for in ., ../mylib/include, /usr/local/include and /usr/include. So there's still only one path; it's just used in slightly different ways for the two lookups. (In fact, if Clang fails to find <foo.h> in the angle-bracket include search path, Clang will look it up in . as well, and emit a warning if it is found in . recommending that you switch from <> to "".)

If my mental model is wrong, I'd be interested to know how to adjust it.

Quuxplusone avatar Aug 20 '20 12:08 Quuxplusone

It's partially correct, except:

there's still only one path;

as you noted, there are two paths:

for the two lookups

this is why the term "include search path" is ambiguous unless we know which form is being used. In the case of your earlier comment, you're looking for a term to represent:

the thing manipulated with -I

which would accrue into the search path used by the angle-bracket form (although I realize now, this (-I) is what I was referring to by the term "alternate", see below).

When used in the original sentence, things becomes clearer why it needs to be qualified:

It makes it easy to understand at a glance whether a header is being included from a local relative file versus a standard library header or a header from the _angle-bracket include search path _

if it were just "include search path", the distinction would be redundant since the header always comes from the include search path (depending on which form is used). I think I also realize now why I used the term "alternate" and that was a nod to:

an implementation may provide a mechanism for making arbitrary source files available to the < > search,

"alternate" was my attempt to name the "mechanism", which you also named "the thing manipulated with -I". I'm not sure what name we could give it that wouldn't raise questions, except for using verbose standardese like "the mechanism by which arbitrary source files are made available to the <> search", or just not naming it all. It means we lose the third distinction, but I think that is fine versus introducing a new term or cumbersome language. e.g. just the two cases is fine too: "relative file versus angle-bracket include search path" (also you can copy-edit remove any other instances of "local", it's already covered by "relative")

apenn-msft avatar Aug 20 '20 21:08 apenn-msft

@apenn-msft

@gdr-at-ms "include search path" is ambiguous; 19.2.2 and 19.2.3 both refer to a "search" being conducted for both <> and "" forms. I think the term could work if we qualify it as "angle-bracket include search path", which is wordier than "implementation-defined places" but probably more understandable (at the cost of being less 'grep'-able)

I am sorry, I don't see the ambiguity here. The term designates a set of paths consulted by two similar algorithms.

The Core Guidelines, deliberately, are not trying to emulate the a Bourbaki-level of precision so that they can communicate the right level of information to the majority of C++ programmers, giving enough leads to the experts to further specialized interests. So, by necessity it won't be able to satisfy every possible imaginable nuance.

gdr-at-ms avatar Aug 21 '20 19:08 gdr-at-ms

@apenn-msft: "...as you noted, there are two paths: for the two lookups this is why the term "include search path" is ambiguous unless we know which form is being used. In the case of your earlier comment, you're looking for a term to represent: the thing manipulated with -I which would accrue into the search path used by the angle-bracket form..."

Well, in Clang and GCC at least, AFAIK, there is only one search path; it's just used in slightly different ways for the two lookups. The thing manipulated with -I/-isystem is that single search path. Lookup for <foo> will look only in the directories in the header search path. Lookup for "foo" will look first in . and then in the directories in the header search path.

Does MSVC have separately manipulable search paths for <foo> versus "foo"?

(Clang and GCC also have a notion of "system headers" versus "[non-system] headers," but they use that notion only for suppressing diagnostics in system headers based on where the file is physically found; I've verified that it is 100% orthogonal to whether an inclusion was syntactically triggered via <stdio.h> or "stdio.h".)

Quuxplusone avatar Aug 21 '20 20:08 Quuxplusone

Not MSVC, but Apple does: https://stackoverflow.com/questions/3429031/header-search-paths-vs-user-header-search-paths-in-xcode

that's a concrete example although I was more referring to the abstract standard language, which mentions both "" and <> as performing a "search" (and I guess Apple did take that part of the standard to heart)

The part I mean about "include search path" being ambiguous is that if someone asks "is foo.h in the include search path?" the question would have to clarified as "it depends, are you including it using "foo.h" or <foo.h>?"

apenn-msft avatar Aug 22 '20 00:08 apenn-msft

Ah, I see that Xcode (and Clang via -iquote) support putting more things in the "user header search path" than just . (and in fact Clang has a dead codepath for taking . out of the "user header search path," too). Basically Clang has one path plus an AngledDirIdx into it, where AngledDirIdx says "if you're including via <>, start at this point in the path, instead of at index 0." (And if you're including via #include_next, start at an index that varies dynamically.)

// clang++ -iquote foo -I bar test.cc
#include "x.h"  // finds foo/x.h before bar/x.h
#include <x.h>  // finds only bar/x.h

Okay, that's close enough to "two search paths" that I'll stop arguing for the moment, at least. ;) I do think it will be good to stay away from the unadorned term "alternat[iv]e search path," though, since it's not at all obvious which of the two paths (the longer one used by "" or its shorter suffix used by <>) should be considered "alternative."

Quuxplusone avatar Aug 22 '20 15:08 Quuxplusone

Editors call: Thanks! This looks good, just waiting to see if there's a last reply from @apenn-msft and then this is ready to merge.

hsutter avatar Sep 03 '20 18:09 hsutter

To summarize the current state of the various comments: apenn-msft: I'm approved with comments; if we can slip in some of the copy-edits @Quuxplusone brought up, that would be great, here they are again:

  • "alternate search path" - I think we all agree it's not descriptive enough and we should refer to either the "angle-bracket search path" or the "quoted search path" (or <> and "" respectively)
  • re-reading the Enforcement section, I'm walking back my earlier comment and I think it is actually ok as-is: "A tool could identify headers referenced via "" that could have been referenced with <>." (we just need to enforce that if "" is used, it is relative; we don't need the converse since one could use <> even where a header is relative if they actually intended the header from the <>-search)

These aren't new to this PR so they don't have to be addressed here. If we can fit them in great, otherwise minor enough we can also do it on another PR.

@johnmcfarlane: wants to see an example that illustrates including a file from the same project but using the <>-form (to dispel any myth that "" must always be used from within the same project). I'm not sure this can be done without giving a formal treatment of the term "project" or whatever we decide to call the cohesive organization of a collection of code.

@gdr-at-ms: wants this notion (of a project / organization of code) to address the case where someone might interpret it to mean ""-include could be used if the including file will be relative at the location of installation (even though it is not relative at the location where the file is authored). For example, someone does #include "system.h" because they know their header will be installed in usr/include alongside file system.h. My take on this is that I don't think the guideline is likely to be interpreted in this way and it naturally has to apply to the code at the location it is being authored (and we don't need to give this location a formal treatment) because the guidelines are applied while authoring code. I think it would be nice to address any possible misinterpretation, but if it is an unlikely one, and addressing it comes at the price of the guidelines attempting to define what is going to be such a contested and liberally-defined concept like a "project" or "structure of code", I think it's not worth it (or at least it would be a large undertaking for another PR).

apenn-msft avatar Sep 03 '20 18:09 apenn-msft

These aren't new to this PR so they don't have to be addressed here. If we can fit them in great

IIUC, this sentence referred only to the two bullets in your list above? which were:

  • "alternate search path" - I think we all agree it's not descriptive enough — I wouldn't object to a PR that changes my text "a header from the header search path" to "a header from the search path used for angle-bracket headers". However, as I was making that commit, I realized that it's kind of tautological in context: "use the <> syntax as an indication to the reader that you want to use the <> search path"? I mean, duh. 🙂 Thus I'm not confident enough to just roll that addition into this PR: it might be seen as unnecessary tautology, thus slightly increasing controversy and delaying the merging of this otherwise-converging-to-consensus PR. So, "someone make a separate PR to add the words 'angle-bracket'" sounds good to me.

  • re-reading the Enforcement section — IIUC, this isn't a proposed change at all; you're saying that this PR's text is good as-is! So, good. 🙂

Quuxplusone avatar Sep 04 '20 16:09 Quuxplusone

kind of tautological in context:

here's how it is used:

from a locally relative file, versus a standard library header or a header from the header search path (e.g. a header from another library or from a common set of includes).

the example tries to draw a distinction between 3 types of include locations:

  1. locally relative (edit: just "relative")
  2. standard library
  3. another library, common includes

We know 1 is the ""-form, and 2 is the <>-form, but we are struggling to find a name for 3, which verbosely I would call "the angle-bracket search path configurable by the user". If we refer to it as "header search path" it's ambiguous whether this is the "" search or the <> search path (and this matters, because the guideline is attempting to espouse that headers in 3 be located from the <> search path)

Like coming up with a term for "project", this may be more trouble than it's worth; feel free to side-step having to name this thing by just enumerating the example:

from a relative file, versus a standard library header or a header from another library or from a common set of includes.

apenn-msft avatar Sep 05 '20 04:09 apenn-msft