carbon-lang
Suggestion: ditch packages, namespaces and libraries from syntax
Carbon introduces multiple concepts at the same time: libraries, packages and namespaces at the syntax level. I don't think all of these should be exposed as a part of the source code.
Let's review the current state.
The smallest unit, the unit of compilation, doesn't have a name. It should be given one. The documentation calls it a "file".
Files are grouped into a unit of distribution - a package.
A package can contain multiple libraries.
Libraries group "api" files and "impl" files, so that each library has a private namespace and a public interface. A package thus exposes the APIs of multiple libraries.
In addition, there are namespaces. Namespaces make it possible to cross library boundaries, so that a common private implementation can be shared without exposing its API at the package level.
Is it only me who thinks this is too complicated?
I propose the following:
- have a unit of compilation - a module
- allow a module to expose a public API - symbols that can be reused
- allow a module to import another module's API - maybe reuse the C++ import/export syntax
- allow module nesting with a directory structure, so that each directory can have a boundary of what it can export - maybe with index modules
- make each module addressable using a filepath
- introduce a distribution system that allows marking modules as exportable and versioning them
Why are syntax-level packages/libraries/namespaces required? Is it so that it's easier to map source files to binary objects?
If you're working on a small project, you can definitely just put everything in an api file. Nothing really requires using an impl file. However, impl files are useful because they allow for separate compilation (i.e., for a really big project, you can compile much faster and with more parallelism). See Collapse API and implementation file concepts for more discussion on the alternative.
Regarding namespaces, again they're not something you need to use. However, organizationally I think we can see from C++ that some developers find namespaces useful, and it's likely to help ease a migration.
I'm moving this to a question for leads in case there's something that I'm missing here, design-wise, although maybe converting this to a discussion may make sense.
However, impl files are useful because they allow for separate compilation (i.e., for a really big project, you can compile much faster and with more parallelism)
Such a mindset is bringing the worst of C++ into a new language. Leave alone the headers/implementation separation idea. The performance hit of touching header files is caused by a preprocessor. In C++ you're compiling the same code again and again. Compilation of N source files (when you don't require N! combination) is actually very quick.
Even with a thousand files project you can just extract public signatures of each module and compare them with a cached version and recompile dependants only if it's changed. That's how you prevent needless compilation, not by creating artificial stable/unstable API boundaries.
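To make the caching idea above concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `export ` line prefix, the toy function syntax in the sample sources, the module names, and the helpers `public_signature` and `dependants_to_rebuild`. It is not how any Carbon tooling works. It only decides which dependants are stale; a module whose own text changed still has to be recompiled.

```python
import hashlib

def public_signature(source: str) -> str:
    """Hash only the lines that declare exported symbols (hypothetical `export ` prefix)."""
    exported = [line.strip() for line in source.splitlines()
                if line.strip().startswith("export ")]
    return hashlib.sha256("\n".join(exported).encode()).hexdigest()

def dependants_to_rebuild(sources, dependants, cache):
    """Return the modules that import something whose public signature changed.

    sources:    module name -> current source text
    dependants: module name -> set of modules importing it
    cache:      module name -> signature recorded by the previous build (updated in place)
    """
    stale = set()
    for name, text in sources.items():
        signature = public_signature(text)
        if cache.get(name) != signature:
            stale |= dependants.get(name, set())
        cache[name] = signature
    return stale

# Three versions of a hypothetical module written in a toy syntax.
v1 = "export fn Area(r: f64) -> f64\n  return 3.14 * r * r\n"
v2 = "export fn Area(r: f64) -> f64\n  return 3.14159 * r * r\n"            # body-only edit
v3 = "export fn Area(r: f64, scale: f64) -> f64\n  return scale * r * r\n"  # signature edit

dependants = {"geometry": {"main"}}
cache = {}
first = dependants_to_rebuild({"geometry": v1}, dependants, cache)      # nothing cached yet
body_edit = dependants_to_rebuild({"geometry": v2}, dependants, cache)  # signature unchanged
sig_edit = dependants_to_rebuild({"geometry": v3}, dependants, cache)   # signature changed
```

In this sketch the body-only edit leaves `main` untouched, while the signature edit marks it for recompilation.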
Let's analyse the claimed disadvantages listed in the collapse concepts proposals:
1. Can't compile in parallel
2. Developers might not be aware they are changing API.
3. Java (and other languages) has interfaces, but Carbon doesn't.
4. Compilation performance
5. Read all the files to learn API
6. Carbon build system will need to build a dependency graph, so every file will need to be compiled
7. We want the compilation to be in parallel
8. Developers are too lazy to read export signatures
9. Name collisions
10. Longer package names
11. Eventually, there would be a need to split larger packages into smaller ones
12. But we still want to deliver large packages, so that it's easier to version them
13. We already defined a syntax for import/export
14. We want to name distributed packages and actual symbols imported differently
15. We want to import whole namespaces. We don't want to import every single symbol used
16. If any symbol is possible to import from anywhere, that's a lot of symbols to keep in memory for IDE
17. There will be more bytes required to name all the symbols
18. We don't want special export/public/private keywords
19. Every file must be parsed
20. We want shorter symbol naming
Did I catch everything?
1. You can compile in parallel once you know the dependency graph. Artificial separation into "impl" and "api" files only parallelizes the "impl" part, which may take less compilation time than compiling the "api" part.
2. Developers can be warned by the IDE: "You've changed the export signature of the module. This module has 100500 dependants and will take ages to compile. Are you sure?"
3. Interfaces are not strictly for API / implementation separation. They are more like generic types. It's automatically generated documentation (JavaDoc) that extracts the API from Java implementation files.
4. Compilation performance is not achieved by grouping the codebase into large chunks. Instead, it is achieved by making sure you only compile what has changed. That means building the dependency tree, comparing the changes, and compiling the smallest amount of code needed.
5. Again, tooling. API documentation generators exist.
6. The Carbon import/export syntax should be simple enough for the AST parsing to be fast. You don't need to know the semantics to build a dependency graph, so it should be quick. It can be further optimized by keeping the dependency graph in memory and reloading modules at runtime.
7. You can do parallel optimizations when you know the dependency graph.
8. API documentation exists.
9. Here, Carbon designers need to decide whether they want packages to be distributed in source form or in binary form. If it's the binary form, long symbol names are inevitable. Alternatively, some dynamic symbol mapping would have to be invented.
10. See 9.
11. Libraries/packages grow. Refactoring is a natural evolution process.
12. Versioning is a complex problem. Again, there's a difference between API and ABI, so maybe you'd prefer smaller packages/libraries depending on how you want to distribute them.
13. Maybe reconsider. C++ module export/import syntax is nice.
14. This is easily solvable for source code distribution but is hard for binary packages. What's the advantage?
15. Importing whole namespaces creates a large dependency surface. I thought you were all for reducing the compilation time. Anyway, a mass-import syntax could be designed.
16. I think a package is a good boundary to determine what can be imported/exported from it. Every package/library/folder can have an index/init/entrypoint file that determines its symbols. So if you work on package A and have packages B and C as dependencies, you only need to keep the public symbols from B and C and all symbols from package A. If that's still too much, consider splitting A into smaller packages.
17. If you want to distribute ABI, name collisions are something to deal with. Maybe a runtime which dynamically generates unique symbol names?
18. Why not?
19. Every project has a few entry points, so only the entry points should be parsed, then the modules imported from those entry points, and so on.
20. Maybe there's a way to encode namespaces inside the ABI? Some sort of symbol compression? But that would probably require a runtime.
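Two of the replies above (compile in parallel once the dependency graph is known, and parse only what the entry points reach) can be sketched in a few lines of Python. The module names and the helpers `reachable` and `parallel_waves` are hypothetical; the import graph is written by hand here rather than extracted from source files. `graphlib.TopologicalSorter` is part of the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

def reachable(imports, entry_points):
    """Modules reachable from the entry points by following imports."""
    seen, stack = set(), list(entry_points)
    while stack:
        module = stack.pop()
        if module not in seen:
            seen.add(module)
            stack.extend(imports.get(module, ()))
    return seen

def parallel_waves(imports):
    """Group modules into waves; every module in a wave can be compiled concurrently."""
    sorter = TopologicalSorter(imports)  # maps each module to the modules it depends on
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

# Hypothetical project: module name -> set of modules it imports.
all_imports = {
    "main": {"ui", "net"},
    "ui": {"core"},
    "net": {"core"},
    "core": set(),
    "unused": {"core"},  # never imported from an entry point
}

needed = reachable(all_imports, ["main"])
waves = parallel_waves({m: all_imports[m] for m in needed})
```

Here `unused` is never parsed, and `ui` and `net` land in the same wave because both depend only on `core`.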
However, impl files are useful because they allow for separate compilation (i.e., for a really big project, you can compile much faster and with more parallelism)
Such a mindset is bringing the worst of C++ into a new language.
I think this is not a productive way to engage in discussions about Carbon. Among other things, it reads as an absolute statement that doesn't acknowledge that other people may disagree, and it reads as dismissive of the point that Jon tried to make.
Leave alone the headers/implementation separation idea. The performance hit of touching header files is caused by a preprocessor. In C++ you're compiling the same code again and again. Compilation of N source files (when you don't require N! combination) is actually very quick.
Even with a thousand files project you can just extract public signatures of each module and compare them with a cached version and recompile dependants only if it's changed. That's how you prevent needless compilation, not by creating artificial stable/unstable API boundaries.
One challenge of this approach in my experience is dealing with the dependencies of the file that need to be present in order to extract the public signature. Often, the build system cannot definitively tell whether a dependency is necessary for that extraction, and so even extracting the interface will require all dependencies, even dependencies of just the implementation details, to finish building first.
While having separate API and implementation files directly exposes some extra parallelism, it also exposes the ability to have a more parallel build graph for large build systems.
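As a rough illustration of the build-graph point, the sketch below compares the critical path (the build time with unlimited workers) of a three-module import chain with and without an api/impl split. The costs are invented, the module names are hypothetical, and the model assumes that a module's api depends only on the apis of the modules it imports; real builds will differ.

```python
from functools import lru_cache

def critical_path(costs, deps):
    """Length of the longest dependency chain, i.e. the build time with unlimited workers."""
    @lru_cache(maxsize=None)
    def finish(node):
        return costs[node] + max((finish(d) for d in deps[node]), default=0)
    return max(finish(node) for node in costs)

chain = ["a", "b", "c"]  # c imports b, b imports a
API_COST, IMPL_COST = 1, 4  # invented numbers
imports = {"a": [], "b": ["a"], "c": ["b"]}

# One compilation unit per module: dependants wait for the whole unit.
merged_costs = {m: API_COST + IMPL_COST for m in chain}
merged = critical_path(merged_costs, imports)

# Separate api and impl units: dependants wait only for the api unit.
split_costs, split_deps = {}, {}
for m in chain:
    imported_apis = [d + ".api" for d in imports[m]]
    split_costs[m + ".api"] = API_COST
    split_costs[m + ".impl"] = IMPL_COST
    split_deps[m + ".api"] = imported_apis
    split_deps[m + ".impl"] = [m + ".api"] + imported_apis
split = critical_path(split_costs, split_deps)
```

Under these assumptions the merged graph has a critical path of 15 and the split graph a critical path of 7, because the three impl units no longer wait on each other.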
Now, as Jon mentioned initially, not everyone needs this. Projects with 100s or 1000s of files may not. And it is fine to not use the features if they don't help. But we'd like the features to be available to help scale even further.
The current set of features was specifically designed based on the experience of several folks on the project scaling C++ builds, where we found exactly this separation important. As you say, we could do without it and work around any build scaling limitations. But so far the judgement call has been that the cost is reasonable and reasonably easy to avoid for users who don't need it.
There is also a completely separate reason that I at least appreciate separating API from implementation -- I find it to help me both organize my code and read the code of others. While I could in theory use tooling to extract this view, I prefer having the split directly reflecting in the source code itself, and being able to read the source code itself.
Let's analyse the claimed disadvantages listed in the collapse concepts proposals:
The list you give doesn't for me map to the arguments in the alternative that Jon linked to, so I'm afraid I don't follow this part of the discussion.
will require all dependencies, even dependencies of just the implementation details, to finish building first.
Let's analyze an example.
// main.carbon
import { PublicInterface } from './PublicInterface.carbon';
new PublicInterface();

// PublicInterface.carbon
import { ImplementationDetail } from './ImplementationDetail.carbon';
export class PublicInterface {
  PublicInterface() {
    new ImplementationDetail().doSomething();
  }
}

// ImplementationDetail.carbon
export class ImplementationDetail {
  void doSomething() {
    // ...
  }
}
So, your concern is that if we change the ImplementationDetail class while the signature of the PublicInterface class stays the same, there's no way to figure that out without analyzing ImplementationDetail? I disagree. AST parsing allows you to do a lazy evaluation of imports. You don't need to know what the imported symbol refers to for extracting the API of the PublicInterface.carbon file since it's not a part of a source code interface. Maybe you refer to a binary interface?
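A small Python sketch of what "lazy" means here: the file's import paths and exported names are collected purely from its text, without opening the imported file. The regular expressions target only the hypothetical import/export syntax used in the example above, and `scan` is an illustrative helper, not a real tool. The sketch does not cover the case where an exported signature mentions an imported type.

```python
import re

# Patterns for the hypothetical syntax from the example, not for real Carbon.
IMPORT_RE = re.compile(r"import\s*\{\s*([^}]*?)\s*\}\s*from\s*'([^']+)'")
EXPORT_RE = re.compile(r"export\s+class\s+(\w+)")

def scan(source):
    """Return (imports, exports) of one file without resolving any import."""
    imports = {
        path: [name.strip() for name in names.split(",")]
        for names, path in IMPORT_RE.findall(source)
    }
    exports = EXPORT_RE.findall(source)
    return imports, exports

PUBLIC_INTERFACE = """
import { ImplementationDetail } from './ImplementationDetail.carbon';
export class PublicInterface {
  PublicInterface() {
    new ImplementationDetail().doSomething();
  }
}
"""

imports, exports = scan(PUBLIC_INTERFACE)
```

The result lists `./ImplementationDetail.carbon` as a dependency edge and `PublicInterface` as the only exported name, and ImplementationDetail.carbon is never read.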
I like namespaces. They allow me to collect free functions with short names that I want to use unqualified into a sub-namespace that can be used where I need them without running out of names or having to resort to long names that affects legibility of the code.
They are also useful for extending concepts in a cross cutting fashion, moreover they are needed for extending overloaded functions/generics that are cross cutting. They are also useful for collecting things that should be edited together, but are associated with different parts of a program.
Are there other ways? Yes, you can introduce the concept of extension-slots in all aggregating concepts or other injection-mechanisms, but that is a radical departure from C++.
I like namespaces. They allow me to collect free functions with short names that I want to use unqualified into a sub-namespace that can be used where I need them without running out of names or having to resort to long names that affects legibility of the code.
As I understand it, in its current design Carbon lacks the ability to import a specific function from a package/library and alias it to a file-scoped name.
So you just have to import everything to a file and use namespaces to pick which functions are visible to the current file?
Essentially, you mass-import a subset of the functions from a particular package into a single file.
Wouldn't it be nicer to handpick which functions to import? Yes, it makes a long import list, similar to Java, but it prevents you from headaches of inventing new names due to mass-import.
Imports only inject the package name into the namespace. i.e., import Math only adds one name, Math. There is no "mass-import" similar to Java's import package.*, so I don't think the name conflict issues you raise exist. This is also discussed on the proposal I mentioned previously, #107, here.
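The difference between adding one name and mass-importing can be shown with an analogy in Python, whose `import` statement also binds just the module name while `from ... import *` binds many names at once. This is only an analogy; it says nothing about how Carbon implements imports, and `names_added` is a helper invented for the demonstration.

```python
def names_added(statement):
    """Run an import statement in an empty namespace and report the names it binds."""
    namespace = {}
    exec(statement, namespace)
    return set(namespace) - {"__builtins__"}  # exec adds __builtins__ on its own

package_import = names_added("import math")       # binds only the name `math`
star_import = names_added("from math import *")   # binds sin, cos, pi, ...
```

With the package-style import, every use stays qualified (`math.sin`), so new names inside the package cannot collide with names in the importing file.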
Imports only inject the package name into the namespace. i.e., import Math only adds one name,
But the combination of import Math and use namespace Trigonometry is effectively the same as import Math.Trigonometry.* in Java. Right?
Isn't that what you mean by "collect free functions with short names that I want to use unqualified into a sub-namespace that can be used"?
Imports only inject the package name into the namespace. i.e., import Math only adds one name,
But the combination of import Math and use namespace Trigonometry is effectively the same as import Math.Trigonometry.* in Java. Right?
No, there is no use namespace.
Isn't that what you mean by "collect free functions with short names that I want to use unqualified into a sub-namespace that can be used"?
I expect @OlaFosheimGrostad is making a comment about code organization preferences; i.e., it's sometimes nice to put functions into a child namespace instead of having them hanging out in the same scope as a library's classes.
AST parsing allows you to do a lazy evaluation of imports. You don't need to know what the imported symbol refers to for extracting the API of the PublicInterface.carbon file since it's not a part of a source code interface. Maybe you refer to a binary interface?
This seems to assume both the build system using a language-specific tool to separate the interface from the implementation details (including analyzing and separating dependencies), and the build system restructuring the build graph based on that refined dependency information.
It is possible to design such a build system, and it is one reasonable design. However, it is a significant constraint on the build system design that comes with its own tradeoffs. For example, the dependency graph must be somewhat computed dynamically based on intermediate steps. Many build systems work to avoid this because an immutable dependency graph provides significant simplifications for the rest of their architecture. Others support it, but with less efficiency.
This direction would also require some mechanism to enable extracting the interface dependencies separate from implementation ones. That would either push significant complexity into the tooling or require language extensions to simplify the problem. Either way, Carbon would pick up some complexity.
At the end of the day, I would prefer that Carbon's design allow build systems to achieve the physical separation of dependencies without taking on the complexity of language-specific tools to extract separate dependencies. I think separating them in the language itself is relatively simple, and simple enough that the build system can reflect the separation trivially.
This seems to assume both the build system using a language-specific tool to separate the interface from the implementation details (including analyzing and separating dependencies), and the build system restructuring the build graph based on that refined dependency information.
Apparently, that's the path C++ is taking with its modules, isn't it?
That would either push significant complexity into the tooling or require language extensions to simplify the problem
The tooling is everything. For example, TypeScript is essentially a JavaScript tool, not a language.
I would prefer that Carbon's design allow build systems to achieve the physical separation of dependencies
So you'd prefer to specify dependencies between compilation units manually rather than to automatically infer them from import statements? It seems strange to me. What's the purpose of import statements if you still need to manually separate interfaces from implementation and think about the build performance? You can just use pre-compiled headers then.
it's sometimes nice to put functions into a child namespace instead of having them hanging out in the same scope as a library's classes.
You don't have such a problem if you don't import the whole package in the first place. If Carbon allowed single-name imports with aliasing, you wouldn't need namespaces. You would only need a hierarchical directory structure of package contents.
Let's put it that way: it seems like Carbon at its current design treats a compilation unit (file) as an independent entity. That is, a source file is still treated as a walled "object" file rather than a part of a larger codebase with a repository of other compilation units. I think all the modern languages nowadays treat code as an in-memory graph database rather than static machine code. C++ delegates the dependency problem to a linker while not giving it access to the source code. That's why C++ modules are kind of dead on arrival.
Maybe Carbon could benefit from having a bit more dynamic compilation stage, so to speak.
I hope my thoughts make sense. I'm not a language designer, but I share the frustration with C++ build process.
That's why C++ modules are kind of dead on arrival.
Non-specific and hyperbolic criticisms like this don't make for constructive discussion, because they tend to alienate rather than informing the people who don't already agree. Can you remove or rephrase it?
Trying to pull out what seems like the high level point here, as I don't think we're making much progress debating the details...
it seems like Carbon at its current design treats a compilation unit (file) as an independent entity.
We are explicitly trying to provide tools that allow a very high degree of separate compilation and dependency management, especially in distributed build systems.
But we are also trying to allow folks to ignore them and use a simple & easy model when that's all they need.
I think all the modern languages nowadays treat code as an in-memory graph database rather than static machine code.
We are looking carefully at what other languages do here, but again, we want to provide some powerful tools that at least some users of C++ have and benefit from so that when moving from C++ to Carbon those tools still exist. One set of those is around a reasonably high degree of separation for compilation & dependency management.
There are definitely other ways to design a language, but given our goals and priorities, so far Carbon is pursuing the direction that optionally exposes some of these "power tools" for physically separating things.
You don't have such a problem if you don't import the whole package in the first place. If Carbon allowed single-name imports with aliasing, you wouldn't need namespaces. You would only need a hierarchical directory structure of package contents.
What I meant was: In C++ it is sometimes convenient to have a namespace "mylib::operators" or something like that so that you can keep seldom used functionality in "mylib" and use them as "mylib::func", but still get to use frequently used stuff in "mylib::operators" unqualified so that your code becomes more readable. At least, that is my experience with C++ (C++ source code can become overly verbose and hard to read if everything is namespace qualified).
(C++ has something called an "inline namespace" that allows you to group a subset under "mylib::operators::func", but still access them as "mylib::func".)
Just wanted to say the leads did go back over this, and we're happy with the current Carbon design direction here.
I wrote up what I think is a reasonable summary for the reasons we're currently planning to stick with this direction above: https://github.com/carbon-language/carbon-lang/issues/2154#issuecomment-1242246708
We can revisit this in the future, but should do so with new information or data or specific problems that we need to solve.