Prevent Name Collisions Between Packages
In #86, it became clear that we'd like a solution to prevent name collisions of modules between packages. Starting this thread here to discuss solutions.
My preferred solution would just be that all module names in a library must start with the name of the library. Thus, you may have a module that has exactly the same name as the library. This is likely to be a common design: organize your library however you like, and expose the public API via a module with the same name as the library.
I agree, I think this is the way to go.
Structuring the module and file names as we have so far helps emulate namespaces and subpackages, but I don't see a good reason to enforce it. The user can still do it if they prefer.
Here is the "minimal" proposal:
All module names must start with the library name. In particular, the module name should be equal to:
- either the library name and an underscore, such as stdlib_* or toml_*
- or just the library name, such as stdlib or toml
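The rule above can be sketched as a small check. This is only an illustration, not fpm's implementation: the function name follows_minimal_proposal is hypothetical, and the case-insensitive comparison is an assumption based on Fortran identifiers being case-insensitive.

```python
def follows_minimal_proposal(module_name, library_name):
    """Return True if the module name is the library name itself,
    or starts with the library name followed by an underscore."""
    # Fortran identifiers are case-insensitive, so compare lowercased names.
    module = module_name.lower()
    library = library_name.lower()
    return module == library or module.startswith(library + "_")
```

For example, with the library name stdlib, the names stdlib and stdlib_io pass, while stdlibio and toml_parser do not.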
Do we all agree with this "minimal" proposal?
I do. If we all do, we have something to solve our immediate problem, which is name collisions between packages. This gives us time to discuss naming conventions, which I think is important to have, but that's a separate issue.
I agree, this is a good solution.
Sorry to bring this back up, but there are newer discussions (including on Fortran Discourse) about module name collisions.
I've forked fpm and started a "module namespacing" branch to address this issue: https://github.com/perazz/fpm/tree/namespaced-modules
Things that need to be discussed:
- What coding style should I follow? I see fpm does not use many object-oriented facilities, most likely for compiler compatibility. Would having a module_t type that extends the current simple string_t used for modules make sense in fpm, or is that too advanced?
- My idea is to first provide a backward-compatible module_t class (already implemented), then extend it with more facilities for conflict resolution, but the actual rules that resolve the conflicts remain undefined.
- What coding style should I follow? I see fpm does not use many object-oriented facilities, most likely for compiler compatibility. Would having a module_t type that extends the current simple string_t used for modules make sense in fpm, or is that too advanced?
I'd say it's fine to use inheritance, but I'm not sure I understand why module_t would extend string_t.
- My idea is to first provide a backward-compatible module_t class (already implemented), then extend it with more facilities for conflict resolution, but the actual rules that resolve the conflicts remain undefined.
What would it mean to do "conflict resolution"? If you just want to report it to the user, I'm pretty sure we already do that. If you want to somehow be able to compile the projects anyway, I'm not sure how you would do that since the compiler will (eventually) see the conflict no matter what you try to do differently at compile time.
but I'm not sure I understand why module_t would extend string_t.
The idea was that fpm could automatically add prefixes or other unique identifiers to the plain module name, which is why extending string_t seems legitimate. It is also a good way to keep it automatically backward compatible.
What would it mean to do "conflict resolution"?
I guess there's no consensus on what direction fpm should take yet, but I think that, with prefixed module names, fpm could at least generate one ghost version of each package that has all unique names, so the user could revert to using that if they want to avoid name conflicts. Think about how gfortran mangles routine names:
___myModule_MOD_myRoutine
that could extend to something like
___fpmPackage_FPM_myModule_MOD_myRoutine
in other words, each fpm module could be prefixed by
packageName_FPM_
or some other unique identifier
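A minimal sketch of the prefixing scheme described above, assuming the packageName_FPM_ form from the example. The function name prefixed_module_name is hypothetical. The length check is an addition that is not part of the proposal: it reflects the 63-character limit on names in Fortran 2003 and later, which long prefixes could run into.

```python
MAX_FORTRAN_NAME_LENGTH = 63  # name length limit in Fortran 2003 and later


def prefixed_module_name(package_name, module_name):
    """Build a package-qualified module name: <package>_FPM_<module>."""
    prefixed = f"{package_name}_FPM_{module_name}"
    if len(prefixed) > MAX_FORTRAN_NAME_LENGTH:
        # A prefixed name that is too long would not be a valid identifier.
        raise ValueError(
            f"{prefixed!r} exceeds {MAX_FORTRAN_NAME_LENGTH} characters"
        )
    return prefixed
```

With the names from the example, prefixed_module_name("fpmPackage", "myModule") gives fpmPackage_FPM_myModule.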
That's an interesting idea, but I think it may just move the ambiguity problem into fpm, not necessarily solve it. In order to achieve this, first, you'll need fpm to understand the source code much more than it currently does. Second, you'll need to make copies of the source code with the module names changed in both:
- The module definition, and
- The places the module is used
At that point, when you see a use statement, how will you know which package it comes from, and thus which "mangled" name to change it to? Especially if it is a module name that exists in multiple dependencies.
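The ambiguity can be made concrete with a small sketch. It assumes the tool already knows which modules each package defines, given here as a plain dictionary. The function names and the package names in the usage note are hypothetical, and nothing here reflects how fpm actually stores this information.

```python
from collections import defaultdict


def find_providers(packages):
    """Map each module name (lowercased) to the sorted packages defining it."""
    providers = defaultdict(list)
    for package, modules in packages.items():
        for module in modules:
            providers[module.lower()].append(package)
    return {module: sorted(names) for module, names in providers.items()}


def ambiguous_modules(packages):
    """Return only the modules that more than one package defines."""
    return {
        module: names
        for module, names in find_providers(packages).items()
        if len(names) > 1
    }
```

Given {"pkg_a": ["utils", "pkg_a_io"], "pkg_b": ["utils"]}, a use statement naming utils maps to both pkg_a and pkg_b, so there is no single "mangled" name to rewrite it to.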
Yes to both. The ambiguity would move to the registry level: there could still be collisions if someone wanted to publish a package whose package name and module names both match those of a package already in the main fpm registry (or one could go one level further up and make the unique module name registry+package+module, but at this point my head is spinning).
If we restrict the renaming to modules only, a full language parser is not needed. It could be prescribed that for a package to be in the official fpm registry, it should only use module names from the "unique" representation. That could be checked/enforced really easily.
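As a rough sketch of such a check, a line-based regular expression can list the modules a source file defines without a full parser. Everything here is an assumption for illustration: the function names are hypothetical, the "unique" representation is taken to be the packageName_FPM_ prefix, and the scan ignores fixed-form source, continuation lines, and multiple statements on one line.

```python
import re

# Matches "module <name>" alone on a line, optionally followed by a comment.
# Lines such as "module procedure foo", "end module foo" and "submodule (...)"
# do not match.
MODULE_DEF = re.compile(r"\s*module\s+([a-z]\w*)\s*(?:!.*)?$", re.IGNORECASE)


def defined_modules(source):
    """Return the lowercased names of modules defined in free-form source."""
    names = []
    for line in source.splitlines():
        match = MODULE_DEF.match(line)
        if match:
            names.append(match.group(1).lower())
    return names


def nonconforming_modules(source, package_name):
    """Return defined modules that lack the assumed <package>_FPM_ prefix."""
    prefix = f"{package_name}_FPM_".lower()
    return [name for name in defined_modules(source) if not name.startswith(prefix)]
```

A registry-side check could then reject any package for which nonconforming_modules returns a non-empty list. The use statements would need a similar scan, which this sketch leaves out.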