Losing information and possible discrepancy in the WIT specs
I'm not sure if what I'm describing here is a problem with the specs or a problem with the implementation(s), so please let me know if I should report it somewhere else.
Recently I've discovered that when we convert a WIT package into the component-model binary representation (or just use it to build a component), we lose the name of the root package and the world it was created from. Implementations like wit-parser reconstruct this by introducing a fake root component (called root:component, with a world called root, in that library). This means that converting WIT to binary and back does not give back the original package, and it prevents use cases where, for example, a registry or runner could identify a WASM binary by extracting the root package's name.
While I was trying to understand if this is by design, I found the following example in the WIT.md specs:
I don't think this (and the other examples around it) is true. With all the tooling I've tried (although probably all of it uses the same implementation under the hood), the top-level component ends up with the function exported directly, with only the function's name as its name. The information about the world and package is not present in the resulting WASM at all:
(export (;22;) "example1" (func 21))
When exporting interfaces the exported interface's name contains the package name:
(export (;13;) "golem:it/api" (instance 12))
but it is still the package of the exported interface and not the "root package" of the component. (Note that both exports are directly under the top-level component.)
I think perhaps the explanation is that there is a difference between building a component targeting a world and packaging the WIT itself. In the latter case, what I would expect is shown in the part of WIT.md and should include the world name as an export of the package. However, for the former case, the world is intentionally lost because the idea is that worlds are structural / "duck typed": when determining whether a component is compatible with a host or another component, all that matters is the set of import/export names/types; the top-level world names are irrelevant and thus they aren't currently included in the component binary.
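To illustrate the "duck typed" point, here is a minimal Python sketch. It is not how any real tool implements the check: the `Shape` class and `is_compatible` function are hypothetical, types are modeled as plain strings compared for equality, and real component-model subtyping is considerably richer. It only shows that a structural check has no use for a package or world name.

```python
from dataclasses import dataclass, field


@dataclass
class Shape:
    """Hypothetical model: only import/export names and their types."""
    imports: dict = field(default_factory=dict)
    exports: dict = field(default_factory=dict)


def is_compatible(component: Shape, host_world: Shape) -> bool:
    """Simplified structural check (assumption: exact type equality).

    The component fits if every import it needs is offered by the host
    with the same type, and every export the host expects is provided
    with the same type. No package or world name takes part in this.
    """
    imports_ok = all(
        host_world.imports.get(name) == ty
        for name, ty in component.imports.items()
    )
    exports_ok = all(
        component.exports.get(name) == ty
        for name, ty in host_world.exports.items()
    )
    return imports_ok and exports_ok


# Names taken from the exports shown earlier; the type strings are made up.
host = Shape(exports={"example1": "func"})
component = Shape(exports={"example1": "func", "golem:it/api": "instance"})
print(is_compatible(component, host))
```

Two components built from differently named worlds but with the same imports and exports are indistinguishable to a check like this, which is why the name can be dropped without affecting linking.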
But perhaps there is a bug, so let me know if that makes sense or is not what you see in the binary WIT package.
It seems to be working as you described. I did not realize that encoding the WIT produces a different structure than building a component targeting a WIT world.
So I guess there is no bug and everything is as intended, but I still believe this is worth considering changing. I understand that the original package name or world name is not necessary for composing components or linking them to the host, but it feels weird that a wit -> binary -> wit roundtrip gives a different result depending on whether the binary is "just an encoded wit" or an actual component "implementing" that wit, given that the binary format itself is the same.
Even if the actual encoding of components (with implementation) is not changed - do you think it would make sense to add some official metadata sections that contain this information and allow reproducing the original package? (Like the ones the wasm-metadata crate implements; I can't find the corresponding spec though.)
To further explain why I am asking for this: I think the ability to restore the whole WIT package from a component is a nice property. There are many ways to solve this outside the component model - storing the WIT files or a separate binary WIT together with the component, embedding it in a custom section, etc. - but all of these duplicate information about the component's interface, and the copies can go out of sync. Having a single source of truth would be nice.
I guess the reason for the difference is that a WIT package contains a component type (and possibly multiple of them, since a WIT package can define multiple worlds), whereas a regular component doesn't contain its own component type; it just has a list of import and export definitions (each with individual types). Moreover, when building regular components, we don't expect the roundtrip to be lossless since an optimizing compiler can always remove unused imports.
But adding a new subsection to documentation/metadata in a custom section seems worth discussing. I can't find the official place this is documented; I think perhaps it's implemented in the wasm-metadata crate in wasm-tools, but anyone feel free to link to any better references or places to discuss.
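As a rough illustration of what a registry or runner would do with such a custom section, here is a minimal Python sketch that lists the names of the top-level custom sections in a binary. It assumes the component binary uses the same outer framing as core WebAssembly (an 8-byte preamble, then sections made of a one-byte id, a LEB128 size, and a payload, with id 0 meaning "custom"). The function names are hypothetical, and the section name `example-wit-package` in the usage below is invented for illustration, not something any tool emits.

```python
from __future__ import annotations


def read_leb128_u32(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def custom_section_names(data: bytes) -> list[str]:
    """Return the names of all top-level custom sections (id 0)."""
    if data[:4] != b"\0asm":
        raise ValueError("not a WebAssembly binary")
    pos = 8  # skip the magic bytes and the version/layer fields
    names = []
    while pos < len(data):
        section_id = data[pos]
        size, pos = read_leb128_u32(data, pos + 1)
        end = pos + size
        if section_id == 0:
            # A custom section payload starts with a length-prefixed name.
            name_len, name_start = read_leb128_u32(data, pos)
            name = data[name_start:name_start + name_len]
            names.append(name.decode("utf-8"))
        pos = end  # skip the rest of the payload, whatever the section is
    return names


# Hand-built binary: component preamble plus one custom section.
name = b"example-wit-package"  # hypothetical section name
payload = bytes([len(name)]) + name + b"\x01\x02"
binary = b"\0asm\x0d\x00\x01\x00" + bytes([0, len(payload)]) + payload
print(custom_section_names(binary))
```

A tool could look for one agreed-upon section name this way and decode the original package from its payload, which is why having that name and payload format documented in one official place matters.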