[JEP 261: Module System](https://openjdk.org/jeps/261)

Open wasabii opened this issue 9 months ago • 8 comments

This sub-issue is to track progress and conversation implementing JPMS in IKVM.

wasabii avatar Mar 15 '25 20:03 wasabii

There are two sides to this issue: the runtime support for modules, and the static compilation support for modules. On the runtime side, we're going to have to implement tracking of layers along with the modifications to class loaders. Many of the details will likely be handled inside the OpenJDK code base itself, in Java code, just as a lot of the similar ClassLoader logic is already taken care of by the OpenJDK code.

However, for static compilation, none of that will be available to us. The new ikvmc will need to deal with modules, their security, etc, on its own, without being able to load IKVM.Java.

wasabii avatar Mar 15 '25 20:03 wasabii

So, for IKVMC, I'm going to propose we work in at least two steps. Step 1 will be to implement the surface/loading interface of IKVMC on top of a new C# version of the 'module' idea. The organization of IKVMC is somewhat built around the concept of JAR files right now. IKVMC maintains a collection of JarFile instances, and associated JarFile.Item instances. Passing compilation source arguments to ikvmc (class files, resources, jars, etc) results in JARs being added to this collection. Naked class files and naked resource files (not packaged) are added implicitly to two internal JAR files: classes.jar and resources.jar. The code then proceeds normally as if they were in JAR files to begin with.

We have a new abstraction here: Modules. Modules have a concept of unnamed modules. That provides a convenient place to put our naked top-level items.

ikvmc then will follow the same sort of design: first it assembles a collection of Modules, just as it previously assembled a collection of JarFiles. But one of these is 'unnamed'. Naked files go into the unnamed module. Later, when we do the rest of JDK9, JAR files and exploded JAR files with either automatic or manual names can go into their own Module instances.

The rest of the logic then proceeds forward by reading from these Module instances.

This new abstraction replaces the existing JarFile abstraction. And it is mostly the same concept. So it alone isn't too much of a departure from how it works now.

So, we should implement a new Module abstraction, in C#, in CoreLib. This abstraction needs to be capable of reading named modules and automatic modules, and of exposing and populating an unnamed module. The existing Importer code can reference this abstraction, but for now, just work against the unnamed module as if it was the original JarFile abstraction.
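To make the shape of Step 1 concrete, here is a minimal sketch of such a collection. It is written in Java purely for illustration (the proposal is for C# in CoreLib), and every name in it (`SourceModuleSet`, `SourceModule`, `addNakedFile`, `addNamed`) is hypothetical rather than existing IKVM code. It only shows the one idea from above: there is always exactly one unnamed module, and naked files land in it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical replacement for the JarFile collection; not actual IKVM code. */
final class SourceModuleSet {

    /** One module's worth of entries. A null name marks the unnamed module. */
    static final class SourceModule {
        private final String name;
        private final Map<String, byte[]> entries = new LinkedHashMap<>();

        SourceModule(String name) { this.name = name; }

        boolean isNamed() { return name != null; }
        String name() { return name; }
        void put(String path, byte[] content) { entries.put(path, content); }
        boolean contains(String path) { return entries.containsKey(path); }
    }

    private final SourceModule unnamed = new SourceModule(null);
    private final Map<String, SourceModule> named = new LinkedHashMap<>();

    SourceModule unnamed() { return unnamed; }

    /** Naked class and resource files go into the unnamed module. */
    void addNakedFile(String path, byte[] content) {
        unnamed.put(path, content);
    }

    /** Registers a named (or automatic) module; names must be unique. */
    SourceModule addNamed(String name) {
        if (named.containsKey(name)) {
            throw new IllegalArgumentException("duplicate module: " + name);
        }
        SourceModule module = new SourceModule(name);
        named.put(name, module);
        return module;
    }

    /** The unnamed module first, then named modules in insertion order. */
    List<SourceModule> all() {
        List<SourceModule> result = new ArrayList<>();
        result.add(unnamed);
        result.addAll(named.values());
        return result;
    }
}
```

With something like this, the Importer can keep iterating over `unnamed()` exactly as it iterates over classes.jar and resources.jar today, and named modules can be added later without changing that path.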

Step 2 however will require a C# implementation of the rest: processing module relationships, etc, and a new definition as to the contents of a static assembly that contains Modules.

wasabii avatar Mar 15 '25 20:03 wasabii

IKVM generated assemblies are currently decorated with a JavaModuleAttribute, which expresses that their code originates from IKVMC, and enables the runtime support to load them 'specially', treating various other attributes and data in them as if they reflected the original .class file concepts. For instance, we have a PackagesAttribute which contains the package list for reflection.

These assemblies fit into a fictional ClassLoader hierarchy using AssemblyClassLoader. Their attributes help the AssemblyClassLoader figure out what that looks like.

A module is loaded by a single class loader, but a class loader may be responsible for an arbitrary number of modules.

The same could hold true of an Assembly. It could be the case that ikvmc is built such that it takes a set of modules and builds a single assembly from that set. The entire assembly is loaded by a single class loader. But it can contain multiple modules, each associated with that same classloader. Thus, the assembly would need some additional metadata, mapping package names to module descriptors. It already has package names. Is it possible we can just throw additional assembly attributes into it, to carve those classes up into different modules? All the modules would be associated with the same AssemblyClassLoader.
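As a rough illustration of what that extra metadata would have to answer, the sketch below keeps a package-to-module map for one assembly. It is Java for illustration only, and `AssemblyModuleMap`, `addModule` and `moduleOfClass` are made-up names, not existing attributes or APIs. The assumption is that each package in the assembly belongs to exactly one module, so an overlap is rejected when the map is built.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical per-assembly metadata: which Java module owns which package. */
final class AssemblyModuleMap {
    private final Map<String, String> packageToModule = new TreeMap<>();

    /** Registers a module and its packages; a package may have only one owner. */
    void addModule(String moduleName, Set<String> packages) {
        // Validate first so a rejected module leaves the map unchanged.
        for (String pkg : packages) {
            String owner = packageToModule.get(pkg);
            if (owner != null) {
                throw new IllegalStateException(
                    "package " + pkg + " already belongs to module " + owner);
            }
        }
        for (String pkg : packages) {
            packageToModule.put(pkg, moduleName);
        }
    }

    /**
     * Returns the owning module for a binary class name such as
     * "com.b.impl.Thing", or null if the package is not mapped.
     */
    String moduleOfClass(String binaryClassName) {
        int dot = binaryClassName.lastIndexOf('.');
        String pkg = dot < 0 ? "" : binaryClassName.substring(0, dot);
        return packageToModule.get(pkg);
    }
}
```

A lookup like `moduleOfClass` is what the single AssemblyClassLoader would need in order to report the right module for each type it loads.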

Modules cannot cross assembly boundaries.

This folds nicely into the ikvmc redesign we mentioned above. ikvmc in this mode is to produce a single assembly, which is a combination of unnamed modules and automatic modules and normal modules. All get resolved, linked, and put into the same assembly, with the same assembly class loader. The set of which is fed into the runtime by the new abstraction we talked about above. The problem is how to distinguish automatic modules and the unnamed module on the command line:

ikvmc foo.class bar.jar module.jar module2/

foo.class is easy. That goes to the unnamed module. The rest though may have a module-info.class, or they may have a MANIFEST.MF with an Automatic-Module-Name. Or they may just be JAR files with no manifest. What are they?

If they have a module-info.class, I think we're safe saying they're a normal module.

Otherwise, what? Distinguishing between an automatic module and an unnamed module doesn't seem possible.

java.exe handles this explicitly: you either use --class-path or --module-path.
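The JDK's own `java.lang.module.ModuleFinder` shows the same ambiguity: a JAR is only "automatic" because somebody put it on the module path. The sketch below, with the hypothetical helper names `ModulePathProbe`, `describe` and `writeJar`, builds a manifest-only JAR in a temp directory and asks `ModuleFinder.of` how it would be treated there. It assumes a JDK 9+ runtime.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

final class ModulePathProbe {

    /** Describes how a JAR is treated when it is placed on the module path. */
    static String describe(Path jar) {
        ModuleDescriptor d = ModuleFinder.of(jar).findAll().iterator().next().descriptor();
        return (d.isAutomatic() ? "automatic " : "explicit ") + d.name();
    }

    /**
     * Writes a JAR that contains only a manifest. If automaticModuleName is
     * non-null it is recorded as Automatic-Module-Name.
     */
    static Path writeJar(String fileName, String automaticModuleName) {
        try {
            Path jar = Files.createTempDirectory("probe").resolve(fileName);
            Manifest mf = new Manifest();
            // Without Manifest-Version the manifest is written out empty.
            mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
            if (automaticModuleName != null) {
                mf.getMainAttributes().putValue("Automatic-Module-Name", automaticModuleName);
            }
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jos = new JarOutputStream(out, mf)) {
                // No entries besides the manifest are needed for this probe.
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Name taken from the manifest attribute.
        System.out.println(describe(writeJar("demo.jar", "com.example.auto")));
        // No attribute: the name is derived from the file name.
        System.out.println(describe(writeJar("demo.jar", null)));
    }
}
```

Both JARs would equally well work on the class path, where their contents would simply be part of the unnamed module. Nothing inside the file decides it, which is why ikvmc needs an explicit switch too.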

wasabii avatar Mar 15 '25 20:03 wasabii

Named modules can't read from unnamed modules.

Automatic modules can't read from unnamed modules.

So this seems to be pretty similar to strong naming in .NET: if you strong name something, it can only reference other things that are strong named.

Maybe then it makes sense to disallow an assembly from containing both named and unnamed modules. Which then has the benefit of simplifying ikvmc invocation: it's either building an unnamed module, or a named module, and that can be controlled by a single argument (-m) or something.

wasabii avatar Mar 15 '25 21:03 wasabii

Implementation of JPMS support in IKVM is going to require the introduction of a number of new concepts. JPMS provides a new layer that sits "almost" on top of the existing ClassLoader hierarchy, but overrides some key parts of it.

A couple of fundamental new abstractions are introduced: Module and ModuleLayer. Module represents a uniquely identified grouping of Packages. A ModuleLayer represents a set of Modules. A ModuleLayer can be parented to zero or more other ModuleLayers. But usually at least one: the Boot Layer.

A few of these properties introduce challenges. First, a ModuleLayer is fixed to its set of packages upon creation: packages cannot be added or removed afterwards. Second, a given Package can be owned by only one Module in a Layer and its hierarchy.
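The one-owner-per-package property can be observed on any JDK 9+ runtime by indexing the boot layer. This is a small sketch using only `ModuleLayer.boot()` and `Module.getPackages()`; the class and method names (`BootLayerPackages`, `index`) are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BootLayerPackages {

    /** Packages that were seen in more than one boot layer module, if any. */
    static final List<String> conflicts = new ArrayList<>();

    /** Maps every package in the boot layer to the name of its owning module. */
    static Map<String, String> index() {
        Map<String, String> owners = new TreeMap<>();
        for (Module module : ModuleLayer.boot().modules()) {
            for (String pkg : module.getPackages()) {
                String previous = owners.put(pkg, module.getName());
                if (previous != null) {
                    conflicts.add(pkg + ": " + previous + " and " + module.getName());
                }
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        Map<String, String> owners = index();
        System.out.println("packages indexed: " + owners.size());
        System.out.println("owner of java.lang: " + owners.get("java.lang"));
        System.out.println("conflicts: " + conflicts);
    }
}
```

An index like this is what we cannot build for arbitrary .NET assemblies, since the set of namespaces is neither fixed nor unique.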

.NET places no limit on the loading of Assemblies at runtime. This presents a problem for any attempt at mapping an Assembly to a Module: assemblies can come and go, Modules can't. And .NET does not require a Namespace to be unique among Assemblies: namespaces in .NET are not significant in any way. Two assemblies can have types in the same namespace. This presents a problem for mapping Namespaces to Packages: there will be overlaps between assemblies.

So while we could come up with some interesting virtual structures, none of them hold up. For instance, a virtual Module could be created for each Assembly, but its packages would conflict with those of the virtual Modules for other assemblies. Or we could pile all Assemblies into a single virtual Module, except then Packages would come and go on that virtual Module. Neither works.

Or, and this is a non-starter, every Assembly could have its namespaces mangled to create package names: cli.AssemblyName.Namespace. For instance, cli.System.System.String. This is terrible: cli.Microsoft.Extensions.Configuration.Abstractions.Microsoft.Extensions.Configuration.Abstractions.ClassName.
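For reference, the mangling scheme being rejected here amounts to the following. This is only a sketch of the `cli.AssemblyName.Namespace` idea as described above; `CliNameMangler` and `mangle` are hypothetical names.

```java
final class CliNameMangler {

    /** Builds cli.<AssemblyName>.<Namespace>.<TypeName>; the namespace may be empty. */
    static String mangle(String assemblyName, String namespace, String typeName) {
        StringBuilder sb = new StringBuilder("cli.").append(assemblyName);
        if (!namespace.isEmpty()) {
            sb.append('.').append(namespace);
        }
        return sb.append('.').append(typeName).toString();
    }

    public static void main(String[] args) {
        // prints cli.System.System.String
        System.out.println(mangle("System", "System", "String"));
    }
}
```

It does make every package unique per assembly, but only by doubling the length of nearly every name.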

We could however take a much more restrictive stance: only certain Assemblies are mapped to virtual Modules. Every other assembly could end up visible through the unnamed module. However, in Java, named or automatic Modules (not unnamed) cannot depend on unnamed modules. A properly modularized Assembly that depended upon a non-modularized Assembly could not have that relationship visible through the hierarchy. This hierarchy affects resolution of resources.

This approach is probably the best we can do: explicit opt-in. This means no existing .NET assembly would be visible as a Module and thus would not be accessible through modularized Java code. However, explicitly decorated Assemblies could be: for instance our own versions of java.base. Or other existing Java-modules transpiled to .NET assemblies. Essentially we would support compiling valid modularized Java code to .NET, but not support accessing existing .NET code from modularized Java code. I think this is unfortunate, but probably acceptable: Java modularization, in general, appears, to my eyes, to be a mostly dead effort. Most decent library authors try to make their code modularized for one or two users that request it: but in general no end users or developers really care. Nobody is truly using it for anything important.

That opt-in needs to be encoded into our generated assemblies. We currently use the JavaModuleAttribute to mark an assembly as a Java module (which at the time of creation, probably meant "module" as in .NET Module.) But we could extend this to describe multiple ModuleDescriptors and their associated Packages. Thus, a .NET assembly would be able to hold multiple Java modules. All of the java.se packages could be embedded into a single assembly.

The user might also want to take modularized Java code and generate a non-Module. Java allows the same JAR file to be added to the classpath or the modulepath. Which means it either appears in the unnamed module or a named module. We would do the same, extending ikvmc to consider input files either as classes or modules: ikvmc -m foo.jar -out foo.dll. Likewise, we would be able to embed multiple modularized JARs into a single assembly: ikvmc -m foo.jar bar.jar -out blah.dll. Or, leave -m off, to generate a non-modularized .NET assembly. For us this needs to be known at transpile time just as it's known at runtime for Java.

Our java.exe version still needs to be able to reference .NET assemblies: -Xreference. Those references which are listed are automatically loaded, as assemblies, and thus available at runtime to be searched for modules. At runtime we do maintain a module-path, augmented with modules from those assemblies specified as references. This means items in -Xreference are either available on the classpath, or the modulepath, depending on whether it is a modularized assembly or not. However, unless --add-modules is specified, they are not actually added as root modules to the boot layer. They would be available for resolution only.

ikvmstub is an issue. Exporting arbitrary .NET assemblies as modular JARs would be nearly impossible: there are huge numbers of namespace overlaps. The problem though is we need these in order to compile OpenJDK itself: we override many OpenJDK classes by calling out to .NET classes. But OpenJDK itself is modular. We could replace all of this code with externs (JNI callouts that call into IKVM.Runtime). But that makes some logic a lot more difficult. Or we could develop a customized IKVM assembly that is modularized without being itself written in Java. This introduces easy potential for cycles, however: those could not access code in java.base itself.

For instance, we override java.lang.ref.Reference and related classes to make use of .NET features, such as WeakReference. WeakReference is in System.Runtime (or mscorlib). Neither of these is modularizable, since they massively overlap namespaces with other assemblies (and each other). So in a modularized copy of java.base, how can we reference this? We really can't. This code would need to be replaced with externs or calls to an internal assembly-module.

wasabii avatar Mar 17 '25 17:03 wasabii

Changes to JVM startup process:
	System is still used, but the order is a bit different.
	Driven by threads.cpp:Threads::initialize_java_lang_classes
		Thread initialized. ThreadGroup set.
		Initializes the Module class.
		Unsafe constants.
		System.initPhase1 is called by JVM.
			sets a bunch of properties: encoding, home, line.separator
			creates stdin/out/err pipes
			VM.initializeOSEnvironment.
			VM.initLevel(1); // set for other classes to read
		Retrieve properties constructed by initPhase1
		Initializes exception types
	threads.cpp:create_vm:
		initializes TLS
		ostream
		launcher properties get processed
		init_system_properties
		initialize_jsr292_core_classes (prevents deadlock)
		System.initPhase2
			Creates the boot layer: ModuleBootstrap.boot()
				Attempts to find an ArchivedBootLayer. We can ignore this.
				boot2()
					finderFor("jdk.module.upgrade.path")
					finderFor("jdk.module.path")
					finds main module name
					assembles modules to add
					assembles modules to limit
					a few different paths to get to SystemModuleFinders.*
						mainModule specified
						allSystemModules
						ExplodedSystemModules SystemModuleFinders.ofSystem();
					consults the systemModuleFinder for JAVA_BASE, gets a ModuleReference
						this is probably the biggest change
						this class appears to actually examine the lib/modules file
							we won't have one
					fails if missing
					BootLoader.loadModule(base)
					Modules.defineModule(null, base, baseUri);
					enables JNI for base module
					needResolution should always be true for us (no archive), but maybe it won't on main module path?
					apply upgradeModulePath to overload systemModuleFinder
						this might not be possible since it would confuse static assemblies which don't use this system to resolve modules
					build roots. Add main module if specified.
					process ALL_DEFAULT, ALL_SYSTEM, ALL_MODULE_PATH
					applies limits
					on needResolution:
						Modules.newBootLayerConfiguration(finder, roots, traceOutput);
					else
						systemModules.moduleReads, JLMA.newConfiguration.
					clf = ModuleLoaderMap.mappingFunction(cf); // this determines how module names map to class loaders
						this is mostly a static set of module names derived at build time (GenModuleLoaderMap.java).
					loadModules // registers the modules with their class loaders
						BootLoader.loadModule(mref); // except for java.base
					define boot layer ModuleLayer.empty().defineModules(cf,clf). This creates the final ModuleLayer instance.
			VM.initLevel(2)
		System.initPhase3
			Breaks if SM is enabled.
			VM.initLevel(3);
			ClassLoader.initSystemClassLoader()
			set the thread context class loader
			VM.initLevel(4);

So the main thing to understand here is that most of this process is governed by Java code, and that Java code is running before the boot layer itself is initialized. It is code that is technically in jdk.internal, but not yet 'wrapped' into a boot layer or module. So the VM is initialized, classes can be created, and a ClassLoader exists: that ClassLoader is just unparented from any others, and not part of a ModuleLayer. The boot layer is created just to document this fact, and to provide mechanisms for LATER loading of the other classes in the same module. As long as the layer it assembles respects the truth of the matter (the null classloader is reflected on eventually loaded modules whose classes are in fact loaded by the null classloader), it is fine.
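The end state of that bootstrap is easy to check from ordinary Java code on a JDK 9+ runtime. The sketch below uses only `Class.getModule()`, `Module.getClassLoader()`, `Module.getLayer()` and `ModuleLayer.boot()`; `BootLayerFacts` and its method names are made up for the example.

```java
final class BootLayerFacts {

    /** java.base and its classes are defined to the null (boot) class loader. */
    static boolean javaBaseUsesNullLoader() {
        Module base = Object.class.getModule();
        return base.getClassLoader() == null
            && Object.class.getClassLoader() == null;
    }

    /** Once startup has finished, java.base reports the boot layer as its layer. */
    static boolean javaBaseIsInBootLayer() {
        Module base = Object.class.getModule();
        return "java.base".equals(base.getName())
            && base.getLayer() == ModuleLayer.boot();
    }

    public static void main(String[] args) {
        System.out.println("null loader: " + javaBaseUsesNullLoader());
        System.out.println("in boot layer: " + javaBaseIsInBootLayer());
    }
}
```

Whatever we do on the IKVM side has to leave these two observations true, even though java.lang.Object was usable long before ModuleBootstrap.boot() ran.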

wasabii avatar Mar 17 '25 19:03 wasabii

@AliveDevil @cemerick

wasabii avatar Mar 17 '25 19:03 wasabii

Since IKVMC and the runtime will both need C#-side tracking of modules, we'll need to implement a structure for that, similar to RuntimeJavaType and RuntimeClassLoader: RuntimeModule. I am unsure whether we'll need to implement a RuntimeModuleLayer: since the boot layer itself is created in pure Java, I'm not sure it's considered a special case by any JVM-side logic. I think it only deals with Modules.

However, there is a distinction here: static compilation. We need a module graph during static compilation: we need to compile against a set of Modules. And for that we need to load a graph. And that graph will look an awful lot like a ModuleLayer: loaded from a combination of JAR files, classes, and other modules. And since this is the compiler itself, we can't borrow the Java code. We have to implement it ourselves. So, up above, when I was talking about IKVMC, that's what I mean: we'll need something similar to ModuleFinder and ModuleLayer. And of course the ability to actually load that information from a file on disk: which is done at runtime using ClassLoader, but at compile time for us using RuntimeClassLoader.
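For a feel of what the C# side would have to reproduce, here is how the JDK's own API resolves a graph from descriptors alone, with no class content behind them. This is a sketch on top of `java.lang.module`. The `ModuleGraphSketch` and `InMemoryFinder` names and the three sample modules (`app`, `lib`, `unused`) are invented for the example. It also shows the root-module behaviour: a module that is merely observable is not part of the graph unless a root pulls it in.

```java
import java.lang.module.Configuration;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReader;
import java.lang.module.ModuleReference;
import java.lang.module.ResolvedModule;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

final class ModuleGraphSketch {

    /** A finder backed by descriptors only; nothing can be read from its modules. */
    static final class InMemoryFinder implements ModuleFinder {
        private final Map<String, ModuleReference> refs = new LinkedHashMap<>();

        InMemoryFinder add(ModuleDescriptor descriptor) {
            // A null location is permitted for a ModuleReference.
            refs.put(descriptor.name(), new ModuleReference(descriptor, null) {
                @Override
                public ModuleReader open() {
                    throw new UnsupportedOperationException("descriptor-only module");
                }
            });
            return this;
        }

        @Override
        public Optional<ModuleReference> find(String name) {
            return Optional.ofNullable(refs.get(name));
        }

        @Override
        public Set<ModuleReference> findAll() {
            return new HashSet<>(refs.values());
        }
    }

    /** Resolves the given roots against three sample modules; returns resolved names. */
    static Set<String> resolve(String... roots) {
        InMemoryFinder finder = new InMemoryFinder()
            .add(ModuleDescriptor.newModule("app").requires("lib").build())
            .add(ModuleDescriptor.newModule("lib").build())
            .add(ModuleDescriptor.newModule("unused").build());

        // java.base is found in the parent (boot) configuration.
        Configuration cf = ModuleLayer.boot().configuration()
            .resolve(finder, ModuleFinder.of(), Set.of(roots));

        Set<String> names = new TreeSet<>();
        for (ResolvedModule module : cf.modules()) {
            names.add(module.name());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(resolve("app"));
    }
}
```

The finder and the resolution step are the two pieces IKVM.CoreLib would need its own equivalents of, since ikvmc and ikvmstub cannot call into this Java code.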

The same structure will be needed for ikvmstub.

So, on that front, IKVM.CoreLib.Modules will end up looking a lot like jdk.internal.module and java.lang.module.

wasabii avatar Mar 17 '25 19:03 wasabii