fabric-loader [Draft/RFC] Ponderings on a post-Mixin world

When we kickstarted Fabric back in 2016, we decided to experiment with using Mixin as a library - however, certain crucial functionality could not be done the way we wanted to, and the project was shelved. In 2018, we decided to scale back our ambitions in order to get the project released - which has been a successful strategy. However, I think it's wise to at least consider using some of these past ideas and experiences for a post-Mixin world.

Now, please don't consider the following as a rant about Mixin. I highly respect Mumfrey's work and coding ethos - this is strictly a difference in design philosophies.

There are many features we wanted to expose for Fabric modders but could not. Mixin is designed to be a "mostly harmless" way to modify the game, primarily developed in the context of the Sponge project. Many of its approaches do not scale to a project designed as differently as Fabric is.

Many modders could benefit from more advanced injection/patching strategies. I would not like to get into detail here, as a lot of that is still in the "idea/to consider" stage.
Mixin's design and safety guarantees make it difficult to patch it in a way where certain mixins are pre-applied to the development environment. This, while not useful functionality for an abstraction layer, would be great to have for viewing code in an environment with large amounts of patches one directly interacts with.
There are a lot of non-standard kludges we use to make Mixin work in the Fabric context. This carries both a maintenance cost and a risk of divergence from upstream.
Already, we're observing features which are being added to fabric-loader outside of the Mixin library. It would be good to do something about that, as this division creates added fragility.

Because of community interest, I would like to propose an alternate approach, one I wanted to implement at some point (presumably this year?) before I went on an indefinite break.

Building blocks

In contrast to Mixin's strategy of treating annotated Java classes as the base format for patches, I'd like to propose a modular strategy based on two key kinds of components:

Backend - a library which loads in patches in a specific intermediate format, and applies them to classes as they are loaded through it, producing transformed classes.
Compiler - a piece of software which takes some form of input (annotated classes, a domain-specific language, etc.) and turns it into patches in said intermediate format.
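To make the split concrete, here is a minimal Java sketch. Everything in it is hypothetical: `Patch`, `PatchOp`, `PatchCompiler`, `PatchBackend`, `SimpleBackend` and `MapCompiler` are illustrative names, and a "class" is modelled as a plain list of member names rather than bytecode. The only point is the shape: compilers produce the intermediate format, and the backend consumes nothing else.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical intermediate format: a patch names its target class and lists operations.
// A real format would describe bytecode edits; here a "class" is just a list of member names.
record PatchOp(String kind, String member) {}
record Patch(String targetClass, List<PatchOp> ops) {}

// A compiler turns some front-end input (annotations, a DSL, ...) into the intermediate format.
interface PatchCompiler<I> {
    List<Patch> compile(I input);
}

// The backend only understands the intermediate format and applies it to classes as they load.
interface PatchBackend {
    List<String> apply(String className, List<String> members, List<Patch> patches);
}

final class SimpleBackend implements PatchBackend {
    @Override
    public List<String> apply(String className, List<String> members, List<Patch> patches) {
        List<String> result = new ArrayList<>(members);
        for (Patch patch : patches) {
            if (!patch.targetClass().equals(className)) {
                continue; // patch targets a different class
            }
            for (PatchOp op : patch.ops()) {
                switch (op.kind()) {
                    case "add" -> result.add(op.member());
                    case "remove" -> result.remove(op.member());
                    default -> throw new IllegalArgumentException("unknown op: " + op.kind());
                }
            }
        }
        return result;
    }
}

// One possible front end: a map of class name -> members to add.
final class MapCompiler implements PatchCompiler<Map<String, List<String>>> {
    @Override
    public List<Patch> compile(Map<String, List<String>> input) {
        List<Patch> patches = new ArrayList<>();
        input.forEach((cls, added) -> patches.add(
                new Patch(cls, added.stream().map(m -> new PatchOp("add", m)).toList())));
        return patches;
    }
}
```

Any number of front ends (an annotation processor, a DSL, a Kotlin plugin) could implement `PatchCompiler`; the backend never needs to know which one produced a patch.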

The key thing to note here is that there's a documented, intermediate format for patches. This allows many things which are not currently possible or would be difficult to implement:

Adding types of patches which are difficult to model in one specific format - we can add new functionality to the intermediate patch format which is not necessarily modelable (or yet modeled) in a compiler.
Allowing people to write low-level patches directly for advanced functionality, if they know what they're doing - the safety precautions can be moved to the official/recommended Compilers, while the Backend's role is simply to emit warnings or failures, allowing more advanced transformations to occur when the modder is aware of the effects of their actions.
Unifying all forms of patching in one format - right now, features not provided by Mixin have to be done elsewhere (AccessWidener), making things more confusing and fragile.
- This also may make remapping a mod easier - all you have to do is remap the classes of a mod, and the symbols used in the patch file.
The following are not functions of the split alone, but are easier to accomplish when the backend/patch-applying portion is separated from the compiler/parser:
- Cacheability - we can (non-trivially) create a hash of all the dependencies of the patches (configuration entries, mod list, patch data, etc.), allowing the transformed classes to be cached.
- Pre-application - we can apply patches from dependencies to a development environment, both speeding up testing cycles and allowing the modder to see the code as patched. This opens up lots of opportunities for easier-to-use APIs on Fabric, as well as making things... less unintuitive for newcomers to discover.
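A sketch of how such a cache key might be computed, assuming the relevant inputs are the mod list, configuration entries and raw patch bytes (the real set of inputs would need careful auditing, which is the non-trivial part). `CacheKey` is a hypothetical helper; it uses the JDK's `MessageDigest` with SHA-256 and `HexFormat` (Java 17+). Mod and config order are normalized, while patch order is kept, since application order can change the result.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CacheKey {
    private CacheKey() {}

    // Hashes every input the transformed classes depend on. `mods` would be e.g. "id@version".
    // Mods and config are sorted so discovery order does not change the key;
    // patch data keeps its order because patches may not commute.
    static String compute(List<String> mods, Map<String, String> config, List<byte[]> patchData) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (String mod : mods.stream().sorted().toList()) {
                update(digest, "mod:" + mod);
            }
            for (Map.Entry<String, String> e : new TreeMap<>(config).entrySet()) {
                update(digest, "cfg:" + e.getKey() + "=" + e.getValue());
            }
            for (byte[] data : patchData) {
                digest.update(intToBytes(data.length)); // length prefix keeps boundaries unambiguous
                digest.update(data);
            }
            return HexFormat.of().formatHex(digest.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every JVM", e);
        }
    }

    private static void update(MessageDigest digest, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        digest.update(intToBytes(bytes.length));
        digest.update(bytes);
    }

    private static byte[] intToBytes(int v) {
        return new byte[] {(byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v};
    }
}
```

If the key of a cached jar matches the current key, the transformed classes can be reused; any change to a mod, a config entry or a patch produces a different key.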

Compiler

The most essential compiler to provide for an MVP is one which matches most of Mixin's existing functionality - that is to say, takes annotated Java classes as input.

Other compilers can be developed - one might have different taste in annotations, one might want to use a domain-specific language, or write their patches in Kotlin. The benefit is that they can be provided as independent Gradle plugins (or CLI tools) without necessitating special support in the Backend.

Backend

This is admittedly the least "developed" part of the concept. What follows are some rough notes off the top of my head:

Data-driven? For cacheability, it's wise to only allow state we can explicitly notarize (and invalidate the cache based on it).
A lot of mixin patches may be fundamentally modelable as "A -> B in scope C" regular expression-esque transformations. (This is where the term "asm-regex" which I used to refer to the idea internally comes from.) This, however, has to be verified.
We should not limit ourselves to existing functionality! There are a lot of mods and features which cannot be modeled well with what we have now. The Backend can implement features which Compilers don't know what to do with yet - or which can only be used by a low-level modder writing patches by hand.
Some functionality ideas follow:
- Nested redirects - many redirects are of the form, for a given redirected call A, "if (X) { override } else { A(...); }". Such redirects could be nested, increasing mod compatibility.
- Enum extensions - these sound dangerous at first, but warnings could be emitted - I hear there are still many places where this could come in handy.
- Redirecting calls - instead of injecting to the target method, one could wrap calls to a given target method. This is especially useful for patching interface calls - CharsetPatches used this to wrap inventory calls for lock protection.
- Wildcard matches - say, "all implementations of a given interface method".
- ASM class generation? Though this probably doesn't belong in Backend itself.
However, care has to be taken to model the available transformations in such a way as to maximize the possibility of combining them; failing that, detecting conflicting patches is key!
Modularity can be considered - plugins which provide some kind of additional transformations, and aren't included by default.
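The nested-redirect idea can be modelled without bytecode as plain function composition: each redirect is a (condition, override) pair, and the original call is the fallback at the bottom of the chain. This is a sketch under that simplification; `ConditionalRedirect` and `RedirectChain` are hypothetical names, and a real Backend would emit equivalent branching bytecode around the call site rather than lambdas.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// One redirect of the shape: if (condition) { override } else { fall through to the next one }.
record ConditionalRedirect<A, R>(Predicate<A> condition, Function<A, R> override) {}

final class RedirectChain {
    private RedirectChain() {}

    // Nests redirects around the original call A(...). The first redirect in the list is
    // checked first; if no condition matches, the original call runs.
    static <A, R> Function<A, R> nest(Function<A, R> original, List<ConditionalRedirect<A, R>> redirects) {
        Function<A, R> call = original;
        // Wrap from the innermost (last) redirect outward so list order is preserved.
        for (int i = redirects.size() - 1; i >= 0; i--) {
            ConditionalRedirect<A, R> r = redirects.get(i);
            Function<A, R> next = call;
            call = arg -> r.condition().test(arg) ? r.override().apply(arg) : next.apply(arg);
        }
        return call;
    }
}
```

Order still matters when two conditions overlap, which is where detecting conflicting patches would come in.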

Release cadence

As this would be one of the biggest breaking changes in Fabric's history, it would be wise to begin exposing it to modders early in a snapshot cycle, on a separate Loader branch. Ideally, the Backend would stabilize by said version's release - in this case, Mixin/AW can simply be removed on that branch altogether. In a less ideal case, the functionality can be backported to a version with both Mixin/AW (now deprecated) and the new backend, and we can hope things work out in the end.

Now, this short cadence might sound controversial - but, and this is entirely my personal opinion:

Snapshots are not meant to be user-facing or to have large modpacks - that is a side effect. Snapshots are principally for the Fabric development team and modders to experiment and break things; so if the solution is unstable for a part of the snapshot lifecycle, the benefits of mass modder testing and familiarization IMO outweigh the drawbacks of end users not being able to get a reliable snapshot experience for a period of time.
I would not begin this process until the backend and compiler were usable for at least the most common Mixin use cases. They can be tested separately, as a fork of fabric-loader, until that is the case.

I welcome thoughts and discussion. I also offer mentorship and advice to whoever is willing to work on this, though I do not at this moment have time to work on this myself.

May 15 '20 23:05 asiekierka

Beware of the second system effect

May 15 '20 23:05 immibis

or write their patches in Kotlin

I recall Kotlin adds a bunch of syntactic sugar; how would we work around that, for example?

it would be wise to begin exposing it to modders early in a snapshot cycle

The actual transition discussions should not occur until we have a somewhat stable and ready prototype. Leave that discussion until we reach that point, and then we can bikeshed to hell.

May 15 '20 23:05 i509VCB

@i509VCB This is entirely a draft; as I said, non-standard compilers are not something Fabric has to do themselves, and that's a good thing - Kotlin fans can just figure this part out themselves.

May 15 '20 23:05 asiekierka

Wouldn't all of these missing features be doable by just adding them to Mixin?

May 15 '20 23:05 immibis

@immibis this is such a different idea to mixin that it may as well be a different project.

May 15 '20 23:05 Earthcomputer

@immibis A few of them, yes; but they may not be easy to apply upstream, and this much divergence carries a significant maintenance cost. Most of them cannot be done, in my opinion, without a rethinking on the level of writing a new implementation - at which point it would be good to have one which caters to Fabric's specific needs instead of being hacked into a different project.

May 15 '20 23:05 asiekierka

Mixin's design and safety guarantees make it difficult to patch it in a way where certain mixins are pre-applied to the development environment. This, while not useful functionality for an abstraction layer, would be great to have for viewing code in an environment with large amounts of patches one directly interacts with.

As far as I know, there’s nothing preventing applying mixins statically. It just has to be done.

May 15 '20 23:05 natanfudge

@natanfudge We tried to do it in 2016 and kept running into Mixin's many safety checks and other weirdness. We spent weeks on it and it never worked quite right, especially as we had to apply some Mixins statically and some dynamically - that was the pain point.

May 15 '20 23:05 asiekierka

Would the compiled backend code be verbose (as in using English words/abbreviations) and writable by hand, or not at all human-friendly (meaningless ASCII terms like "M109", or straight-up bytecode)? I think this would have a large effect on how the backend/compilers would be structured.

May 16 '20 01:05 TheGlitch76

Something that would be really nice (but difficult) is a compiler that generates patches based on bytecode differences.

With something like this, patching a method would be as easy as making the changes in the decompiled vanilla code, recompiling the class, and calling the compiler on the original and modified classes.
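As a first approximation, such a compiler could diff the instruction listings of the original and modified method with a longest-common-subsequence algorithm. The sketch assumes instructions are already rendered to strings (e.g. by a disassembler); `Edit` and `InsnDiff` are hypothetical names. Because it compares instructions literally, it is sensitive to compiler choices such as block ordering and local variable slots.

```java
import java.util.ArrayList;
import java.util.List;

// One edit in a naive instruction-level patch: '=' keep, '-' delete, '+' insert.
record Edit(char kind, String insn) {
    @Override
    public String toString() {
        return kind + " " + insn;
    }
}

final class InsnDiff {
    private InsnDiff() {}

    // Classic longest-common-subsequence diff between two instruction listings.
    static List<Edit> diff(List<String> original, List<String> modified) {
        int n = original.size(), m = modified.size();
        int[][] lcs = new int[n + 1][m + 1]; // lcs[i][j] = LCS length of the suffixes from i and j
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = original.get(i).equals(modified.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<Edit> edits = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (original.get(i).equals(modified.get(j))) {
                edits.add(new Edit('=', original.get(i++)));
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                edits.add(new Edit('-', original.get(i++)));
            } else {
                edits.add(new Edit('+', modified.get(j++)));
            }
        }
        while (i < n) edits.add(new Edit('-', original.get(i++)));
        while (j < m) edits.add(new Edit('+', modified.get(j++)));
        return edits;
    }
}
```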

May 16 '20 01:05 Runemoro

I've been working on an alternative to Mixin for some time, but never had enough energy to put much work into it. Figured it's worth mentioning here. Apparently I'm too burned out after TickThreading to ever do anything else successfully :c

JavaTransformer: API for editing bytecode or source with the same code, instead of having to implement things twice

Mixin: Mixin implementation based on JavaTransformer. No reason it can't be statically applied; it already is, with a gradle plugin for the dev environment

A lot of it is WIP or feature-incomplete, and I never even got around to replacing the usage of TickThreading's old XML patcher in TickProfiler. :L

May 16 '20 08:05 LunNova

For @nallar: IMO the tool will be more bytecode-oriented. The source-code generation part can just generate some pseudocode: unlike other projects, where source code is used to apply patches and generate binary patches, here we use source code exclusively as a form of documentation, like javadoc. Your utilities, like generics, would be a bit overkill.

May 16 '20 09:05 liach

The apply-to-sourcecode part is somewhat less useful here, given there's no decompiled source workspace for the typical Fabric setup; it was necessary for developing with Forge, which this initially targeted. Being able to parse the source code using the same API is rather useful for the patch-applying stage, as depending on the patch you're making, a two-stage compilation can make things much more convenient.

Still, if you want your dev environment to have attached sources that match what the game will actually launch with, it's faster to apply patches to the source directly than to have to patch classes and decompile.

May 16 '20 09:05 LunNova

IMO the patches will be bytecode-oriented (like the tiny mapping format, using the bytecode parameter index rather than the source-code method parameter index). There just needs to be a simple way to convert the bytecode patching operations to visible patched content (either working code or pseudocode) in the source jar. Currently, Sponge Mixin loads its mixin classes through a convoluted process, which makes it unfit to be applied to source code, even as pseudocode.
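Using bytecode indices means the patch format follows the JVM's local variable layout rather than the source parameter position: slot 0 holds `this` for instance methods, and `long`/`double` parameters occupy two slots. A small sketch of that mapping from a method descriptor; `ParamSlots` is a hypothetical helper, and it assumes a well-formed descriptor.

```java
import java.util.ArrayList;
import java.util.List;

final class ParamSlots {
    private ParamSlots() {}

    // Returns the local variable slot of each parameter declared in a method descriptor.
    // Instance methods reserve slot 0 for `this`; long (J) and double (D) take two slots each.
    static int[] slots(String descriptor, boolean isStatic) {
        List<Integer> result = new ArrayList<>();
        int slot = isStatic ? 0 : 1;
        int i = descriptor.indexOf('(') + 1;
        while (descriptor.charAt(i) != ')') {
            char c = descriptor.charAt(i);
            result.add(slot);
            slot += (c == 'J' || c == 'D') ? 2 : 1; // arrays such as "[J" are references: one slot
            while (descriptor.charAt(i) == '[') {
                i++; // skip array dimensions
            }
            if (descriptor.charAt(i) == 'L') {
                i = descriptor.indexOf(';', i) + 1; // skip "Lpkg/Name;"
            } else {
                i++; // primitive (element) type
            }
        }
        return result.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

For example, for an instance method with descriptor `(IJLjava/lang/String;[DZ)V`, the five parameters live in slots 1, 2, 4, 5 and 6.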

May 16 '20 09:05 liach

The fundamental problem of doing anything with source code is that in Fabric there isn't just one source: different decompilers can produce different sources. Moreover, decompilers can reorder blocks of code for nicer control flow, meaning that method invocation A may appear before invocation B in the source from one decompiler, and after it in the bytecode or in another decompiler's output. There are also other similar discrepancies between different decompilations: which loop the decompiler chooses, and which local variables are inserted (Fabric Loader cannot assume there will be LVTs present in the target class).

May 16 '20 09:05 Earthcomputer

@Earthcomputer The patch generator wouldn't directly compare bytecode. It should first abstract both classes to a higher-level representation of the code, where compiler choices such as instruction ordering and control flow generation don't matter (see Procyon IR, for example).

May 16 '20 09:05 Runemoro

You mean compiler and decompiler choices? It's a good idea, but it still sounds infeasible. Let's focus on local variables as one example. The target class is in bytecode, and will have to be decompiled to this IR. The patch will start off in source code, and will have to be compiled to this IR. Now, since LVT information may have already been lost in the target class, you have no option but to rely on local variable indices and sort of guess where they start and end by their type. Now the compiler's choice of which local variables to merge, etc., is no longer arbitrary and will have to be made very carefully, especially when this source could have been generated by different decompilers. I'm not sure how possible that is, and I'm sure there will be many corner cases where it won't work.

May 16 '20 10:05 Earthcomputer

No, there are no variable indices in the IR. The IR is basically a data flow graph, such that any method with the same behavior corresponds to exactly the same IR.

When the IR is generated for the vanilla method, a map between IR nodes and things like variable and instruction indices should be kept so that once the patches to the IR have been determined, they can be converted to patches to the bytecode.

This data is never used to compare the IRs, though, and doesn't even need to be generated for the patched method. It's just used to pull back the IR changes to Java bytecode.
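As a much weaker illustration of the same principle (not the data flow IR described here), renaming local variable slots in order of first use already makes two listings that differ only in slot allocation compare equal. `LocalCanonicalizer` is a hypothetical helper; it assumes single-operand `*LOAD`/`*STORE` instructions rendered as strings, and ignores `IINC`, slot reuse, and differently typed variables sharing a slot.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LocalCanonicalizer {
    private LocalCanonicalizer() {}

    // Renames local variable operands (e.g. "ILOAD 3") to v0, v1, ... in order of first use,
    // so listings that differ only in which slots the compiler picked compare equal.
    static List<String> canonicalize(List<String> insns) {
        Map<String, String> names = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String insn : insns) {
            String[] parts = insn.split(" ");
            String op = parts[0];
            // Array loads/stores like IALOAD have no operand, so they fail the length check.
            boolean isLocalOp = parts.length == 2 && (op.endsWith("LOAD") || op.endsWith("STORE"));
            if (isLocalOp) {
                String name = names.computeIfAbsent(parts[1], k -> "v" + names.size());
                out.add(op + " " + name);
            } else {
                out.add(insn);
            }
        }
        return out;
    }
}
```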

May 16 '20 10:05 Runemoro

Let's not discuss hypothetical alternate compilers too much here, it will derail the subject. What concerns fabric-loader, from a technical standpoint, is two things:

the design of the backend and intermediate format,
the implementation of a compiler supporting existing Mixin/AW functionality.

May 16 '20 10:05 asiekierka

For the matching and transformation backend, I suggest first translating the bytecode into some IR like Runemoro suggested above, and then adopting an approach similar to IntelliJ's structural find and replace, except working on this IR rather than on Java source code. This suggestion should be read with IntelliJ's structural find and replace in mind.

If we adopt this approach, there are actually two types of script extensions we could support:

Smarter pattern matching. Given the set of substitution variables matched by the template, impose an extra scriptable condition for the match to succeed, perhaps also testing another template elsewhere (e.g. is this method call calling a method containing a call to another method?). These scripts may also be able to add extra substitution variables based on these inputs.
Smarter substitution. Given the set of substitution variables from the match, generate the IR a given substitution should be replaced by. Each variable in the replacement template may reference a different script for this.

Scripts may also take parameters as inputs, to allow them to be reused more flexibly.

There would be several built-in scripts. A simple pattern matching script would control the number of IR nodes the substitution variable can match (analogous to IntelliJ's "count" qualifier). The simplest substitution script would be to simply copy what is matched by a substitution variable.
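A minimal sketch of the matching and substitution part, with IR nodes modelled as plain strings. A template element is either a literal node or a substitution variable with a min/max node count (the analogue of the "count" qualifier), and the only substitution script implemented is the simplest one, copying what a variable matched. `Elem` and `StructuralMatcher` are hypothetical names; the template must match the whole node sequence, whereas a real implementation would search sub-ranges of a proper IR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// A template element: a literal IR node, or a named substitution variable matching min..max nodes.
record Elem(String literal, String name, int min, int max) {
    static Elem lit(String node) { return new Elem(node, null, 1, 1); }
    static Elem variable(String name, int min, int max) { return new Elem(null, name, min, max); }
}

final class StructuralMatcher {
    private StructuralMatcher() {}

    // Matches the template against the whole node list; returns variable bindings on success.
    static Optional<Map<String, List<String>>> match(List<Elem> template, List<String> nodes) {
        Map<String, List<String>> bindings = new HashMap<>();
        return matchFrom(template, 0, nodes, 0, bindings) ? Optional.of(bindings) : Optional.empty();
    }

    private static boolean matchFrom(List<Elem> t, int ti, List<String> nodes, int ni,
                                     Map<String, List<String>> bindings) {
        if (ti == t.size()) {
            return ni == nodes.size(); // every node must be consumed
        }
        Elem e = t.get(ti);
        if (e.literal() != null) {
            return ni < nodes.size() && nodes.get(ni).equals(e.literal())
                    && matchFrom(t, ti + 1, nodes, ni + 1, bindings);
        }
        // Substitution variable: backtrack over every allowed node count.
        for (int count = e.min(); count <= e.max() && ni + count <= nodes.size(); count++) {
            bindings.put(e.name(), List.copyOf(nodes.subList(ni, ni + count)));
            if (matchFrom(t, ti + 1, nodes, ni + count, bindings)) {
                return true;
            }
        }
        bindings.remove(e.name()); // undo the binding from a failed branch
        return false;
    }

    // Simplest substitution script: "$name" copies whatever that variable matched.
    static List<String> substitute(List<String> replacement, Map<String, List<String>> bindings) {
        List<String> out = new ArrayList<>();
        for (String token : replacement) {
            if (token.startsWith("$")) {
                out.addAll(bindings.get(token.substring(1)));
            } else {
                out.add(token);
            }
        }
        return out;
    }
}
```

For example, against the hypothetical nodes `ALOAD 1, GETSTATIC Items.BLAH, IF_ACMPNE L1, RETURN`, the template `$value (1..1), GETSTATIC Items.BLAH, IF_ACMPNE L1, $rest (0..5)` binds `value` to `ALOAD 1` and `rest` to `RETURN`, which a replacement template can then reuse.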

May 16 '20 12:05 Earthcomputer

I think the fundamental hurdle here is that whatever replaces mixin needs to be just as easy as or easier than mixin. There's no point in getting rid of mixin if the replacement adds more development burden. I think it would be good to have Mixin stick around for one or two MC versions after the new system gets released, just because we don't wanna pull a Forge and break everyone's projects at the same time. Honestly, the holy grail would be a plugin that automatically compiled folks' existing mixin code as ASMR, with the promise that converting fully to ASMR lets them do difficult things infinitely easier. An embrace-extend-extinguish sort of thing.

Mixin's primary advantage is that patches are written in real Java code, just like it'll appear in the applied patch, so you don't need to know how bytecode works. That was always my main hangup with raw ASM and ASMjs; you needed to have a handle on a paradigm of programming completely different from what you're writing the rest of your mod in.

May 16 '20 20:05 LemmaEOF

So a quick summary here: we have a backend (i.e. the storage format that applies changes to bytecode/source) and a compiler (converting the dev-time format to the backend format).

Mixin:

compiler: java-annotated classes, annotation processor
backend: mixin class files and mixin config json. needs a whole parser to make these changes readable in dependency source jar

@nallar's JavaTransformer:

compiler: similar to ASM, executed Java code that spits out instructions (or a secondary compiler that parses Java-annotated classes and converts them to executed Java code, as in their "Mixin")
backend: class files that get executed and spit out instructions. The instructions can be read by any JVM, so since Gradle runs on the JVM, it can use this backend

A few things I'd see in our version:

compiler:
- Needs to be easy to write, like mixin class files (nallar's instructions are intuitive as well, I think)
backend:
- bytecode reliability first; source compatibility can be peripheral (i.e. pseudocode/comments in generated source is fine, as long as people can see those code changes effectively)
- more flexible, and generates less trash than mixin

IMO @nallar's and @Runemoro's plans for super-high abstraction may be somewhat unrealistic. We should focus on getting the bytecode right first (which we are already doing for our tiny mapping format), then move on to showing the changes in a human-readable way in sources (pseudocode acceptable).

May 16 '20 21:05 liach

So to summarize, things Mixin doesn't do that we'd like it to do:

Stacked redirects
Static application
Mixin default and static methods
Force conditional jumps to go a certain way
(possibly other transformations that nobody has ever asked for)

... anything else?

Jan 05 '21 15:01 immibis

Adding to enums
Injecting into constructors (in Fabric mixin already)

Jan 05 '21 17:01 Earthcomputer

Mixin is very targeted in what it can change. Some ideas would benefit from something more dynamic, such as:

Hooking all calls to a method, or all classes that implement/extend X class.
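Because the JVM loads classes lazily, hooking "all classes that implement/extend X" likely needs a class hierarchy collected ahead of time. Assuming that hierarchy is available as a map from each class to its direct supertypes (e.g. gathered by scanning the game and mod jars), finding every transitive subtype is a breadth-first walk over the inverted edges. `HierarchyIndex` is a hypothetical helper.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HierarchyIndex {
    private HierarchyIndex() {}

    // directSupertypes maps each class to its direct superclass and interfaces.
    // Returns every class that transitively extends or implements `root`.
    static Set<String> subtypesOf(String root, Map<String, List<String>> directSupertypes) {
        // Invert the edges: supertype -> direct subtypes.
        Map<String, List<String>> children = new HashMap<>();
        directSupertypes.forEach((cls, supers) -> {
            for (String s : supers) {
                children.computeIfAbsent(s, k -> new ArrayList<>()).add(cls);
            }
        });
        Set<String> result = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            for (String child : children.getOrDefault(queue.poll(), List.of())) {
                if (result.add(child)) {
                    queue.add(child); // enqueue each subtype only once
                }
            }
        }
        return result;
    }
}
```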

Jan 05 '21 17:01 modmuss50

There has been talk about more generalized targeting, like being able to replace `item == Items.BLAH` with `tag.contains(item)`.

Jan 05 '21 17:01 dexman545

My 2 cents as a simple mod dev: It's essential that what can currently be achieved with mixin remain achievable by writing java code. Having to learn 2 different languages just to be able to mod using fabric seems unreasonable.

Jan 09 '21 05:01 supersaiyansubtlety

Such a system would have a Mixin-like DSL as a "front-end". You would use said front-end to make transformations if you wish. The actual built jar would use the system above.

Jan 09 '21 14:01 i509VCB

I still think AspectJ does most, if not all of this stuff.

Jan 10 '21 01:01 immibis

Looking at https://github.com/eclipse/org.aspectj, I believe AspectJ is outdated (it stays on Java 9!) and poorly documented; it is probably even less suitable than Mixin.

Jan 10 '21 01:01 liach