New Transformer Format
As mentioned in #89, the current format for transformers/transformations has some problems regarding long-lasting locks and the effects of transformer ordering. After thinking about these issues for a while, I am now ready to propose an approach radically different from the previous one.
How did we get here?
When Chasm was originally developed (and still called ASMR), a "transformer" was approximately defined as a class that
- declares dependencies
- declares "reads"
- declares "writes"
Based on their dependencies, transformers would then be sorted into rounds. Within each round, the reads and writes of transformers would be sorted to avoid as many conflicts as possible.
While much of Chasm has changed since then, this basic structure was kept, though it was made more declarative in the process. A transformer would now
- declare dependencies
- generate transformations (each consisting of reads and a single write)
The behavior is essentially the same as before: Based on their dependencies, transformers are sorted into rounds, and within each round, transformations are sorted to avoid conflicts.
One thing that always bothered me was that dependencies are handled before conflicts are even considered. When sorting transformers into rounds, we could simply create one round per transformer and thereby avoid all explicit conflicts. Of course, this would result in a very high number of silent, undetected conflicts and crashes. Therefore, our goal was to sort transformers into as few rounds as possible, to get as much out of our conflict resolution as possible.
Another problem was discovered recently: If a transformer wants to see the changes made by a mod, it declares a round dependency on that mod. This, however, prevents it from seeing the initial state of the classes. As a workaround, it was suggested to provide access to the initial classes for verification purposes. However, this is a hardcoded special case, and it feels like the wrong approach for Chasm.
A new proposal
Transformations no longer exist
To avoid the problems of having multiple levels of ordering, all ordering is reduced to a single level. Technically, transformations still exist, but they are now executed as a group as part of the transformer and are no longer relevant to collision handling.
Transformers specify targets
Since transformations are gone but targets are still needed, targets are now defined at the transformer level. "Target" refers to any specific node or slice in the class tree.
Locking is now done on target level
If a target is defined as locking, it will behave as a lock from the moment it is created until the transformer is applied.
Transformer application is atomic
Since there is no sub-ordering of transformers, transformers are now atomic. That's a good thing! This simplifies many things, like renaming classes or adding locals. If you need something to not be atomic, just split it into separate transformers.
Dependencies are specified at target level
This one might feel a bit odd, but the reasoning is simple: In order to declare a target, the target must exist. Therefore, it is the targets that care the most about being applied before or after other transformers.
Some implementation details
In general, the process is now as follows:
- Transformers are sorted according to the ordering implied by their targets (if possible)
- Targets are read as soon as all their dependencies have been applied
Transformer Format
A transformer has two members:
- `targets`
- `apply`

`targets` is a map of `target_id` to `target`.
Each `target` contains the following members:
- `after`: a list of transformer ids that need to apply before this target is read
- `before`: a list of transformer ids that need to apply after this target is read
- `type`: either "read" or "write"
- `lock`: the locking behavior of the target
- `target`: a function taking in the classes at target read time and returning a slice or node target
`lock` defines the locking behavior of the target. I'm not sure about the details yet, but it will be something along these lines:
- `none`: the target may be modified, as long as it can still be found
- `shallow`: the target may not be modified, but its child nodes may be
- `full`: the target may not be modified at all
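To make the three levels concrete, here is a minimal sketch of three targets that differ only in their lock level. The target ids, class name, and node paths are made up for illustration, and the exact semantics are still open as noted above:

```
targets: {
    // none: others may rewrite this node, as long as it stays findable
    relaxed: {
        type: "read",
        lock: "none",
        target: classes -> {
            node: classes[c -> c.name = "com/example/Foo"][0]
        }
    },
    // shallow: the methods list itself is frozen, but individual methods may change
    guarded: {
        type: "read",
        lock: "shallow",
        target: classes -> {
            node: classes[c -> c.name = "com/example/Foo"][0].methods
        }
    },
    // full: nothing in this method may change until this transformer applies
    frozen: {
        type: "read",
        lock: "full",
        target: classes -> {
            node: classes[c -> c.name = "com/example/Foo"][0].methods[0]
        }
    }
}
```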
`apply` in the transformer will be run as soon as possible after all locks have been read. The argument to this function will be a map of `target_id` to the results of reading the respective targets. The return value is a map of `target_id` to a replacement for the specified target, for all targets with type "write".
Transformer context
When creating a transformer, the following values are provided as "globals":
- `id`: contains the id of this transformer
- `transformers`: contains the ids of all registered transformers
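As a sketch of how these globals might be used, the following target waits for every other registered transformer by filtering this transformer's own id out of `transformers`. The filter syntax mirrors the one used in the example below; the `!=` comparison is an assumption:

```
targets: {
    a: {
        // depend on all registered transformers except this one;
        // `transformers` and `id` are the globals described above
        after: transformers[t -> t != id],
        type: "read",
        lock: "none",
        target: classes -> {
            node: classes[c -> c.name = "com/example/ExampleClass"][0]
        }
    }
}
```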
Example transformer
A very simple transformer is shown here. It makes the class `com.example.ExampleClass` public.
```
{
    targets: {
        a: {
            after: [
                "transformer_a"
            ],
            before: [
                "transformer_b"
            ],
            type: "write",
            lock: "none",
            target: classes -> {
                node: classes[c -> c.name = "com/example/ExampleClass"][0].access
            }
        }
    },
    apply: sources -> {
        a: sources.a | 0x0001
    }
}
```
I've done some more thinking; here are some updates:
Comparison to previous design
Target application is analogous to the originally defined "read phase", or "reads". Transformer application is analogous to a "write phase", or "writes".
Types of dependencies
Target application is read-only. Transformer application is a write (with an implied read). We can therefore have three kinds of dependencies:
- Target after transformer (read-after-write)
- Transformer after target (write-after-read)
- Transformer after transformer (write-after-write)
Accordingly, we need three mechanisms for defining explicit dependencies:
- `transformer.targets.<id>.after`: read target `<id>` after writing the specified transformers (read-after-write)
- `transformer.targets.<id>.before`: write the specified transformers after reading target `<id>` (write-after-read)
- `transformer.after`: write `transformer` after writing the specified transformers (write-after-write)
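Side by side, a transformer using all three mechanisms might look like the following sketch. The transformer ids `base_patches`, `generator`, and `cleanup` are invented for illustration:

```
{
    // write-after-write: apply this transformer only after "base_patches"
    after: [
        "base_patches"
    ],
    targets: {
        a: {
            // read-after-write: read this target only after "generator" applied
            after: [
                "generator"
            ],
            // write-after-read: "cleanup" may only apply after this target was read
            before: [
                "cleanup"
            ],
            type: "write",
            lock: "none",
            target: classes -> {
                node: classes[c -> c.name = "com/example/ExampleClass"][0].access
            }
        }
    },
    apply: sources -> {
        a: sources.a | 0x0001
    }
}
```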
Additionally, we probably want the ability to define dependencies for other transformers and their targets from within a transformer. How would we do this cleanly?
I think the most obvious approach would be adding two functionalities:
- `transformer.before` acts like the inverse of `transformer.after`
- Allow `transformer.after` and `transformer.before` to specify both transformers and targets
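A rough sketch of what that could look like. The `"other_transformer:a"` notation for naming a specific target of another transformer is invented here purely for illustration; the actual reference format is an open question:

```
{
    // inverse of after: apply this transformer before the listed entries
    before: [
        "late_transformer",
        // hypothetical syntax for referencing a target of another transformer
        "other_transformer:a"
    ],
    targets: {
        a: {
            type: "write",
            lock: "none",
            target: classes -> {
                node: classes[c -> c.name = "com/example/ExampleClass"][0].access
            }
        }
    },
    apply: sources -> {
        a: sources.a | 0x0001
    }
}
```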
Read and write phases
For the sake of explanation, I'm bringing back read and write phases. A processing step consists of a read phase, followed by a write phase. The Chasm processor simply performs processing steps until there is nothing left to do (success) or no progress is being made (failure).
Read phase
During a read phase, the processor applies all targets that can be applied. A target can be applied if all the transformers it depends on have already been applied. Since targets can't depend on other targets, they may be applied in any order.
Write phase
During a write phase, a single transformer is applied. A transformer may only be applied when all of the following are true:
- All transformers this transformer depends on have been applied
- All targets this transformer depends on (including its own) have been applied
- None of the write targets of this transformer overlap with a lock (except its own locks)
- None of the write targets of this transformer would invalidate targets that haven't been applied yet
This may result in multiple transformers being applicable. If this happens, transformers are prioritized according to what was previously called "soft dependencies":
- Prefer "inner" writes before "outer" reads
If there are still multiple transformers available, application order is undefined, but consistent.
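To illustrate the "inner" writes before "outer" reads preference, consider two applicable transformers (all ids and names below are invented): `inner_writer` writes the access flags of a single method, while `outer_reader` reads the surrounding class as a whole. Since the write target lies inside the read target, the processor prefers applying `inner_writer` first, so that `outer_reader` sees the modified method:

```
// inner_writer: writes the access flags of one method
{
    targets: {
        a: {
            type: "write",
            lock: "none",
            target: classes -> {
                node: classes[c -> c.name = "com/example/ExampleClass"][0].methods[0].access
            }
        }
    },
    apply: sources -> {
        a: sources.a | 0x0001
    }
}

// outer_reader: reads the whole class
// (a real transformer would also write somewhere)
{
    targets: {
        b: {
            type: "read",
            lock: "none",
            target: classes -> {
                node: classes[c -> c.name = "com/example/ExampleClass"][0]
            }
        }
    },
    apply: sources -> { }
}
```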
After some internal discussions, this proposed change has been approved.
Some additional details regarding the changes:
- The function `targets.<id>.target` should be changed so that the return value is a path, not a node. This should clean up the type system, although we might need to add more utility to make constructing these paths simpler (see the sketch below).
- Due to the proposed changes, the concept of explicit rounds is gone. However, there is still the possibility of implicit rounds, which will be defined via convention and dependencies. This is something that will possibly be necessary for mass ASM.
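Here is a sketch of what a path-valued return might look like. The concrete path representation is still undecided; the array-of-steps shape below is purely hypothetical:

```
target: classes -> {
    // instead of the resolved node, return a description of where it lives
    path: ["com/example/ExampleClass", "methods", 0, "access"]
}
```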
I've done some more thinking, and I'm running into some walls that were built when originally designing this project. Here's my dilemma:
Chasm was designed to sort transformers. Those transformers would statically declare what they intend to do, and Chasm would try to sort them in a way that they can all be applied.
After some designing, a problem was discovered with this approach: A transformer cannot target code generated by another transformer. This is because the target doesn't exist initially.
As a workaround, rounds were introduced. This allows a transformer to specify its target after certain other transformers were applied. This issue suggests a different solution, but it eventually runs into the same problem: We cannot handle collisions we don't know about. By allowing transformers to specify their targets later, we fundamentally break the core Chasm philosophy of sorting all transformers.
I see two solutions to this:
- Break the set of sortable transformers into subsets, causing the issues above, and accept them as a limitation.
- Reduce the freedom transformers have in computing targets.
I think the first solution has been explored sufficiently to know that it works, but isn't ideal. The second solution needs some explanation.
As mentioned earlier, the problem is that transformers can specify their targets with a delay. However, this isn't, strictly speaking, true. Under a different interpretation, transformers always specify how to locate their target, and this specification itself could be interpreted as the target. This version of the target is always available initially (in the form of Chassembly); the only problem is that such targets aren't sortable.
In order to sort transformers, we need to be able to identify how targets overlap. Without providing an exact proof, I am fairly certain that this is not possible for our current format. As a simplification, if we limit target specifications to static paths only, we can definitely detect target overlaps. As a generalization, a target specification may be allowed to be Turing-complete, which I'm confident is not sortable.
Now my ultimate question is whether we can figure out a target specification format that clearly allows Chasm to identify target overlap, while still allowing transformations that are as arbitrary as possible.
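To make the two extremes concrete: a static path specification like the first sketch below makes overlap between targets decidable, while the current function-based form (second sketch) can compute arbitrary targets and is therefore not comparable in general. Both shapes are illustrative only:

```
// static path: overlap with other targets can always be detected
target: {
    path: ["com/example/ExampleClass", "access"]
}

// computed target: arbitrary Chassembly, not sortable in general
target: classes -> {
    node: classes[c -> c.name = "com/example/ExampleClass"][0].access
}
```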
Edit: An additional note: We know for a fact that we can find an order for Turing-complete target specifications if one exists. Worst case, we can just try all orderings. However, ideally we want this to be efficient. It is possible that trying orderings is good enough, if we can filter out orderings that we know have identical effects, but whether that is feasible is another complex topic.