Experiment: Alternative SSA construction algorithm
Hi,
this is mainly an experiment; I wanted to play around with the API and see how an alternative SSA construction algorithm can be implemented. There are a few notable problems I encountered:
- The original algorithm is supposed to work on-the-fly in a single step. This does not work with the current API. `Value`s (operands) and `Reference`s (successor arguments) are bound eagerly. I assume terminating ops could be emitted only after everything else was done, but that's not easy to achieve currently either. For operands, I don't see any way to say "oh, please use a different value" after the operation is passed to the builder. I'm not sure how relevant this is for other transformations, but if we find a simple solution to this without giving up immutability, I could imagine it to be very powerful.
- I'm currently using more data structures to cope with the two-step variant than the original algorithm itself requires:
  - The `predecessors` map could be avoided at the cost of additional computation whenever only filled blocks should be considered.
  - The `loads` map is required as we can't replace the loads directly during the analysis step.
  - The `additionalParameters` map could probably be removed if `Block.Builder` allowed removing parameters.
  - The `deletedPhis` set is also needed to cope with the fact that we look up current defs in both steps.
- I noticed that for the method in `TestTraverse`, this implementation gets rid of one parameter that the existing implementation does not get rid of. I'm not sure if it is designed to produce minimal SSA, but that looks odd. (`c` is passed to the while body, but it is never read there before being written.)
I think the algorithm itself is pretty straightforward, but the extra steps needed to adapt to the API somewhat nullify that. I tried to document the implementation details that differ from the paper, but if there are any questions or suggestions on how this could be improved, please let me know.
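For context, here is a minimal, self-contained sketch of the core read/write recursion this kind of on-the-fly construction is built around (in the style of Braun et al.'s SSA construction paper, going by the sealing/phi terminology used here); every type below is an illustrative stand-in, not the code model's or this PR's actual API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: on-the-fly SSA def lookup with incomplete phis.
final class OnTheFlySsa {
    static final class Block {
        final List<Block> predecessors = new ArrayList<>();
        boolean sealed; // true once all predecessors are known
    }
    sealed interface Value permits Def, Phi {}
    record Def(String name) implements Value {}
    static final class Phi implements Value {
        final Block block;
        final List<Value> operands = new ArrayList<>();
        Phi(Block block) { this.block = block; }
    }

    final Map<String, Map<Block, Value>> currentDef = new HashMap<>();
    final Map<Block, Map<String, Phi>> incompletePhis = new HashMap<>();

    void writeVariable(String variable, Block block, Value value) {
        currentDef.computeIfAbsent(variable, v -> new HashMap<>()).put(block, value);
    }

    Value readVariable(String variable, Block block) {
        Value local = currentDef.getOrDefault(variable, Map.of()).get(block);
        return local != null ? local : readVariableRecursive(variable, block);
    }

    private Value readVariableRecursive(String variable, Block block) {
        Value value;
        if (!block.sealed) {
            // Not all predecessors known yet: create an operand-less phi
            // and complete it when the block is sealed (sealing omitted here).
            Phi phi = new Phi(block);
            incompletePhis.computeIfAbsent(block, b -> new HashMap<>()).put(variable, phi);
            value = phi;
        } else if (block.predecessors.size() == 1) {
            value = readVariable(variable, block.predecessors.get(0));
        } else {
            // Break cycles: record the phi as the current def before recursing.
            Phi phi = new Phi(block);
            writeVariable(variable, block, phi);
            for (Block pred : block.predecessors) {
                phi.operands.add(readVariable(variable, pred));
            }
            value = phi; // the paper also removes trivial phis here
        }
        writeVariable(variable, block, value);
        return value;
    }
}
```

The eager-binding problem described above shows up in `phi.operands`: this style of algorithm mutates a phi's operand list after the phi has already been handed out as a value, which the immutable builder API does not allow.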
Progress
- [x] Change must not contain extraneous whitespace
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/babylon.git pull/214/head:pull/214
$ git checkout pull/214
Update a local copy of the PR:
$ git checkout pull/214
$ git pull https://git.openjdk.org/babylon.git pull/214/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 214
View PR using the GUI difftool:
$ git pr show -t 214
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/babylon/pull/214.diff
:wave: Welcome back hgreule! A progress list of the required criteria for merging this PR into code-reflection will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@SirYwell This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
Experiment: Alternative SSA construction algorithm
Reviewed-by: psandoz
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 3 new commits pushed to the code-reflection branch:
- f69e732433c903febc2ef647c5459139130b9acc: Dont use hashbang for launcher scripts
- cb52322017ca649c25c40b27dd9478419521f8f3: Fixed env.bash issue found on Linux (Mac ok?)
- 94b5149779bed3715885955a9d9677f9e9d97d21: bld script and bldr tools cleanup
Please see this link for an up-to-date comparison between the source branch of this pull request and the code-reflection branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@PaulSandoz) but any other Committer may sponsor as well.
➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).
This is a good experiment. The Cytron paper is quite old and I did not check to see if there were more recent alternative algorithms that were simpler. It will take me some time to fully digest the paper and your implementation. My gut feeling is this might be a good default replacement for the current implementation. I am curious as to how easily you found the adaptation from phi nodes to block parameters. Generally I have found it easy to adapt and it can often simplify things.
You noticed a deficiency in the Cytron implementation (and possibly the algorithm), regarding a variable that is just stored to. We have noticed that too.
I don't currently see a way to retain immutability and post-adapt what is currently built before building completes. Maybe as I look more deeply a few ideas will emerge. One approach I have used when transforming is, instead of building up data structures, to build up (capturing) functions on the first pass to apply when transforming on the second pass. I don't yet know if this would make your implementation simpler.
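As an illustration of that idea (hypothetical names throughout, not an actual API), the first pass would capture closures over the not-yet-existing output instead of recording facts in maps:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of "capturing functions on the first pass": record
// deferred fix-ups during analysis and replay them during transformation.
// C stands in for whatever the transform threads through, e.g. a Block.Builder.
final class DeferredActions<C> {
    private final List<Consumer<C>> pending = new ArrayList<>();

    // analysis pass: capture what should happen later
    void defer(Consumer<C> fixup) {
        pending.add(fixup);
    }

    // transformation pass: apply all captured fix-ups to the real output
    void replay(C context) {
        pending.forEach(f -> f.accept(context));
    }
}
```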
I am curious as to how easily you found the adaptation from phi nodes to block parameters. Generally I have found it easy to adapt and it can often simplify things.
Conceptually, I think block parameters are easy to understand. One hurdle I could imagine when doing global transformation is the "when do I need them", or rather the "why do I sometimes need them". I'm not sure how I would explain that without explaining phis first. But for local transformations, that doesn't really matter at all, reducing the complexity in that aspect.
It also fits in well with the design of the API (I think it wouldn't fit into a sea-of-nodes style API), and transformations between the representations seem pretty simple too.
I don't currently see a way to retain immutability and post-adapt what is currently built before building completes. Maybe as I look more deeply a few ideas will emerge. One approach I have used when transforming is, instead of building up data structures, to build up (capturing) functions on the first pass to apply when transforming on the second pass. I don't yet know if this would make your implementation simpler.
I'm not sure how useful that is in general, but I think terminating ops could be restricted to only be added to the block after everything else is done. Combined with the ability to remove block parameters again, it would potentially make some things a little easier. However, that also feels like odd special casing that might be more confusing than helpful.
(BTW the changes from `Set` to `SequencedSet` are nice, feel free to split that out into a separate PR.)
(Still working my way through the paper + code, I'll send some further thoughts later today.)
Explaining phis at that level is, I would imagine, a proxy for explaining control flow join points. If one thinks of blocks as little functions that jump to other functions as a tail call, it might actually be easier to explain.
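To make the analogy concrete, here is a made-up example (plain Java, nothing from the code model): the join block is just a function whose parameter plays the role of the phi, and each branch reaches it via a tail call.

```java
// A made-up illustration of "blocks as little functions": b3's parameter x
// is exactly the value a phi / block parameter would merge at the join point.
final class JoinAsFunction {
    static int entry(boolean cond) {
        return cond ? b3(42) : b3(-42); // each branch "jumps" to b3 with an argument
    }

    static int b3(int x) { // x plays the role of the phi / block parameter
        return x + 1;
    }
}
```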
Combined with the ability to remove block parameters again, it would potentially make some things a little easier.
This should be possible to support; `Block.Builder::parameters` currently returns an unmodifiable list. I was hesitant about exposing a mutable list, since removing a parameter that is used will create an invalid model. But we could support removing block parameters that are not currently used and have no successor arguments associated with them. Although, that probably means it is not very useful.
In general my approach to the design has been to lean into multiple passes (analysis/transformation) rather than try to optimize for single passes. That's a trade-off. We are not trying to build a framework for compiler pipelines like, say, LLVM, MLIR, or C2, although there are obvious similarities to key aspects.
I have a better grasp now (not complete). This algorithm really leans on the mutability of a phi node's operand list and user list, and on replacement of uses of a phi with another value. I don't see how we can make it work in one pass with the current code model design. However, it's still rather good in two passes, better than the Cytron implementation IMO.
My suggestion would be to lean into the two passes and make the second pass simpler for the resolution of values, for load operation results and additional successor arguments. It's either replacement with another `Value` or, say, a `BlockParameterHolder` that holds the actual `Block.Parameter` to use. When a `Block` is processed to append the additional block parameters, given the list of holders, it will update the holders.
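A minimal sketch of that holder idea, assuming `Block.Parameter` from the code model and inventing everything else:

```java
import java.lang.reflect.code.Block;

// Hypothetical holder, as suggested above: second-pass consumers hold a
// reference to the holder, and it is filled in once the block's extra
// parameter has actually been created.
final class BlockParameterHolder {
    private Block.Parameter parameter;

    void fill(Block.Parameter parameter) {
        if (this.parameter != null) throw new IllegalStateException("already filled");
        this.parameter = parameter;
    }

    Block.Parameter get() {
        if (parameter == null) throw new IllegalStateException("parameter not yet appended");
        return parameter;
    }
}
```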
In our current approach it is possible to SSA transform models with nested operations, but the constraint is that variables declared outside a body can only be loaded from within a body. This obviously works well for lambdas. But it means we cannot support, say, the SSA transformation of the high-level model while retaining the structure, e.g. the model for this code snippet:
```java
int x = 0;
if (...) {
    x = 42;
} else {
    x = -42;
}
```
We would have to transform the operation modeling the if statement into one that yields the resulting value of x. I explored that and it quickly became complex. So instead we throw an exception.
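In other words, the structure-retaining form would have to turn the statement into something that behaves like the expression below (with `cond` standing in for the elided condition):

```java
// Roughly what the if operation would have to express: yield the merged
// value instead of storing to the variable x.
int x = cond ? 42 : -42;
```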
There is also another case, not currently supported, where we cannot fully SSA transform: if a variable is updated in a try block and accessed in a catch or finally block, or updated in a catch block and accessed in a finally block. We should either fail in such cases or leave the variable intact (perhaps via some configuration parameter). This requires analysis of stores within exception regions. We might be able to transform and split the code, but that is harder and may not be worth the effort given the likely edge-case nature.
This should be possible to support; `Block.Builder::parameters` currently returns an unmodifiable list. I was hesitant about exposing a mutable list, since removing a parameter that is used will create an invalid model. But we could support removing block parameters that are not currently used and have no successor arguments associated with them. Although, that probably means it is not very useful.
Yes, that's a restriction that is probably also hard to track, and not restricting removal wouldn't be a good API, I think.
This algorithm really leans on the mutability of a phi node's operand list and user list, and on replacement of uses of a phi with another value. I don't see how we can make it work in one pass with the current code model design. However, it's still rather good in two passes, better than the Cytron implementation IMO.
I agree. I also think if we want to properly support try-catch-finally constructs with it (as you mentioned), it wouldn't be one pass even if we had mutability. So two passes are probably fine.
My suggestion would be to lean into the two passes and make the second pass simpler for the resolution of values, for load operation results and additional successor arguments. It's either replacement with another `Value` or, say, a `BlockParameterHolder` that holds the actual `Block.Parameter` to use. When a `Block` is processed to append the additional block parameters, given the list of holders, it will update the holders.
I'll try to improve on that.
In our current approach it is possible to SSA transform models with nested operations, but the constraint is that variables declared outside a body can only be loaded from within a body. This obviously works well for lambdas. But it means we cannot support, say, the SSA transformation of the high-level model while retaining the structure, e.g. the model for this code snippet
I think this is a general problem that comes with the (very powerful) flexibility of the model. It's generally difficult to set boundaries on what a valid input code model is for a specific transformation operation, and then there might still be two options: 1) in such a case, do not touch that variable and only transform the rest, or 2) throw an exception if an unsupported code element is encountered. Both might make sense in specific scenarios, or both variants could be combined depending on what kind of unsupported elements are encountered.
Without trying hard, I also noticed that the current transform API isn't well suited to cleaning up e.g. empty blocks or combining blocks. Especially with the SSA transform, we end up with unneeded blocks in many places, so I thought that could be a potential improvement, but I didn't find a way to do it without a lot of additional work.
I think this is a general problem that comes with the (very powerful) flexibility of the model. It's generally difficult to set boundaries on what a valid input code model is for a specific transformation operation, and then there might still be two options: 1) in such a case, do not touch that variable and only transform the rest, or 2) throw an exception if an unsupported code element is encountered. Both might make sense in specific scenarios, or both variants could be combined depending on what kind of unsupported elements are encountered.
Yes, since it is specific to the operations present in the model. I used the current SSA implementation carefully in the Triton example, when translating the operation modeling the Java for statement, modeling a counted loop, into a Triton SSA counted loop expression.
Without trying hard, I also noticed that the current transform API isn't well suited to cleaning up e.g. empty blocks or combining blocks. Especially with the SSA transform, we end up with unneeded blocks in many places, so I thought that could be a potential improvement, but I didn't find a way to do it without a lot of additional work.
Indeed, it's often easier not to do that, and to apply it as a separate transformation if needed. When a body is built, any empty blocks with no predecessors are removed, and then the blocks are topologically sorted. This will retain non-empty blocks with no predecessors, occurring at the end, and I am unsure whether it is appropriate to remove them or not -- is it intentional or an error? It was intentional, for simplicity, to retain blocks that unconditionally branch to other blocks that could otherwise be merged; lowering will likely do this.
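For what it's worth, such a merging pass would boil down to a check along these lines. This is a sketch only: the accessor names below are my assumptions about the code model API (`terminatingOp`, `predecessors`, `CoreOp.BranchOp`, `targetBlock`), not verified against it.

```java
import java.lang.reflect.code.Block;
import java.lang.reflect.code.op.CoreOp;

// Sketch only; accessor names are assumptions about the code model API.
// A block can be folded into its successor when it ends in an unconditional
// branch and is that successor's sole predecessor.
static boolean mergeableIntoSuccessor(Block block) {
    if (!(block.terminatingOp() instanceof CoreOp.BranchOp branch)) {
        return false; // conditional or exceptional terminator: keep the block
    }
    Block target = branch.branch().targetBlock();
    return target.predecessors().size() == 1;
}
```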
@SirYwell This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
I experimented with moving around a few things, mainly the code in the `public void apply(Block.Builder block, Block b)` method (dealing with new block parameters) and the load situation in the second pass. Somehow I always ended up confusing myself about what's mapped and what isn't. I didn't have the impression things got clearer (but maybe I went in a wrong direction). Maybe you have more concrete ideas that I missed?
I also noticed that CopyContextImpl works recursively (in the sense of looking at parent contexts) for values but not for blocks and successors. Is that intended?
I experimented with moving around a few things, mainly the code in the `public void apply(Block.Builder block, Block b)` method (dealing with new block parameters) and the load situation in the second pass. Somehow I always ended up confusing myself about what's mapped and what isn't. I didn't have the impression things got clearer (but maybe I went in a wrong direction). Maybe you have more concrete ideas that I missed?
Not at the moment. How about we do the following? Switch to use the new implementation, ensuring tests pass. Keep the old one too and ensure it is explicitly tested, since it is a good test case. And incrementally, if needed, update the new implementation.
I also noticed that `CopyContextImpl` works recursively (in the sense of looking at parent contexts) for values but not for blocks and successors. Is that intended?
Yes, blocks can only reference other blocks from within the same body. Do you need access to a parent context?
Not at the moment. How about we do the following? Switch to use the new implementation, ensuring tests pass. Keep the old one too and ensure it is explicitly tested, since it is a good test case. And incrementally, if needed, update the new implementation.
Okay, I'll do that. Is there a way in jtreg to run tests with different system properties? This way we could just check for the property in one of the transform methods (in one of the classes) and delegate to the other depending on what was passed. This is how I tested it locally, but running the tests for both variants this way (where they don't have different results that are relevant to the test) might make sense. WDYT?
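The delegation I have in mind would look roughly like this (a hypothetical sketch: the property name is made up, and it assumes the new `SSAConstruction` exposes the same static `transform` shape as the existing `SSA`):

```java
import java.lang.reflect.code.Op;
import java.lang.reflect.code.analysis.SSA;

// Hypothetical switch between the two implementations, driven by a system
// property so tests can exercise both. Property name is made up.
final class SsaDispatch {
    static <T extends Op> T transform(T op) {
        return Boolean.getBoolean("babylon.ssa.construction")
                ? SSAConstruction.transform(op)
                : SSA.transform(op);
    }
}
```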
Yes, blocks can only reference other blocks from within the same body. Do you need access to a parent context?
I was mainly confused by the difference, without any need for it. But I also experimented with "preparing" a copy context that I then pass to the `Op#transform(CopyContext, OpTransformer)` call. That also didn't work well for other reasons, and it might not be an intended use at all, but in such a scenario it would matter.
Not at the moment. How about we do the following? Switch to use the new implementation, ensuring tests pass. Keep the old one too and ensure it is explicitly tested, since it is a good test case. And incrementally, if needed, update the new implementation.
Okay, I'll do that. Is there a way in jtreg to run tests with different system properties? This way we could just check for the property in one of the transform methods (in one of the classes) and delegate to the other depending on what was passed. This is how I tested it locally, but running the tests for both variants this way (where they don't have different results that are relevant to the test) might make sense. WDYT?
Yes, see here
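(For reference, jtreg supports this via multiple `@run` actions in `othervm` mode with `-D` flags; the property name below is again made up for illustration:)

```java
/*
 * @test
 * @run testng/othervm -Dbabylon.ssa.construction=false TestSSA
 * @run testng/othervm -Dbabylon.ssa.construction=true TestSSA
 */
```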
Yes, blocks can only reference other blocks from within the same body. Do you need access to a parent context?
I was mainly confused by the difference, without any need for it. But I also experimented with "preparing" a copy context that I then pass to the `Op#transform(CopyContext, OpTransformer)` call. That also didn't work well for other reasons, and it might not be an intended use at all, but in such a scenario it would matter.
Ok. For some wider context, I am wondering if it is possible to place all the transform logic within `OpTransformer`. What's missing is the code to transform an input body into output operations added to a block builder. `OpTransformer` should only use public APIs. This may make the transformation seem less special; it's just a particular composition of traversal and building with some bookkeeping between input and output code items. (Further, `CopyContext` is a terrible name!)
Yes, see here
Thanks, I adjusted some tests where it seemed to make sense to test both variants. Let me know if there's anything else to do there. The current tests all pass; only one needed changes due to different numbering (I wonder why we don't have a block 0 anymore?).
Ok. For some wider context, I am wondering if it is possible to place all the transform logic within `OpTransformer`. What's missing is the code to transform an input body into output operations added to a block builder. `OpTransformer` should only use public APIs. This may make the transformation seem less special; it's just a particular composition of traversal and building with some bookkeeping between input and output code items. (Further, `CopyContext` is a terrible name!)
That sounds interesting and might be worth further investigation.
This looks good. I verified it fixes the uninitialized pattern variables issue mentioned in #264.
Here's a patch that I believe fixes the algorithm to support uninitialized variables.
diff --git a/src/java.base/share/classes/java/lang/reflect/code/analysis/SSAConstruction.java b/src/java.base/share/classes/java/lang/reflect/code/analysis/SSAConstruction.java
index e5133490128..25de41e9691 100644
--- a/src/java.base/share/classes/java/lang/reflect/code/analysis/SSAConstruction.java
+++ b/src/java.base/share/classes/java/lang/reflect/code/analysis/SSAConstruction.java
@@ -70,8 +70,12 @@ private void prepare(Op nestedOp) {
}
case CoreOp.VarAccessOp.VarStoreOp store ->
writeVariable(store.varOp(), store.parentBlock(), new Holder(store.storeOperand()));
- case CoreOp.VarOp initialStore ->
- writeVariable(initialStore, initialStore.parentBlock(), new Holder(initialStore.initOperand()));
+ case CoreOp.VarOp initialStore -> {
+ Val val = initialStore.isUninitialized()
+ ? Uninitialized.VALUE
+ : new Holder(initialStore.initOperand());
+ writeVariable(initialStore, initialStore.parentBlock(), val);
+ }
case Op.Terminating _ -> {
Block block = op.parentBlock();
// handle the sealing, i.e. only now make this block a predecessor of its successors
@@ -183,6 +187,7 @@ private void sealBlock(Block block) {
private Value resolveValue(CopyContext context, Val val) {
return switch (val) {
+ case Uninitialized _ -> throw new IllegalStateException("Uninitialized variable");
case Holder holder -> context.getValueOrDefault(holder.value(), holder.value());
case Phi phi -> {
List<Phi> phis = this.additionalParameters.get(phi.block());
@@ -253,6 +258,10 @@ sealed interface Val {
record Holder(Value value) implements Val {
}
+ enum Uninitialized implements Val {
+ VALUE;
+ }
+
record Phi(CoreOp.VarOp variable, Block block, List<Val> operands, Set<Object> users) implements Val {
Phi(CoreOp.VarOp variable, Block block) {
this(variable, block, new ArrayList<>(), new HashSet<>());
Using `IllegalStateException` might be too broad in general. `TestUninitializedVariable` checks that it is encountered, but it might be thrown by something else. The test passed without these changes because the SSA algorithm fails for different reasons, on the call to `initialStore.initOperand()`.
The patch looks good and simple. I assume exceptions will generally need to be refined for the whole API.
/integrate
I'll try to find some time to experiment with bailing out on variables in try/catch/finally blocks where needed.
@SirYwell Your change (at version 9458769a2674bfe851958249fe3ad8781ef45591) is now ready to be sponsored by a Committer.
/sponsor
Going to push as commit a9fbdee81066fd5f840afd537005e04e44ceedef.
Since your change was applied there have been 3 commits pushed to the code-reflection branch:
- f69e732433c903febc2ef647c5459139130b9acc: Dont use hashbang for launcher scripts
- cb52322017ca649c25c40b27dd9478419521f8f3: Fixed env.bash issue found on Linux (Mac ok?)
- 94b5149779bed3715885955a9d9677f9e9d97d21: bld script and bldr tools cleanup
Your commit was automatically rebased without conflicts.
@PaulSandoz @SirYwell Pushed as commit a9fbdee81066fd5f840afd537005e04e44ceedef.
:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.