Fix TypeCheckingDomain
Fixes #242
/it:test
Thanks for working on this! I checked the original case where we encountered the exception (as seen in #242) and I can confirm that that particular issue seems to be fixed now. However, when performing the roundtrip for all methods of the JAR, I ran into other instances of the TypeCheckingDomain thinking there is dead code which (to me) is not really dead. One example is this (more context):
```java
if (pkixParams.isExplicitPolicyRequired())
{
    ErrorBundle msg = new ErrorBundle(RESOURCE_NAME, "CertPathReviewer.explicitPolicy");
    throw new CertPathReviewerException(msg, certPath, index);
}
intersection = null;
```
where the OPAL exception reports PC 2039 (corresponding to `intersection = null`) as dead. This is part of the exception:
```
2000: get org.bouncycastle.x509.PKIXCertPathReviewer.pkixParams : java.security.cert.PKIXParameters
2003: INVOKEVIRTUAL(java.security.cert.PKIXParameters{ boolean isExplicitPolicyRequired() })
2006: IFEQ(33)
2009: NEW org.bouncycastle.i18n.ErrorBundle
2012: DUP
2013: LoadString("org.bouncycastle.x509.CertPathReviewerMessages")
2015: LoadString_W(CertPathReviewer.explicitPolicy)
2018: INVOKESPECIAL(org.bouncycastle.i18n.ErrorBundle{ void <init>(java.lang.String,java.lang.String) })
2021: ASTORE(13)
2023: NEW org.bouncycastle.x509.CertPathReviewerException
2026: DUP
2027: ALOAD(13)
2029: ALOAD_0
2030: get org.bouncycastle.x509.PKIXCertPathReviewer.certPath : java.security.cert.CertPath
2033: ILOAD(10)
2035: INVOKESPECIAL(org.bouncycastle.x509.CertPathReviewerException{ void <init>(org.bouncycastle.i18n.ErrorBundle,java.security.cert.CertPath,int) })
2038: ATHROW
2039: ACONST_NULL
```
Here we see at PC 2006 that there is an IFEQ with offset 33, which would target PC 2006 + 33 = 2039; still, the exception header states `pc 2039 is dead; unable to compute stack map table`.
In yet another example I found, try-with-resources statements seem to be an issue:
```java
try (InputStream resourceAsStream = Version.class.getResourceAsStream(PDFBOX_VERSION_PROPERTIES);
     InputStream is = new BufferedInputStream(resourceAsStream))
{
    Properties properties = new Properties();
    properties.load(is);
    return properties.getProperty("pdfbox.version", null);
}
catch (IOException io)
{
    LOG.debug("Unable to read version from properties - returning null", io);
    return null;
}
```
Here, PC 61 is reported to be dead. The code looks to me like it relates to the handling of try-with-resources, although I am not completely sure:
```
61: ASTORE(6)
63: ALOAD_3
64: ALOAD(6)
66: INVOKEVIRTUAL(java.lang.Throwable{ void addSuppressed(java.lang.Throwable) })
69: GOTO(7)
72: ALOAD_2
73: INVOKEVIRTUAL(java.io.InputStream{ void close() })
76: ALOAD_0
77: IFNULL(29)
80: ALOAD_1
81: IFNULL(21)
84: ALOAD_0
85: INVOKEVIRTUAL(java.io.InputStream{ void close() })
[..]
Exception Handlers:
ExceptionHandler([54, 58) -> 61, java.lang.Throwable)
ExceptionHandler([84, 88) -> 91, java.lang.Throwable)
ExceptionHandler([21, 46) -> 109, java.lang.Throwable)
ExceptionHandler([21, 46) -> 117, <FINALLY>)
ExceptionHandler([127, 131) -> 134, java.lang.Throwable)
ExceptionHandler([109, 119) -> 117, <FINALLY>)
ExceptionHandler([10, 76) -> 152, java.lang.Throwable)
ExceptionHandler([109, 152) -> 152, java.lang.Throwable)
ExceptionHandler([10, 76) -> 157, <FINALLY>)
ExceptionHandler([167, 171) -> 174, java.lang.Throwable)
ExceptionHandler([109, 159) -> 157, <FINALLY>)
ExceptionHandler([0, 106) -> 192, java.io.IOException)
ExceptionHandler([109, 192) -> 192, java.io.IOException)
```
Here it is worth noting that PC 61 is installed as an exception handler, but there is another handler ([10, 76) -> 152) covering the same range and catching the same exception type. I honestly do not know how the AI handles this at the moment, and I did not go into any detailed debugging yet. Let me know if you need some help on this; then I would spend some time debugging those issues in depth.
Thank you. If you find the time, any help in locating the new error would be appreciated.
I'll have a look 👍
I do have some updates on this. I was able to fix the first issue from my previous comment: The problem was that the `doJoin` implementations for subclasses of `AnObjectValue` all implicitly assumed that `<AnObjectValue>.isNull == Unknown`. Therefore, when performing a join, those implementations only looked at the type bounds, not the nullness of the values being joined. However, implementations like `InitializedObjectValue` have `isNull == No`, which led to joins producing wrong results, i.e. `<isNull == Unknown> join <isNull == No>` produced `<isNull == No>`. I believe I have fixed this issue with my latest commits; feel free to look through those changes and let me know if you think we should go about this differently.
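To illustrate the broken join semantics, here is a self-contained toy sketch (made-up names, not OPAL's actual classes):

```scala
object NullnessJoin {
    sealed trait Nullness
    case object IsNull extends Nullness
    case object NotNull extends Nullness
    case object MaybeNull extends Nullness

    // A correct join must widen the nullness to MaybeNull (i.e. Unknown)
    // as soon as the two sides disagree.
    def joinNullness(a: Nullness, b: Nullness): Nullness =
        if (a == b) a else MaybeNull

    // The buggy doJoin implementations ignored nullness entirely and thus
    // effectively computed "MaybeNull join NotNull == NotNull".
    def main(args: Array[String]): Unit =
        assert(joinNullness(MaybeNull, NotNull) == MaybeNull)
}
```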
The other issue (which arose when handling try-with-resources statements) is not yet fixed, but I did identify the root cause. The issue is something I (for lack of a better word) would call "finally-inlining", where a finally block is duplicated by the Java compiler: it is added once in the normal fashion (as an exception handler) and once "inlined" after the content of the try block (I guess this is an optimization measure?). If the inlined finally block contains e.g. an if-statement, it may indeed contain a dead code block that (in the inlined instance) is never reachable. If this dead code contains PCs that would normally require a stack frame, then we have an error: we need to compute the frame, but have no information on the locals because the code is dead. This all sounds very exotic, but it regularly happens when compiling try-with-resources statements, where after desugaring, try-catch blocks are placed inside a finally block. I see two ways of dealing with this (a concrete sketch of the duplication follows after the list):
- When computing the set of PCs that need a stack frame, do some elaborate computation based on the `evaluatedPCs` property and try to exclude PCs that result from `catch` blocks where the corresponding `try` is already dead, or more generally from conditional jumps where the jump instruction itself is dead. However, I do need to point out that the original bytecode does have a stack frame for those dead instructions!
- Use a very stupid abstract interpretation domain that visits all branches regardless of the potential nullness of abstract values. This would guarantee that we have information on the locals even for instructions that the current domain identifies as dead.
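To make the duplication concrete, here is a rough sketch of what the compiler emits for a plain try/finally (simplified; try-with-resources desugars into something more involved):

```scala
// For code like the demo below, the finally body is emitted twice: once
// inlined on the normal path and once in a catch-all handler that rethrows:
//
//     invoke work
//     invoke cleanup      // inlined copy (normal completion)
//     goto done
//   handler (any Throwable):
//     store t
//     invoke cleanup      // handler copy
//     load t
//     athrow
//   done: ...
//
// If the inlined copy contains a branch that the AI proves dead, its PCs
// still need stack frames.
object FinallyInlining {
    def demo(work: () => Unit, cleanup: () => Unit): Unit =
        try work() finally cleanup()
}
```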
@errt Do you have a preference? Do you see any other way of going about it?
Had a glance at your changes and they seem fine, even if quite complex.
I don't have an answer for the other one. Not computing this for dead code seems logical and would probably be easier to do and more performant. On the other hand, we might have more such situations, and while the first solution would only fix this one, the second solution would probably solve it for good. Also, I'm not sure Java's bytecode verifier would even accept the bytecode if it didn't include those stack frames? Note also that the TypeCheckingDomain already is a dumb special purpose domain used only for the computation of StackMapTables, so changing this to not ignore dead code might not have too many repercussions. But I'm not sure how difficult this would be.
I'm also leaning towards the second solution... maybe just making all object values involved have `isNull = Unknown` would fix the issue. That would even allow me to revert most of my changes for the other issue and keep the code simple. I'll have a look and report back to you on how complex it really is.
I reverted my previous changes and implemented a solution that simply returns Unknown for the nullness of all reference values. This fixed most issues I had when roundtrip-serializing pdfbox.
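A minimal sketch of the idea, assuming OPAL's three-valued `Answer` type (the actual override point in the TypeCheckingDomain may differ from this simplified trait):

```scala
import org.opalj.{Answer, Unknown}

trait AlwaysUnknownNullness {
    type DomainValue
    // Answering Unknown for every nullness query forces the AI to follow
    // both successors of null checks, so no branch is pruned as dead.
    def refIsNull(pc: Int, value: DomainValue): Answer = Unknown
}
```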
One method still failed because of certain array-related operations that result from obscure unreachable code, see this example. Here, elements of a varargs parameter array are instanceof-checked against an array type that they can never be, then cast into that type and manipulated. The AI failed here (rightfully so), as the abstract value was not an array and thus did not support array operations. I figured this code may not make sense and may never be reachable, but it can be compiled by the Java compiler; therefore it should probably also pass our checks in OPAL when writing bytecode. As a result, I made array operations in the TypeCheckingDomain also work on non-array types. Feel free to give me feedback on that or suggest a different solution.
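For reference, a hypothetical Scala reconstruction of the pattern (class and method names are made up; the real case was Java code found during the roundtrip):

```scala
object DeadArrayCode {
    final class Builder

    def touch(parts: Builder*): Unit =
        parts.foreach { p =>
            // Funnel the value through AnyRef and use it with array
            // operations; the compiler accepts this even though the
            // array branch is unreachable at runtime.
            val o: AnyRef = p
            o match {
                case a: Array[AnyRef] => a(0) = null // dead: a Builder is never an array
                case _                => // nothing to do
            }
        }
}
```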
After implementing the second fix, all roundtrips I tried were successful. Granted, the domain now does not do very much; it does not enforce nullness and not even type compatibility on array operations... but if we want to take the Java compiler as the ground truth, I think this is necessary.
EDIT: I see the tests are failing; I'll have a look into that tomorrow.
EDIT2: It seems like the tests are failing because the old assumptions for the TypeCheckingDomain obviously do not hold anymore. We should decide whether or not we want my suggested changes here before I adapt the tests.
The failing tests don't seem related to the `l0.TypeCheckingDomain`, which worries me. They might come from the `TypeLevelReferenceValues`, which is a whole different beast.
I managed to fix those issues. The problem was that:
- The changes to `l0.TypeLevelReferenceValues` propagated via inheritance to `l1.ReferenceValues`
- Some tests define local domain classes that now inherited the changed behavior from L0
My current solution is to override the changed behavior from L0 wherever it is necessary. This is not the most elegant solution though, and I have not yet seen the results of the IT tests. Maybe it would be better to revert L0 back to the way it was (plus your initial fix for fields) and create a dedicated domain class only for computing stack map tables, as the things we changed here (running all branches, accepting all objects for array operations, etc.) are only relevant to this one particular use case. What do you think?
/it:test
> create a dedicated domain class only for computing stack map tables

Which is exactly what the TypeCheckingDomain is.
I was about to say "that's not its only purpose, as L1 and some tests depend on it", but I realize now that the others only depend on particular traits that are mixed into the TypeCheckingDomain 🤔 So if I move my stack-frame-related overridden functionality into the TypeCheckingDomain itself instead of the actual traits, there should (hopefully) be little to no fallout for any other domain or test, right? I'll try that and revert unnecessary changes if successful.
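Roughly, the plan looks like this (a simplified sketch; `allowArrayOpsOnNonArrays` is a made-up placeholder for the actual overridden methods):

```scala
trait TypeLevelReferenceValuesLike {
    def allowArrayOpsOnNonArrays: Boolean = false // original, strict behavior
}

class SomeL1Domain extends TypeLevelReferenceValuesLike // inherits the default

class TypeCheckingDomainSketch extends TypeLevelReferenceValuesLike {
    // the stack-map-specific behavior lives only in the concrete domain,
    // so other domains and tests mixing in the trait are unaffected
    override def allowArrayOpsOnNonArrays: Boolean = true
}
```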
/it:test
Are those thrown exceptions necessary? Wouldn't it be sensible to just do nothing in that case, since the code must be dead?
Yes, they are necessary, because otherwise the AI will not visit the catch block of a try when the try contains array operations on non-array values. Again, this sounds very exotic, but it may happen when you have varargs methods that perform nonsensical typecasts on their parameters, like the example mentioned in my code comment. Here, we have an array access being performed on a value that is clearly of type Builder and therefore not an array. However, if this were wrapped in a try-catch, we should still be able to visit the catch block, as we need to generate stack frames for every single exception handler.
The integration test failure indicates that there are now some illegal values after performing AI on the JRE classes. The test requires that we can access the `computationalType` of all values after AI, and that currently seems not to be the case, because some values are now `TheIllegalValue`, which does not have a computational type. Honestly, I am unsure whether or not this is expected behavior given the changes to array operations and null handling; maybe we can have a look at it together next week.
Before changing the expected values in the integration tests, I will have a look into the code / stack map generation itself: it might be possible to do this without locals or to infer a default value.
/it:test
/it:test
This PR now seems to be working. The issue was that we can now have situations where the AI performs array operations (i.e. loads) on the null value (previously this did not happen because null checks were respected). The null value, however, is not an array value, which led to my implementation returning java.lang.Object values even when there would have been primitive types in the array. I fixed this by always returning the correct primitive value when operations like `FALOAD null <index>` are performed.
This leaves one edge case: `AALOAD null <index>` tries to load an arbitrary reference value from an array. In particular, it could be another array or any object value. To faithfully model this, I tried to create a new class that maybe is an object value and maybe is an array value, but definitely is a reference value. Unfortunately, I did not get this to work, as the semantics of `ref.isArray` require an `IsSArrayValue` and explicitly state that `isArray == Yes <=> isNull == No`; this simply does not work in the current TypeCheckingDomain.
My current solution is to always return an ObjectValue when `AALOAD null <index>` is performed. I made every array operation resilient enough to work even if the arrayref is an ObjectValue: as mentioned before, primitive types will be returned as expected, and `AALOAD ObjectValue <index>` will return another ObjectValue.
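A toy sketch of that fallback (illustrative only, not the real OPAL code):

```scala
object ArrayLoadFallback {
    sealed trait AbstractValue
    case class APrimitiveValue(tpe: String) extends AbstractValue // e.g. "float"
    case class AnObjValue(tpe: String) extends AbstractValue      // e.g. "java/lang/Object"

    // An array load must produce a value of the instruction's element type
    // even when the receiver is null or a plain object value.
    def arrayLoadResult(elementType: String): AbstractValue = elementType match {
        // FALOAD null <index> and friends: answer with the right primitive type
        case "boolean" | "byte" | "char" | "short" | "int" | "long" | "float" | "double" =>
            APrimitiveValue(elementType)
        // AALOAD null <index>: the element could be anything, so fall back
        // to a java.lang.Object value (see the marker object below)
        case _ => AnObjValue("java/lang/Object")
    }
}
```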
Do you think this approach is sensible @errt ?
EDIT:
While writing this, I figured that the ObjectValue that I return whenever my operand is not an array type should not behave like a normal java.lang.Object. In particular, when joined with an actual value (either array or object), it should always take on the value it is being joined with. Does it make sense to use some marker object like this?
```scala
private object ArrayOrObjectValue extends DefaultSObjectValue(ClassType.Object) {
    override def doJoin(pc: ValueOrigin, other: Value): Update[Value] = {
        other match {
            // Joining with itself does not add any information.
            case ArrayOrObjectValue => NoUpdate
            // Any concrete object or array value takes precedence.
            case SObjectValueLike(_) | MObjectValueLike(_) | AnArrayValue(_) =>
                StructuralUpdate(other)
            case _ => super.doJoin(pc, other)
        }
    }
}
```