multi-value icon indicating copy to clipboard operation
multi-value copied to clipboard

Flow Sensitivity

Open RossTate opened this issue 5 years ago • 7 comments

Here is some food for thought that might be relevant to discussions such as in #41. Low-level type systems generally need to at some point incorporate multi-value flow-sensitivity. That means that a branch taken on one value needs to be able to affect the operations you can do on other values on the context.

Here's an example for supporting Java/C# (that normally wouldn't be written this way but which would be produced by inlining):

int sum(Object arr) {
    Integer[] ints = (Integer[]) arr;
    int sum = 0;
    for (int i = 0; i < ints.length; i++) {
        Integer intref = ints[i];
        sum += intref.int_value;
    }
    return sum;
}

At the low level, this turns into:

int sum(Object arr) {
    ints = cast arr to Array;
    elem_type = ints.elem_rtt;
    assert elem_type == Integer.rtt;
    sum = 0;
    for (i = 0; i < ints.length; i++) {
        intref = ints[i];
        sum += intref.int_value;
    }
    return sum;
}

Notice that, in order for this to type check, after branching on elem_type == Integer.rtt we need to realize that this means the references contained in ints necessarily have an int_value field even though elem_type == Integer.rtt has no direct reference to ints.

Without multi-value flow-sensitivity, you can only get the following to type check:

int sum(Object arr) {
    ints = cast arr to Array;
    elem_type = ints.elem_rtt;
    assert elem_type == Integer.rtt;
    sum = 0;
    for (i = 0; i < ints.length; i++) {
        intref = cast ints[i] to Integer;
        sum += intref.int_value;
    }
    return sum;
}

That is, we have to cast every reference retrieved from ints to Integer even after asserting that its element type is the correct type. To make matters worse, if you don't have multi-value flow-sensitivity, there's no way to enforce the expectation that a surface-level Integer[] only contains Integer references, so you'll have to cast the result of int[i] even when ints has surface-level type Integer[].

(Note: all of this is independent of covariant arrays, so feel free to ignore that additional complexity when pondering this food for thought.)

So when considering design options, you might want to think of which options better support multi-value flow-sensitivity.

RossTate avatar Mar 19 '20 16:03 RossTate

How does nominal types solve this issue?

fgmccabe avatar Mar 19 '20 17:03 fgmccabe

I'm not sure how this example relates to the multivalue proposal (or WebAssembly more generally), and it seems to assume quite a bit of typing machinery that WebAssembly does not have today. Could you perhaps sketch out what features would have to be included in WebAssembly for this example to become possible and also reframe the example to use WebAssembly rather than Java?

tlively avatar Mar 19 '20 19:03 tlively

@tlively

(or WebAssembly more generally)

I just thought y'all might appreciate some foreshadowing on what kind of reasoning it will take for WebAssembly to ensure that the values from a String[] are actually Strings, like what the JVM and CLR can do. I'm not saying that we must add this reasoning—we might decide the complexity outweighs the performance gains—just letting you know what it is.

I'm not sure how this example relates to the multivalue proposal

Discussions like #41 talk about the tradeoffs between using local variables and using the stack, with the multivalue proposal making the stack much more usable. Other low-level type systems, like typed assembly languages, use reasoning more like the stack than like local variables. The reason is that the type of the stack is a snapshot in time of multiple values, whereas the type of a local variable stays fixed and only talks about one value.

To get a sense of what that means, consider the line assert elem_type == Integer.rtt;. Right before this line, the snapshot-in-time-type is "[ints : Array, elem_type : RTTI] where ints' element type equals the type represented by elem_type". Right after that line, we learn that elem_type is equal to the RTTI for Integer, so we get the snapshot-in-time-type "[ints : Array, elem_type : RTTI] where ints' element type equals the type represented by elem_type and the type represented by elem_type equals Integer". By combining these two equalities relating the two distinct variables ints and elem_type, we realize that ints is an Array whose element type equals Integer in this snapshot-in-time-type.

it seems to assume quite a bit of typing machinery that WebAssembly does not have today

Certainly. This is about looking ahead, specifically about where WebAssembly would need to go to be able to type-check the compilations of major type-safe languages without inserting many dynamic casts (if that's where it wants to go).

@fgmccabe

How does nominal types solve this issue?

Theoretically speaking, this is orthogonal to nominal vs. structural types. Instead, it's more related to existential types. Practically speaking, I don't believe there exists even a research compiler that has been able to put the relevant theory for structural types to practice on a real language. On the other hand, for nominal types there already exists an algorithm that can automatically type-check the above example without any additional type annotations within the body of sum—an algorithm that has been able to scale to automatically type-check the output of existing C# (1.0) to x86 compilers.

Regardless, I didn't bring this up as part of that debate. I just wanted to give y'all a heads up rather than have y'all later wish the issue had been mentioned before. I also wanted y'all to have a glimpse of what we meant by "fancy types", and why we opted to present y'all with an MVP that was forwards-compatible with fancy types but didn't itself require fancy types—a question that was raised during one of the meetings.

RossTate avatar Mar 20 '20 02:03 RossTate

Thanks, this makes more sense to me now :)

tlively avatar Mar 20 '20 03:03 tlively

Cool. Thanks for helping me realize how unclear I was being.

RossTate avatar Mar 20 '20 12:03 RossTate

To get a sense of what that means, consider the line assert elem_type == Integer.rtt;. Right before this line, the snapshot-in-time-type is "[ints : Array, elem_type : RTTI] where ints' element type equals the type represented by elem_type". Right after that line, we learn that elem_type is equal to the RTTI for Integer, so we get the snapshot-in-time-type "[ints : Array, elem_type : RTTI] where ints' element type equals the type represented by elem_type and the type represented by elem_type equals Integer". By combining these two equalities relating the two distinct variables ints and elem_type, we realize that ints is an Array whose element type equals Integer in this snapshot-in-time-type.

The assertion that "int's element type = the type represented by elem_type" is not representable in the notation you are using.

Makes me want to reformulate it using logical variables .. :)

-- Francis McCabe SWE

fgmccabe avatar Mar 20 '20 15:03 fgmccabe

Yeah, there are a number of ways to express this formally. The most common is constrained existential types, but there are others that might be a better fit for WebAssembly (if we go down this path). Other VMs have tackled this by baking in language primitives, so by not taking that path y'all have embarked on a fairly unworn path.

RossTate avatar Mar 20 '20 17:03 RossTate