gc
gc copied to clipboard
Support cast-free Java arrays
#120 suggests that superfluous casts can cause significant overhead for array-intensive programs. Please illustrate how the Post-MVP will eliminate these casts. For example, give the WebAssembly translation of the following:
public static void foo(Object obj) {
Number[] nums = (Number[]) obj; // might more precisely be an Integer[] or Double[]
int sum = 0;
for (int i = 0; i < nums.length; ) {
sum += nums[i].intValue; // for simplicity, suppose intValue were a field rather than a method
nums[i] = new Integer(sum); // Integer is a subclass of Number
}
}
By “casts”, are you referring to the runtime checks that Java needs on array assignment? Do you imagine eliminating the checks by compiling this loop with code duplication, one copy specialised to nums being Integer[]?
If so, this doesn’t strike me as a particularly convincing example. If computational performance did matter for some numeric computations, then you probably wouldn’t be using arrays of wrapped number types for it, which are heavy in Java. It’s a fairly safe bet that object allocation would dominate this example, not runtime checks.
Also, I don’t understand how your simplification of making that a field (on Number
?) is supposed to work.
Having said all that, can you clarify the actual problem you see? Can you explain why you think that the MVP would get in the way of a later extension with some form of existential typing, as you probably have in mind to enable this?
By “casts”, are you referring to the runtime checks that Java needs on array assignment?
I am referring to the same casts I referenced in the issue I reference in the OP: #120. "Superfluous" is the term we have been using to specifically refer to casts that do not exist in the source language (or in standard compilations of the source language) but which exist in compilations to the current MVP due to its limitations in reasoning about common invariants.
Can you explain why you think that the MVP would get in the way of a later extension with some form of existential typing, as you probably have in mind to enable this?
Please see this presentation you attended a month ago, which is a recap of the discussion you and I had on this topic nearly three years ago, and which discusses this particular example (slide 28). You might also refer to the e-mail 7 months ago where I illustrated why the self types you had planned to propose at the in-person meeting would be unable to express this particular example. Or you can refer to Juan Chen and David Tarditi's POPL '07 paper that first discussed the issues with self types, among other structural approaches, and addressed them with nominal types (e.g. #119). To summarize, concerns are expressiveness, decidability, and practicality of generation.
So is this essentially the same example as #120, with your point again being that "bounded quantification + structural typing is undecidable"?
Roughly. #120's example was designed to test run-time design aspects, whereas this example is designed to test compile-time design aspects. Though it occurs to me that a related challenge is being able to allocate an Integer[]
/Number[]
array and fill it with Integer
values (within the same setting, so that we can know the array's exact type) without requiring any casts.
Maybe unfortunate to use Integer
and Double
as examples, since clearly anyone writing code with such types should not expect any kind of performance.
What if instead we're talking about Cat
and Dog
sitting in an Animal
array? Then your example is accessing a field that apparently is introduced in Animal
already (let's say age
), and replacing the actual animal with something that is always a Cat
.
Agree that at a source level this should not need casts.
Would say that doesn't sound like the best example, and be better to have a more realistic example of something a) people commonly write, b) without dynamic allocation inside the loop (i.e. something you'd expect to be fast), and c) with a type of object that you reasonably would expect to be a dynamically allocated object with subtyping.
@aardappel Somewhat surprisingly, it does not matter if this example itself requires good performance or not. The reason is that, if the Post-MVP cannot express this example, and in particular provably maintain the invariant that a Java Foo[]
contains only Foo
s, then no other WebAssembly code can safely assume a Java Foo[]
contains only Foo
s because the invariant cannot be guaranteed. As such, all other code would always have to cast all values retrieved from a Java Foo[]
to Foo
before using them. #120 suggests that this means every array access to a Java Foo[]
would incur about 48% overhead, unless Foo
is declared final
in which case it is about 26% overhead.
I did a quick benchmark for sorting an array of a Person
class by an int
"age" field, and I found these casts incurred 6% overhead if Person
was final
and 12% overhead if Person
had subclasses. So an inability to guarantee the invariant that a Java Foo[]
contains only Foo
s can have non-trivial effect on the performance of at least Java programs on WebAssembly, which is why I would like a demonstration that the Post-MVP can actually guarantee this invariant.
The "closed" update above does not appear to make sense. Rebasing artifact or typo?
I honestly have no idea. I did not close anything, GitHub just randomly closed some issues, not just this one. Maybe it somehow magically inferred something from the merge commit message?
That's why "smart" UIs are such a bad idea...
Edit: Yeah, I think what happened is that I merged upstream, which involved some upstream commits saying "Fixes #X" in the upstream repository. Then GH smartly interpreted those in stupid ways. =-O
I guess it's due to the commits pulled in by the merge: the mentioned commit 66d90a7 does say "Fixes #138" in its description, but was probably referring to some other repository's issue with that number.
Closing this because it is non-actionable for the MVP. I've labeled it post-MVP, though, so it's easy to find again once we start looking to do follow-on work.