gc icon indicating copy to clipboard operation
gc copied to clipboard

Support cast-free Java arrays

Open RossTate opened this issue 4 years ago • 9 comments

#120 suggests that superfluous casts can cause significant overhead for array-intensive programs. Please illustrate how the Post-MVP will eliminate these casts. For example, give the WebAssembly translation of the following:

public static void foo(Object obj) {
    Number[] nums = (Number[]) obj; // might more precisely be an Integer[] or Double[]
    int sum = 0;
    for (int i = 0; i < nums.length; ) {
        sum += nums[i].intValue; // for simplicity, suppose intValue were a field rather than a method
        nums[i] = new Integer(sum); // Integer is a subclass of Number
    }
}

RossTate avatar Sep 22 '20 22:09 RossTate

By “casts”, are you referring to the runtime checks that Java needs on array assignment? Do you imagine eliminating the checks by compiling this loop with code duplication, one copy specialised to nums being Integer[]?

If so, this doesn’t strike me as a particularly convincing example. If computational performance did matter for some numeric computations, then you probably wouldn’t be using arrays of wrapped number types for it, which are heavy in Java. It’s a fairly safe bet that object allocation would dominate this example, not runtime checks.

Also, I don’t understand how your simplification of making that a field (on Number?) is supposed to work.

Having said all that, can you clarify the actual problem you see? Can you explain why you think that the MVP would get in the way of a later extension with some form of existential typing, as you probably have in mind to enable this?

rossberg avatar Sep 23 '20 08:09 rossberg

By “casts”, are you referring to the runtime checks that Java needs on array assignment?

I am referring to the same casts I referenced in the issue I reference in the OP: #120. "Superfluous" is the term we have been using to specifically refer to casts that do not exist in the source language (or in standard compilations of the source language) but which exist in compilations to the current MVP due to its limitations in reasoning about common invariants.

Can you explain why you think that the MVP would get in the way of a later extension with some form of existential typing, as you probably have in mind to enable this?

Please see this presentation you attended a month ago, which is a recap of the discussion you and I had on this topic nearly three years ago, and which discusses this particular example (slide 28). You might also refer to the e-mail 7 months ago where I illustrated why the self types you had planned to propose at the in-person meeting would be unable to express this particular example. Or you can refer to Juan Chen and David Tarditi's POPL '07 paper that first discussed the issues with self types, among other structural approaches, and addressed them with nominal types (e.g. #119). To summarize, concerns are expressiveness, decidability, and practicality of generation.

RossTate avatar Sep 23 '20 12:09 RossTate

So is this essentially the same example as #120, with your point again being that "bounded quantification + structural typing is undecidable"?

conrad-watt avatar Sep 23 '20 12:09 conrad-watt

Roughly. #120's example was designed to test run-time design aspects, whereas this example is designed to test compile-time design aspects. Though it occurs to me that a related challenge is being able to allocate an Integer[]/Number[] array and fill it with Integer values (within the same setting, so that we can know the array's exact type) without requiring any casts.

RossTate avatar Sep 23 '20 12:09 RossTate

Maybe unfortunate to use Integer and Double as examples, since clearly anyone writing code with such types should not expect any kind of performance.

What if instead we're talking about Cat and Dog sitting in an Animal array? Then your example is accessing a field that apparently is introduced in Animal already (let's say age), and replacing the actual animal with something that is always a Cat.

Agree that at a source level this should not need casts.

Would say that doesn't sound like the best example, and be better to have a more realistic example of something a) people commonly write, b) without dynamic allocation inside the loop (i.e. something you'd expect to be fast), and c) with a type of object that you reasonably would expect to be a dynamically allocated object with subtyping.

aardappel avatar Sep 24 '20 17:09 aardappel

@aardappel Somewhat surprisingly, it does not matter if this example itself requires good performance or not. The reason is that, if the Post-MVP cannot express this example, and in particular provably maintain the invariant that a Java Foo[] contains only Foos, then no other WebAssembly code can safely assume a Java Foo[] contains only Foos because the invariant cannot be guaranteed. As such, all other code would always have to cast all values retrieved from a Java Foo[] to Foo before using them. #120 suggests that this means every array access to a Java Foo[] would incur about 48% overhead, unless Foo is declared final in which case it is about 26% overhead.

I did a quick benchmark for sorting an array of a Person class by an int "age" field, and I found these casts incurred 6% overhead if Person was final and 12% overhead if Person had subclasses. So an inability to guarantee the invariant that a Java Foo[] contains only Foos can have non-trivial effect on the performance of at least Java programs on WebAssembly, which is why I would like a demonstration that the Post-MVP can actually guarantee this invariant.

RossTate avatar Sep 26 '20 00:09 RossTate

The "closed" update above does not appear to make sense. Rebasing artifact or typo?

jakobkummerow avatar Feb 24 '21 10:02 jakobkummerow

I honestly have no idea. I did not close anything, GitHub just randomly closed some issues, not just this one. Maybe it somehow magically inferred something from the merge commit message?

That's why "smart" UIs are such a bad idea...

Edit: Yeah, I think what happened is that I merged upstream, which involved some upstream commits saying "Fixes #X" in the upstream repository. Then GH smartly interpreted those in stupid ways. =-O

rossberg avatar Feb 24 '21 11:02 rossberg

I guess it's due to the commits pulled in by the merge: the mentioned commit 66d90a7 does say "Fixes #138" in its description, but was probably referring to some other repository's issue with that number.

jakobkummerow avatar Feb 24 '21 11:02 jakobkummerow

Closing this because it is non-actionable for the MVP. I've labeled it post-MVP, though, so it's easy to find again once we start looking to do follow-on work.

tlively avatar Nov 01 '22 20:11 tlively