llst Type inference prototype based on static analysis

Type inference prototype based on static analysis

Open 0x7CFE opened this issue 8 years ago • 1 comments

This PR is my first rather independent (or naïve?) attempt to realize how type inference may be used to aid JIT VM. Underlying concept is not based on any well-known theory like Hindley-Milner type system or Martin Löf's Intuitionistic type theory. It's the result of a pure meditation on Smalltalk bytecodes.

Surprisingly enough, Smalltalk being fully dynamic in it's nature is still very regular in terms of it's memory access and control flow. This really helps when we try to perform static analysis.

Current implementation concentrates on the method temporaries and gives up completely when it faces object fields. However, I believe that in local context it is still possible to infer the object's fields to fully unlock further analysis and optimizations, like stack allocation, GC root elimination, and of course TBAA.

Current implementation is powerful enough to infer self assigning temporaries within a loop and even in complex closure contexts.

For example, the following method is inferred completely:

testInference |sum|
    sum <- 0.
    1 to: 100 do:
        [ :x | sum <- sum + x ].
    ^sum

Here analyzer proves that sum variable will have SmallInt type in it's scope which spans across the follwing methods: Undefined>>testInference, Number>>to:do:, Block>>value: and finally, the block Undefined>>testInference@8. Integer overflow is currently undefined.

This allows IR generator to encode operations on sum directly as i32 without any worries about GC or dynamic dispatch.

Current inference scheme highly uses method monomorphisation and specialization, which helps to perform calculations at compile time.

For example, this code is reduced to a single literal value (the result) at compile time:

fibonacci: n
    n < 3 ifTrue: [ ^1 ].
    ^ (self fibonacci: n - 2) + (self fibonacci: n - 1)

Type inference works even in case of recurring contexts and correctly solves chicken or egg problem. Consider the listing of the Collection>>sort:

sort: criteria | left right mediane |
    (self size < 2) ifTrue: [^self].

    mediane <- self popFirst.

    left  <- List new.
    right <- List new.
    self do: [ :x |
        (criteria value: x value: mediane)
            ifTrue:  [ left  add: x ]
            ifFalse: [ right add: x ] ].

    left  <- left  sort: criteria.
    right <- right sort: criteria.

    right add: mediane.
    ^ left appendList: right

This is suboptimal implementation of the Quicksort algorithm used here only for testing purposes. It performs two recursive calls to Collection>>sort: when sorting left and right sublists. The main difficulty here is to infer the return value of the Collection>>sort:. For example, analysis of the left refers to the outer context which at that point is not inferred completely, hence no return type available.

However, current implementation correctly solves the problem using a fact, that every recursion has it's base which should be evaluated prior to the next recursive call. By propagating the base return type from the outer context to the inner one we succeed in the whole process. See the log for an example of such inference. You may also see the resulting call graph as rendered svg or graphviz source.

Static analysis along with runtime statistics and polymorphic method cacheing provides enough information for effective dispatch of the Smalltalk code.

See also: #56, #58.

The implementation is not complete yet, there are a lot of things to do:

[x] Implement inference of non-literal recursive messages, such as Collection>>sort:
[ ] Implement inference of composite types one per entry
[ ] Detect block escaping and avoid inference of the captured variables
[ ] Detect situations where inference will produce invalid results
[ ] Detect correct number of passes for guaranteed inference
[ ] Detect when method specialization is redundant and unnecessary
[ ] Solve the situation when phi incoming maps to a node from a dead path
[x] Type inference walker should use breadth-first traverse
[ ] Analysis for block invocation kind in closure sites
[ ] Inference across phi nodes
[x] Memory management, block inference caches, …
[ ] Implement inference of the remaining primitives
[x] Use inferred types in actual code generation
[ ] Take care of non-static objects
[ ] Tests for all of this

Jun 18 '16 12:06 0x7CFE

Coverage decreased (-12.7%) to 48.582% when pulling 34c1048600b974b902fe21e6079baa329e1c1d5d on feature/17/type_inference into ff4d76d30584adaa20a309ca2487122cb51479c6 on develop.

Aug 13 '16 17:08 coveralls

llst llst copied to clipboard

Type inference prototype based on static analysis

llst
llst copied to clipboard