[WIP] IL2CPU IL-level Optimization

Open ascpixi opened this issue 1 year ago • 12 comments

This pull request implements the base building blocks for IL-level optimization, alongside some basic optimizations. Normally a JIT compiler would optimize the IL at run time, but since IL2CPU is an AOT compiler, these optimizations can be applied to the IL at compile time instead.

Optimization passes to implement:

  • [x] Property inlining
  • [ ] Method inlining
  • [ ] Loop unrolling
  • [ ] Control flow reordering
  • [ ] Redundant instruction elimination

This is a work-in-progress pull request.

ascpixi avatar Oct 14 '22 14:10 ascpixi

What kind of benefits will this have? Faster running speeds, smaller compile size, faster compile times, etc.?

terminal-cs avatar Oct 14 '22 16:10 terminal-cs

> What kind of benefits will this have? Faster running speeds, smaller compile size, faster compile times, etc.?

Smaller compile size and faster running speeds. Compile times will increase, but I plan to address this in a future PR; for now, if you need fast compilation, you can simply disable optimization in your build profile.

ascpixi avatar Oct 14 '22 16:10 ascpixi

Alrighty, what kind of performance gains can we expect? Anything significant?

terminal-cs avatar Oct 14 '22 16:10 terminal-cs

> Alrighty, what kind of performance gains can we expect? Anything significant?

As new passes get added, you can expect quite sizable performance gains. For now, there is only a direct property inlining pass, which drastically improves performance when using properties: it removes the need for the CPU to jump to the accessor's address, meaning that the pipeline does not get flushed. The performance is the same as if you had used a field, because the IL call instruction is directly replaced with stfld/stsfld.
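To make the idea concrete, here is a minimal sketch of direct property inlining on a toy instruction list. It is written in Python for brevity and is not IL2CPU's actual pass or API: instructions are modeled as (opcode, operand) tuples, and the method and field names (`Point::set_X`, `Point::_x`) are hypothetical. It assumes a "direct" setter is one whose whole body just stores its argument into a field.

```python
# Toy model only: an instruction is an (opcode, operand) tuple.
# This is an illustrative assumption, not IL2CPU's real data structures.

def direct_setter_replacement(body):
    """Return the one instruction that can replace a call to this setter, or None."""
    ops = [op for op, _ in body]
    if ops == ["ldarg.0", "ldarg.1", "stfld", "ret"]:
        return body[2]  # instance setter: this.field = value
    if ops == ["ldarg.0", "stsfld", "ret"]:
        return body[1]  # static setter: Type.field = value
    return None

def inline_direct_properties(code, methods):
    """Replace calls to trivial setters with the field store they contain."""
    out = []
    for op, operand in code:
        replacement = None
        if op == "call" and operand in methods:
            replacement = direct_setter_replacement(methods[operand])
        out.append(replacement if replacement is not None else (op, operand))
    return out

# Hypothetical example: p.X = 5, where X is an auto-property style setter.
methods = {
    "Point::set_X": [("ldarg.0", None), ("ldarg.1", None),
                     ("stfld", "Point::_x"), ("ret", None)],
}
caller = [("ldloc.0", None), ("ldc.i4", "5"),
          ("call", "Point::set_X"), ("ret", None)]
optimized = inline_direct_properties(caller, methods)
```

The call site already has the object reference and the value on the evaluation stack, which is exactly what `stfld` consumes, so the swap needs no other changes to the caller in this simplified model.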

Other features that are planned, such as method inlining and control flow reordering, will boost performance even more. Method inlining will avoid the jump into the callee altogether, which will drastically improve performance in loops, and control flow reordering will prioritize the branch that is most likely to be taken, reducing the number of jumps in that scenario as well.
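As a rough illustration of method inlining, the sketch below handles only the easiest case: a short callee with no arguments, locals, branches, or nested calls, whose body can be pasted over the call once the trailing `ret` is dropped. It is a Python toy model, not the planned IL2CPU pass; the tuple representation, the `max_len` threshold, and the method names are assumptions made for the example.

```python
# Toy model only: instructions are (opcode, operand) tuples.
# Opcodes starting with these prefixes make a callee too complex for this sketch.
BLOCKED_PREFIXES = ("ldarg", "starg", "ldloc", "stloc", "call", "ret",
                    "br", "beq", "bne", "bge", "bgt", "ble", "blt",
                    "switch", "leave")

def is_trivially_inlinable(body, max_len=8):
    """True if the body is short, ends in ret, and uses nothing that needs fixups."""
    if not body or len(body) > max_len or body[-1] != ("ret", None):
        return False
    return not any(op.startswith(BLOCKED_PREFIXES) for op, _ in body[:-1])

def inline_calls(code, methods):
    """Replace calls to trivially inlinable methods with their body (minus ret)."""
    out = []
    for op, operand in code:
        body = methods.get(operand) if op == "call" else None
        if body is not None and is_trivially_inlinable(body):
            out.extend(body[:-1])
        else:
            out.append((op, operand))
    return out

# Hypothetical example: a method that only returns a constant.
methods = {"Config::GetWidth": [("ldc.i4", "640"), ("ret", None)]}
caller = [("call", "Config::GetWidth"), ("call", "Canvas::SetWidth"), ("ret", None)]
inlined = inline_calls(caller, methods)
```

A real pass would also have to remap arguments and locals into the caller and correct branch targets, which is where most of the analysis cost comes from.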

ascpixi avatar Oct 14 '22 17:10 ascpixi

This article covers a good portion of the optimizations that a JIT would normally perform (and, in our case, the Optimizer class, as we lack a JIT).

ascpixi avatar Oct 14 '22 17:10 ascpixi

Method inlining. yes, yes, yes, yes 1000 times yes

This would let us speed up the current canvas with little work.

zarlo avatar Oct 15 '22 04:10 zarlo

This is a great PR! The approach looks very sensible for now. Regarding optimization, a very big improvement would be to figure out when we actually need to push to and pop from the stack, and when a value can simply stay in a register.
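One simple way to approach this, shown here only as a sketch, is a peephole pass over the emitted assembly that collapses an adjacent `push`/`pop` pair into a register move, or removes it when both sides name the same register. The Python below works on plain text lines of toy x86-style assembly; it is an assumption about how such a pass could look, not something IL2CPU currently does, and it deliberately ignores `esp` and memory operands.

```python
# Registers this toy pass is willing to rewrite; esp is excluded on purpose
# because pushing or popping the stack pointer itself has special semantics.
REGS = {"eax", "ebx", "ecx", "edx", "esi", "edi"}

def peephole_push_pop(lines):
    """Rewrite adjacent 'push rA' / 'pop rB' pairs as 'mov rB, rA' (or nothing)."""
    out = []
    i = 0
    while i < len(lines):
        cur = lines[i].split()
        nxt = lines[i + 1].split() if i + 1 < len(lines) else []
        is_pair = (len(cur) == 2 and len(nxt) == 2
                   and cur[0] == "push" and nxt[0] == "pop"
                   and cur[1] in REGS and nxt[1] in REGS)
        if is_pair:
            if cur[1] != nxt[1]:
                out.append(f"mov {nxt[1]}, {cur[1]}")
            i += 2  # consume both lines of the pair
        else:
            out.append(lines[i])
            i += 1
    return out

asm = ["push eax", "pop ebx", "push ecx", "pop ecx", "add ebx, 1"]
reduced = peephole_push_pop(asm)
```

Because only directly adjacent lines are matched, a label between the two instructions prevents the rewrite, which keeps jump targets intact in this simplified setting.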

quajak avatar Oct 15 '22 17:10 quajak

What about compile times and sizes, how will those be affected?

terminal-cs avatar Oct 25 '22 04:10 terminal-cs

> What about compile times and sizes, how will those be affected?

Compile times will be extended, as the compiler will need to perform extra passes for each method. Depending on the complexity of the pass, it can take the compiler anywhere from a millisecond to a full second to process a method. For example, if the method has a lot of calls that can be inlined, the InlineMethodsPass (not yet committed) will need to perform local analysis and instruction correction for each inlined method call.

This is why there are additional passes like InlineDirectPropertiesPass that will inline every direct property without the need for any method analysis, reducing the load on InlineMethodsPass, which will perform a (relatively) more complex method analysis routine.
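The effect of running a cheap pass before a more involved one can be sketched with a tiny pass pipeline. This Python toy is not IL2CPU's Optimizer class; the two passes, the `run_passes` helper, and the `optimize` switch (standing in for a build profile setting) are all hypothetical. It shows that a simple pass run first can hand the next pass an easier input.

```python
# Toy model only: instructions are (opcode, operand) tuples.

def remove_nops(code):
    """Cheap pass: drop nop instructions."""
    return [ins for ins in code if ins != ("nop", None)]

def remove_dup_pop(code):
    """Drop 'dup' immediately followed by 'pop', which leaves the stack unchanged."""
    out = []
    for ins in code:
        if ins == ("pop", None) and out and out[-1] == ("dup", None):
            out.pop()  # cancel the preceding dup instead of emitting the pop
        else:
            out.append(ins)
    return out

def run_passes(code, passes, optimize=True):
    """Apply each pass in order; return the code untouched if optimization is off."""
    if not optimize:
        return list(code)
    for apply_pass in passes:
        code = apply_pass(code)
    return code

code = [("dup", None), ("nop", None), ("pop", None), ("ret", None)]
cheap_first = run_passes(code, [remove_nops, remove_dup_pop])
cheap_last = run_passes(code, [remove_dup_pop, remove_nops])
unoptimized = run_passes(code, [remove_nops, remove_dup_pop], optimize=False)
```

With the cheap pass first, the `dup`/`pop` pair becomes adjacent and is removed; in the other order it survives, so the pass order is part of the design.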

As for binary size, this may vary depending on the set of optimization passes you'll be using. Method inlining will introduce a few more bytes to the final binary, but redundant instruction elimination should balance that out. IIRC, IL2CPU already compiles in only the methods it needs, as it uses a scanner. The final kernel binary is so big because Cosmos initializes a large majority of devices for you, even if you're not going to use them; so, for example, the network driver will be initialized outside of your kernel code, meaning you don't really have a choice whether it will be included or not.

As an example, the CAI can be used, as it's extensible and allows kernel authors to choose whether to enable it or not. Compile a kernel that doesn't reference any CAI classes, and then search for AudioBuffer in the assembly file IL2CPU creates; you'll find that no references to such a class exist. After adding an audio card initialization routine and re-compiling, you'll notice that these references get created.
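The behavior described above is what you would expect from a reachability scan over the call graph. The following Python sketch is a simplified assumption about how such a scanner works, not IL2CPU's real implementation; apart from AudioBuffer, the method names in the example graph are made up for illustration.

```python
from collections import deque

def scan_reachable(call_graph, entry):
    """Return every method reachable from the entry point (breadth-first)."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        method = queue.popleft()
        for callee in call_graph.get(method, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

# Hypothetical call graph: the audio code exists but nothing calls it yet.
graph = {
    "Kernel::Run": ["Console::WriteLine"],
    "AudioDriver::Initialize": ["AudioBuffer::.ctor"],
}
before = scan_reachable(graph, "Kernel::Run")

# After the kernel author adds an audio card initialization routine:
graph["Kernel::Run"].append("AudioDriver::Initialize")
after = scan_reachable(graph, "Kernel::Run")
```

Only methods in the returned set would be compiled, which is why AudioBuffer appears in the output only once something reachable from the kernel refers to it.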

In the cases mentioned previously, the optimizer can't really help you, as it can't simply take code out that it knows a part of the kernel uses; not only would that be dangerous, but also that burden shouldn't lie on the compiler at all. A solution would be to do a refactor of all drivers whose initialization can be delegated to public API methods (like the CAI).

TL;DR: this comment.

ascpixi avatar Oct 25 '22 17:10 ascpixi

It is also big because every plug is included, even ones that may not be used.

MishaTy avatar Oct 25 '22 20:10 MishaTy

Are all plugs included? They get scanned, but I would expect only the required plugs to actually be emitted.

quajak avatar Oct 31 '22 01:10 quajak

This PR is currently inactive, but as new IL optimizers have come out as of late, it may be possible to use such a project (like DistIL) and reduce the amount of work we would need to do under IL2CPU. Optimization itself can introduce a lot of buggy behavior, so a lot of upkeep would be required to keep this stable (or, at least, stable by IL2CPU standards).

However, I won't close this PR, as it's not confirmed if these projects would be suitable for IL2CPU - it might be the case that writing an external IL optimizer, suited for IL2CPU, but not directly associated/exclusive to it, would be the best option here.

If anyone wants to take over this PR, let me know, as currently I'm occupied with operating system development with NAOT research.

ascpixi avatar Jan 11 '23 20:01 ascpixi