
Design discussion: Optimizations

Open nocturn9x opened this issue 2 years ago • 3 comments

Rationale

So, in my implementation of Lox (bytecode compiler) I'm making a lot of changes: I've implemented a three-pass compiler whose first step is tokenization and AST generation, whose second is AST optimization (constant folding, warning emission, etc.), and whose third is bytecode compilation; the bytecode is later serialized to a byte stream that can be dumped to a file, pretty much like .pyc files in Python. The parser is basically a version of jlox's parser on steroids, as I find recursive-descent top-down parsing absurdly elegant and a better fit for a not-so-toy-language-anymore compared to Pratt parsing (which I personally find messy). Since I'm planning to add lots of complex features to my own Lox (coroutines, generators, multithreading, an import system, a more flexible type system and, most importantly, improved declaration semantics), I started thinking about a bunch of optimizations I could make, and I'd like this community's feedback on some of them.

Optimizing LoadConstant (OP_CONSTANT_LONG in lox) for binary expressions

Since a binary expression whose operands are both constants always emits 2 LoadConstant instructions, I thought of adding a LoadConstants instruction which loads 2 constants onto the stack at a time (or maybe even a generic one with an extra argument telling it how many constants to load, followed by their indexes?)

Optimizing expression statements

Expression statements are expressions that discard their value at the end of evaluation. Their value is of course needed on the stack for the intermediate evaluation, but the result is immediately popped off of it, which made me think of implementing an EvalPop opcode (I'm accepting suggestions for the name) that automatically pops the result of whatever it evaluated off the stack, without needing 2 opcodes.

Optimizing global name resolution

My implementation allows for what I call declaration specifiers. Basically, var x; is equivalent to private static var x;. The first one is the visibility specifier and its function is pretty obvious: a name declared as private is module-local, while a name declared with the public visibility specifier is exported to the outside world when the module declaring it is imported by others. The second keyword is a name resolution specifier: it tells the compiler how to resolve the declared name, or more specifically whether such a name should be resolved statically (which is fast, but will cause the compiler to error out when a name is used before the unit of code where it is semantically declared) or dynamically (using a plain old hash table at runtime and emitting a string constant, which is slower because it requires checking whether the name is in the hash table and performing hashing and collision resolution, but is definitely more flexible for things like mutually recursive functions, as the book points out).

It's important to note that my language also allows const declarations (where only constant types such as numbers and strings can be used) and that name resolution specifiers don't apply to those, as they are replaced at compile time as if they were a simple C macro (i.e. const x = 5; is equivalent to #define x 5 in C). Another note is that name resolution specifiers are redundant for local variables, and dynamic var x; is illegal inside functions (what would that mean anyway?), as locals are always resolved statically. I'm not sure lox's scoping rules allow for a clean implementation of this optimization, but I think it's possible: after all, most global variables could be resolved statically, and if they can't (because they don't have stack semantics), just declare them as dynamic and you're good to go.
This IMHO makes a good compromise between performance and usability, as it allows flexibility and performance as needed (which is kind of the whole compromise that bytecode VMs make in the first place, isn't it?)

Stuff that doesn't fit anywhere else

I've tried experimenting with implementing the OP_CONSTANT_LONG opcode as suggested by the book, but found that the performance regression was significantly higher with 2 different opcodes than with just OP_CONSTANT taking a 24-bit operand, so my impl. only uses that.

nocturn9x avatar Nov 16 '21 12:11 nocturn9x

I've tried experimenting with implementing the OP_CONSTANT_LONG opcode as suggested by the book

For me, I tried implementing OP_CONSTANT_1, OP_CONSTANT_2, OP_CONSTANT_3, OP_CONSTANT_4 based on the value of the index.

heinthanth avatar Nov 24 '21 01:11 heinthanth

@nocturn9x

I've tried experimenting with implementing the OP_CONSTANT_LONG opcode as suggested by the book, but found that the performance regression was significantly higher with 2 different opcodes rather than with just OP_CONSTANT having a 24 bit operand, so my impl. only uses that.

Is this because of the added checks to see if you've exceeded 255?

Curious as to where you've gotten with all of this. I want to do some of the things you're suggesting myself. How much effort was it to create bytecode from your AST? I too prefer the recursive descent parser, but am not sure it's worth the effort to rewrite so much of Part III of the book.

chrisjbreisch avatar Jul 03 '22 20:07 chrisjbreisch

@nocturn9x

I've tried experimenting with implementing the OP_CONSTANT_LONG opcode as suggested by the book, but found that the performance regression was significantly higher with 2 different opcodes rather than with just OP_CONSTANT having a 24 bit operand, so my impl. only uses that.

Is this because of the added checks to see if you've exceeded 255?

Curious as to where you've gotten with all of this. I want to do some of the things you're suggesting myself. How much effort was it to create bytecode from your AST? I too prefer the recursive descent parser, but am not sure it's worth the effort to rewrite so much of Part III of the book.

Well, I kinda moved on from making a dynamic language. Right now I'm working on a statically typed compiler, and it's coming along pretty nicely. There has been some major overhaul, however, so expect significant changes compared to lox's original code. Creating bytecode from the AST isn't difficult in itself: the compiler is basically the same thing as the parser (i.e. it has many of the same handlers), except it takes AST nodes in and spits out bytecode instead of taking tokens in and spitting out AST nodes.

nocturn9x avatar Jul 03 '22 20:07 nocturn9x