wren icon indicating copy to clipboard operation
wren copied to clipboard

Compiler signature re-work

Open mhermier opened this issue 3 years ago • 34 comments

This patch set propose to rework how Signature works.

The key changes in behavior:

  • SignatureType is removed, so it simplify a lot of logic by storing the subscriptArity, parenArity, isInitializer and isSetter instead. It contains some simplifications to signatureToString that should be very handy to implement #1112.
  • Compilation errors for setters should provide better error messages.
    class Foo {
      static foo = (a) { a }
    }
    
    if (!Foo.foo = 42) {
    }
    
    outputs:
    Error at '=': The setter syntax cannot be used in this context.
    
    instead of:
    Error at '=': Expect ')' after if condition.
    Error at ')': Expect end of file.                      (or any other trailing garbage error)
    
  • The last (optional) 8 patches, reduces the code size in a one liner and reduce some frictions in the languages:
    • Allow trailing commas for any argument/parameter lists (add symmetry with lists and maps):
      class Foo {
        [a,] { ... }
        bar(a,) { ... }
      }
      foo.bar(a,)
      foo[a,]
      
    • Allow function of arity 0:
      Fn.new {|| null }
      
    • Allow subscript of arity 0:
      foo[]
      foo[] = 42
      
    • Allow setters of any arity:
      class Foo {
        static [i] = () { ... }
        static [i] = (a) { ... }
        static [i] = (a, b) { ... }
        static [i, j,] = (a, b,) { ... } // support trailing in subscript and value parameter list
      }
      Foo[a] = ()
      Foo[a] = 42
      Foo[a] = (42)
      Foo[a] = (42, 69)
      Foo[a, b,] = (42, 69,) // and support trailing in subscript and value argument list
      

mhermier avatar Oct 21 '22 19:10 mhermier

Updated, for functions of 0 arity that where also enabled as Fn.new {| | ....}. Added a small fixup for the compact version Fn.new {|| ....}.

mhermier avatar Oct 21 '22 23:10 mhermier

Updated: found another leaked symbol from the compiler, while tinkering on other things.

mhermier avatar Oct 23 '22 11:10 mhermier

Updated: Some minor renaming of functions.

mhermier avatar Oct 24 '22 12:10 mhermier

As far as allowing functions with zero arity is concerned, am I right in thinking that there would be no essential difference between:

var f = Fn.new { || null }

and

var f = Fn.new { null } 

In effect the first would be a variation of the second and you'd call them both with f.call().

However, I'm struggling to get my head around an indexed property with zero arity such as Foo[] which would be a distinct entity from anything we have already. I can't think of any obvious use for this - you might as well just use an ordinary property. Do you have some practical usage in mind or are you just trying to make indexed properties completely general?

PureFox48 avatar Oct 24 '22 20:10 PureFox48

For your first point, it is equivalent (for now). Thought with the new system, it would be possible to distinguish between them and all Fn.call as a getter. That said I don't think it is wise to do so unless there is a compelling need for it and break existing code.

For the subscript change, I already proposed it a while ago, but is now a easy by product of the simplification of how all the list are handled. It simplifies the syntax, by making objects callable as functors (without the strange arity restriction, or the need to use the .call() convention). The idea was revived to me because of the Array implementation, and I thought it was a bummer that we could not instantiate a template class using [] instead of requiring to use an helper method... If name generation and memoïzation of class become a thing, we can somehow simulate a template class as with the neat syntax:

class Set {
  static [] { this[Object] }
  static [clazz] { ... }
}
var ObjectSet = Set[]
var NumSet = Set[Num]

One neat idea would be to move some of the boiler plate to the compiler, so we can have the syntax:

class Set[clazz = Object] {
}

but I don't know how hard it would be to add default arguments without a proper AST. And in addition, we would need to support it done for functions and methods in the same time, so for now it is a no go but we can produce at least an ersatz of it.

mhermier avatar Oct 24 '22 21:10 mhermier

Updated: Make it finishList also handle list and map litterals. I will probably add an optionalList abstraction so an element list can automatically be parsed based on the presence of the left token type.

mhermier avatar Oct 25 '22 01:10 mhermier

Thanks for clarifying those two points.

I agree that it would not be a good idea to have a function getter since, unlike classes, functions don't have any persistent internal state (they just use global state) and it would break tons of exsting code in any case.

It hadn't occurred to me that one could use an indexed property with zero arity as a default for other indexed properties but, yes, I can see the sense in that now. Your idea of using static indexed properties to create a sort of template class is also interesting.

PureFox48 avatar Oct 25 '22 20:10 PureFox48

Well technically functions have a state because they are technically closure. So they can have private states, hidden from the global environment. So technically, I think it would make sense to have a getter function that would honor the singleton pattern. That said, unless there is more solid ground I don't consider it an ultimate reason to break code.

Well this rule is stupid in the first place (even if C++23 only adopted more than one operator). While semantically it has a different meaning, it makes absolutely no reason to differentiate it from a function call operator. I simple term, it is a language rule for the sake of a rule because of some historical usage and connotation. Nothing in the grammar rules forbids the extension, if we are willing to ignore the connotation. I mean what ever the language, you can't prevent users from inventing DSL. If using [] makes sense to them why put some arbitrary restriction and make life harder for the user. I mean a user can always act on one argument and using to pass the list of arguments, and spread it manually at the list allocation cost:

class Foo {
  static call()        { "0-spread()" }
  static call(a)       { "1-spread(%(a))" }
  static call(a, b)    { "2-spread(%(a), %(b))" }
  static call(a, b, c) { "3-spread(%(a), %(b), %(c))" }
}

class SubcriptSpreader {
  construct bind(callable) {
    _callable = callable
  }

  [args] {
    var count = args.count
    if (count == 0) return _callable.call()
    if (count == 1) return _callable.call(args[0])
    if (count == 2) return _callable.call(args[0], args[1])
    if (count == 3) return _callable.call(args[0], args[1], args[2])
  }

}

var s = SubcriptSpreader.bind(Foo)
System.print(s[[]])              // expect: 0-spread()
System.print(s[["a"]])           // expect: 1-spread(a)
System.print(s[["a", "b"]])      // expect: 2-spread(a, b)
System.print(s[["a", "b", "c"]]) // expect: 3-spread(a, b, c)

And this is true for every function/method overload in existence...

mhermier avatar Oct 25 '22 21:10 mhermier

Whilst functions in Wren are closures, they close over variables defined outside their scope. Even if all a parameterless function did was to return a closed over variable, I don't think a user would regard this as analogous to a 'getter' method returning a field value from a class.

But, minor quibbles aside, I do think this proposal as a whole hangs together well and I would therefore be in favor of it even if (as I said at the time) I'm not keen on #1112, preferring the simplicity of overloading methods purely on arity.

PureFox48 avatar Oct 26 '22 08:10 PureFox48

In some near future this change set will also relax a little the grammar of attributes (because of code reuse). It means attribute group will be able to be empty, and attributes group will be able to be ',' terminated like every other lists in the language with this changeset.

mhermier avatar Oct 26 '22 10:10 mhermier

Updated, changes are a little bit more atomic, added type hinting support.

mhermier avatar Oct 26 '22 12:10 mhermier

As far as 'type hinting' is concerned, if I'm understanding that correctly, you've just laid the groundwork for:

  1. including a colon and then a type after method or function parameters; and

  2. including the 'crow's foot' operator and then a type after the parameter section (if any) to represent the return type of a method or function.

These hints will be ignored by the compiler and we'll need to agree 'best practice' for what type should be shown particularly as far as sum types, lists, maps etc. are concerned.

Although it's not a built-in type, one thing I'd personally like to see in such hints is the use of 'Int' to represent a Num which is an integer given how common this situation is.

PureFox48 avatar Oct 26 '22 14:10 PureFox48

For your last concern, we can always add var Int = Num somewhere to satisfy the fact that the type hint even if discarded should be a valid expression to something. (it is the same case as subclassing where it should be computed to something, and handled/discarded at runtime)

mhermier avatar Oct 26 '22 14:10 mhermier

Just a further point on the type hinting.

All the discussion I've seen so far has used a colon (rather than a crow's foot) to introduce the return type as well as the parameter type(s).

Having said that, Python uses the crow's foot for the return type though that may be because they use a colon to introduce the function block.

Any particular reason why you prefer the crow's foot? I'd have thought myself that it would be better to keep the number of new symbols needed to a minimum.

PureFox48 avatar Oct 26 '22 16:10 PureFox48

I was not able to reuse : because of the setter syntax (with the new relaxed grammar):

class Foo {
  foo = b : Int {}
}

Does it means b is Int or foo = (b) returns Int, so another operator was needed. I looked at what other languages do quickly (C++ and Jai) and mimicked what was existing. But I'm open for suggestions.

On the side note, I'm having some small difficulties to finish patch-set, so it will take me some time to produce some good quality changes. I had to fix some errors related to the GC with WREN_DEBUG_GC_STRESS to test memory issues. This one is almost properly sorted, thought some cruft is left related to attributes. But that lead me too look at attributes, and oh boy... The attribute implementation tries too hard and produce/use a lot of unnecessary junk. Instead, I think we should bypass the constant system responsible of the memory usage reduction (since we can't store a Map as a key of a Map). That way we can simply store the compiler attribute maps directly in the constant pool, instead of serializing them in opcode to recreate them at execution. I have done a changeset for that but I need to clean/polish it.

mhermier avatar Oct 27 '22 01:10 mhermier

TBH I'd missed that a relaxed grammar for setters was part of the proposal.

Wouldn't it be better for the parameter list, however many there are, to always be enclosed in parentheses? That would be consistent with the current syntax, easy to remember and would mean that we could then use a colon to introduce the return type hint.

PureFox48 avatar Oct 27 '22 08:10 PureFox48

From a grammatical point of view, only the = is required to distinguish it is a setter. So the parenthesis are only syntaxic sugar to make it looks like other method declarations. Enforcing them make the code less fluent and makes only really sense in the generalization of arbitrary number of arguments. That consideration allowed some code factorization, so I went for it.

The same could be said for binary operators, and could have the same treatment of optional parenthesis (That makes me realize I missed type hinting for them, but this trivial to implement since the infrastructure is now available).

Here are some file size numbers on my machine:

       text    data     bss     dec     hex filename
vanilla:
-O2   42591    3088       0   45679    b26f wren_compiler.o (ex lib/libwren.a)
-Os   28343    3088       0   31431    7ac7 wren_compiler.o (ex lib/libwren.a)
-O0   40391    3104       0   43495    a9e7 wren_compiler.o (ex lib/libwren.a)

local branch without type hinting and attribute optimization:
-O2   42617    3344       0   45961    b389 wren_compiler.o (ex lib/libwren.a)

local branch without attribute optimization:
-O2   43001    3408       0   46409    b549 wren_compiler.o (ex lib/libwren.a)

local branch with attribute optimization:
-O2   42097    3408       0   45505    b1c1 wren_compiler.o (ex lib/libwren.a)
-Os   28336    3408       0   31744    7c00 wren_compiler.o (ex lib/libwren.a)
-O0   41475    3400       0   44875    af4b wren_compiler.o (ex lib/libwren.a)

src/vm/wren_compiler.c | 1006
  1 file changed, 584 insertions(+), 422 deletions(-)

Considering all the moving parts my change set brings, some of security/language improvement it brings (and probably some missed optimization opportunities), I consider that non explosion of size by factorization quite successful for now.

mhermier avatar Oct 27 '22 10:10 mhermier

Yep, the file size numbers look good but would presumably be even better if you didn't have to support two syntaxes (with and without parentheses) for setters.

Even if they're not really needed and it's more typing, I'd personally prefer parentheses to be mandatory both on consistency grounds and because of the type hint issue.

There's also #849 to consider which I wouldn't be surprised to see implemented in some shape or form. If it is, then the most likely syntax (as it's short and sweet) would be: foo = _foo though that might not sit too well with setters which have no parentheses after the =.

PureFox48 avatar Oct 27 '22 11:10 PureFox48

Updated the header, to mention the relaxed setter declaration syntax.

In the facts, I would go the other way around and would consider generalizing Foo.foo bar as Foo.foo(bar) (in definition and usage). It creates a friction point between function and maps, but it should be nice. This is far from the point of this change-set, but I would Introduce the '' token back so we can use it to declare a closure/lambda/function and line continuation (because the . line continuation does not scale to all binary operator, I think line continuation or a statement separator are better in that sense).

About the #489 I don't think the syntax foo = _foo is viable. Thinking at it again, it does not allow independent setter or getter definitions.

mhermier avatar Oct 27 '22 11:10 mhermier

Well, I think the point of #849 was to be able to condense:

class Entity {
  construct (pos) {
    _pos = pos
  }

  pos { _pos }
  pos=(v) { _pos = v }
}

into just:

class Entity {
  construct (pos) {
    _pos = pos
  }

  pos = _pos
}

The current way of doing things would still be available for read only or write only properties or where you needed to do more than just get or set a field. So I think it is a viable proposal even if I personally would probably not use it much.

PureFox48 avatar Oct 27 '22 11:10 PureFox48

Incidentally (and going slightly off topic), I made an even more condensed proposal in #912 for simple classes which only had read-only fields (tuples in effect) where you'd get a default toString method thrown in:

class Entity(pos)

though it wasn't met with much enthusiasm :(

Nowadays, I just create named tuples dynamically using the Meta module.

PureFox48 avatar Oct 27 '22 11:10 PureFox48

While still beeing off topic, I'm really considering to borrow most of the code you made public in rosettacode and making it a proper standard library of some sort...

mhermier avatar Oct 27 '22 12:10 mhermier

Well, you're very welcome to do that :)

I'm afraid there's no separate documentation for the various modules though I've tried to describe each method etc. within the code itself.

PureFox48 avatar Oct 27 '22 12:10 PureFox48

While doing some crazy things with "template" programming, I felt into a trap while trying to emulate some method overload. And I think, I need to also allow something that would be named like subscript caller. Basically it means to allow:

class Foo {
  static [a, b, ...](c, d, ...) { }
}

Luckily, it should be pretty simple to implement, with my current change set. Thought, there is a little friction with the function syntax.

Basically, by making = optional, it makes the following calls to [a, b, ...](c) a little confusing:

Foo[a, b, ...] 42     // c is 42
Foo[a, b, ...] { }    // c is a map or a function depending how we consider things
Foo[a, b, ...] () { } // c is a function

I think it should be a map as per Foo[a, b, ...] = { } is a map, but I need to think deeper about it. And likewise, trailing function argument Foo[a, b, ...]=(c, d, ...) { } should allowed for setter arguments...

This simplification rabbit hole is so deep... lets sip some tea at the very bottom Alice.

mhermier avatar Oct 29 '22 08:10 mhermier

Now, that I wrote it maybe it is non sense. I need to sort things and maybe simply ban Foo[a, b, ...] 42 and make Foo[a, b, ...] { } a function argument. Maybe it would make the things simpler to understand.

mhermier avatar Oct 29 '22 08:10 mhermier

If I were you, I wouldn't try and make = optional. Even if it could be made to work, it will make Wren look like a pseudo-functional language which I don't think will go down well with a lot of people.

PureFox48 avatar Oct 29 '22 09:10 PureFox48

Can you expand your thought. I'm not sure to understand what you mean. In particular, by:

I wouldn't try and make = optional.

Do you mean, I should not allow the [a, b, ...](c, d, ...) syntax entirely ?

mhermier avatar Oct 29 '22 09:10 mhermier

I mean that I'd allow foo[a, b, ...] = (c, d, ...) as you originally intended in your opening post but I'd make the = mandatory.

PureFox48 avatar Oct 29 '22 09:10 PureFox48

Well it is not nice in the case of what I'm trying to achieve. So maybe it will go in a late patch-set to make it not mandatory.

Anyway a lot of people are fools to not think the language is not pseudo-functional already. It is only that instead of using () it use[].

mhermier avatar Oct 29 '22 09:10 mhermier

Well, I'll grant you that it's pseudo-functional already in the sense that functions are first class and we have map, where and reduce in the standard library. This seems to be the trend nowadays with what are basically procedural or OO languages and is a good thing IMO.

But I think you can go too far down that path and, when you start making things optional, it means you have to make a decision each time you use them which option to go for.

PureFox48 avatar Oct 29 '22 09:10 PureFox48