Shrubbery notation

Open mflatt opened this issue 6 years ago • 98 comments

An indentation-sensitive notation that remixes elements of #lang something, Lexprs, and saplings.

Rendered

mflatt avatar Oct 01 '19 13:10 mflatt

I think the link to the RFC discussion is wrong.

Can you add something comparing it to sapling notation?

samth avatar Oct 01 '19 14:10 samth

Fixed and added. The short answer in comparison to sapling notation is that shrubbery notation is indentation-sensitive.

mflatt avatar Oct 01 '19 14:10 mflatt

Seeing | used for if was enlightening to me! Beautiful. I think if I had thought of that, I wouldn't feel like the line continuing rule after the indented block was as necessary.

In print_sexp, the for has indentation but no suffix... I assume this is "visible", might be nice to add a comment in the first example about the indentation grouping rule

I like this a lot and am willing to move forward with this. For next steps:

  • I think that the @ notation is hopefully obvious
  • I think the #{} idea is fine, although I think it would be fine to tweak C names a little bit to accommodate ? and maybe a few others
  • I am anxious about delaying precedence/etc to another level, but am willing to say that shrubberies can do whatever, but the next language may have some particular system.

jeapostrophe avatar Oct 01 '19 17:10 jeapostrophe

I think forbidding . in identifiers is a good idea; it may cause incompatibilities, but nobody is using it. Probably forbidding : is also good; it may cause more problems, as in exn:fail?, but it would help avoid some bugs when using syntax-parse. I'm not sure about !; I like it, but it's a nice shorthand for not. And I really like ?.

About delaying the precedence to another level, I think it is necessary to support user-defined operators (my examples in a recent thread were the @ in Python and the .* in MATLAB).

gus-massa avatar Oct 01 '19 18:10 gus-massa

@jeapostrophe I think it was confusing to have the "a colon could go here" comment in the middle of the first example, since that's not the first place where a colon is optional. I changed the text before the example to explain : as indentation up-front and ignore the detail that extra :s are allowed.

@gus-massa I think the C and the Lisp ends of the identifier spectrum make sense, but I'm skeptical of in-between points. With Lisp-style identifiers, you have to put space around any operator (including something like :), so that's why I lean toward C-style identifiers in an infix-friendly notation. I also lean toward C-style because I am tired of having to explain that the - in x-y or the ! in set! is just part of the name; it means the syntax has already confused my audience, and I'm explaining to try to undo that confusion. I can see how allowing ? in identifiers just might work ok with C-style syntax, because ? is not commonly used in operators (and we don't need the C-style ?...: since we're not going to make the mistake of if being a statement and not an expression). Still, ? won't look like part of a name to many programmers, and I would prefer not to spend time in the years ahead explaining that ? is just a part of an identifier's name.
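
As a rough illustration of the trade-off described above, here is a minimal Python sketch. The two regular expressions are assumptions made up for this example; they are not the grammar from the proposal or from Racket. The point is only that x-y is one token under a Lisp-style rule and three tokens under a C-style rule, so Lisp-style names force spaces around operators.

```python
import re

# Hypothetical identifier grammars, for illustration only.
# C-style: names are letters/digits/underscore; any other symbol is its own token.
C_STYLE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[^\sA-Za-z0-9_]")
# Lisp-style: a name is any run of characters other than whitespace and delimiters.
LISP_STYLE = re.compile(r"[^\s()\[\]{},]+")


def tokenize(pattern, text):
    """Return the non-whitespace tokens that `pattern` finds in `text`."""
    return pattern.findall(text)


c_tokens = tokenize(C_STYLE, "x-y")         # subtraction: three tokens
lisp_tokens = tokenize(LISP_STYLE, "x-y")   # a single identifier named x-y
lisp_spaced = tokenize(LISP_STYLE, "x - y") # spaces are needed to get subtraction
c_set = tokenize(C_STYLE, "set!")           # `!` is split off as an operator
```

Under the Lisp-style rule, a reader has to know that x-y is a name; under the C-style rule, the same text reads as arithmetic without any explanation.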

mflatt avatar Oct 02 '19 13:10 mflatt

Also, Rust and JavaScript now both use ? as an operator (for related things, but not related to the C ?).

samth avatar Oct 02 '19 14:10 samth

@gus-massa I think the C and the Lisp ends of the identifier spectrum make sense, but I'm skeptical of in-between points. With Lisp-style identifiers, you have to put space around any operator (including something like :), so that's why I lean toward C-style identifiers in an infix-friendly notation. I also lean toward C-style because I am tired of having to explain that the - in x-y or the ! in set! is just part of the name; it means the syntax has already confused my audience, and I'm explaining to try to undo that confusion. I can see how allowing ? in identifiers just might work ok with C-style syntax, because ? is not commonly used in operators (and we don't need the C-style ?...: since we're not going to make the mistake of if being a statement and not an expression). Still, ? won't look like part of a name to many programmers, and I would prefer not to spend time in the years ahead explaining that ? is just a part of an identifier's name.

I think Ruby allows ! and ?, and lots of programmers really like Ruby. Is it really the case that people are surprised by this issue?

(This is not even to mention that people who have never programmed before wouldn't find ! and ? particularly weird... all of programming is weird to them!)

sorawee avatar Oct 02 '19 14:10 sorawee

A clarification on Ruby: Ruby allows ? and ! specifically at the end of method names (not in identifiers in general, and not in other places within the identifier). It also has ! as an operator. It seems that the expression x!(1) or even x!1 is a call to the x! method with the argument 1, while x ! 1 is a call to x with the argument false (since !1 is false).

I am not aware of any formal study that would confirm whether - or ? as part of an identifier really surprises people. Maybe there has been one. Currently, I can only report my perception from my experience (working with programmers at various levels) that it does surprise them, and it requires specific explanation in a way that a or x as part of an identifier does not.

mflatt avatar Oct 02 '19 18:10 mflatt

As much as I don't like this, I agree 100% that people are constantly asking if ?, -, and ! are part of the syntax of Racket identifiers, same with ^ and % if they ever get to those.

jeapostrophe avatar Oct 02 '19 18:10 jeapostrophe

About the main part of this proposal:

I like the idea of meaningful indentation. It reduces the number of parentheses a lot, like in my examples in saplings. I don't use paredit, so I rely on the parentheses to detect mismatches. I sometimes use {} for define and [] for long begin blocks to make the detection of mismatches easier. I will need to be more careful when editing, and it will need more editor support to move the blocks to the left/right as automatically as possible. I like this anyway.

I'd like to add the magic & for for/let in addition to the magic | for if/match and the magic : for everything. But I think this addition is quite orthogonal and I don't expect it to cause too many problems, so perhaps we can delay the discussion for later.

I'm not convinced of the optional :. I prefer a strict rule of a : before every new indented block that is not a | block, and never a : before a | block. It makes the code more consistent.

I like the two spaces before the | blocks. Indenting and using the | is a little redundant, but it looks nicer.

I'm still not sure why the {} blocks must be different from the () blocks. But if they are equivalent, I'm worried about confusion when an operator that can be both unary and binary is at the beginning of the line. I still can't make a good example, but it would be something like this. (Note the lack of the ,; display is perhaps a bad choice, and I need a unary operator that has side effects and can also be used as a binary operator, so I used --, which is the nearest candidate.)

display(x
        -- y) 

display{x
        -- y} 

gus-massa avatar Oct 03 '19 14:10 gus-massa

I also lean toward C-style because I am tired of having to explain that the - in x-y or the ! in set! is just part of the name; it means the syntax has already confused my audience, and I'm explaining to try to undo that confusion.

Now that I think about it, I've probably also had to explain that dozens of times. It would be nice to just... not have to do that. I'm coming around on the idea of C-style identifiers for Rhombus.

jackfirth avatar Oct 03 '19 16:10 jackfirth

@gus-massa On the optional :, I'm sympathetic to your point. I started with the #lang something optional :, but maybe the choice there is related to how : enables indentation within (). Otherwise, I thought it might be annoying to have to delete a : when editing code in some cases. But it does seem like the reader can simply disallow : before |, {, or indentation, and providing a good error message seems straightforward, so maybe that's better.

I also share your concern about the special rule for operators to continue a line within () and [] and the way that {} is different. Within () and [], a compensating factor is that we expect , to separate groups, as you say. The remaining risk of confusion seems a worthwhile trade-off to avoid \ for breaking arithmetic across lines, but I'm not 100% certain.

I'm not initially enthusiastic about &, because getting rid of it seemed like one of the two big improvements over sapling notation (where the other is getting rid of the need for blank lines). But I encourage you to try/share examples and maybe try adjusting the parser.

mflatt avatar Oct 03 '19 16:10 mflatt

+1 for required : before indented blocks.

I'm not sure I like the idea of preserving commas on the parse. I'd prefer that either:

  1. separating eg. function arguments with newlines be exactly the same as using commas
  2. comma-separated groups be a special kind of group, where commas are always required, but dropped in the parse.

When would a group that mixes use or absence of commas be desirable?

f(a, b
  c, d,
  e)

michaelballantyne avatar Oct 03 '19 17:10 michaelballantyne

@michaelballantyne Requiring commas in () and [] seems likely a good idea. I avoided that requirement originally, because I wanted to make the notation flexible — deferring when possible to a language built with the notation. But enforcing a use of commas in () and [] at the shrubbery-notation level seems consistent with the way the division of responsibility has evolved.

mflatt avatar Oct 04 '19 12:10 mflatt

I've updated the description and parser:

  • A , is now required between each group in () or []. Extra ,s are not allowed. I like how this removes the need to represent , in parsed forms.

  • A | is now implicitly indented by half a column. Although @gus-massa liked how indentation was required for nested |, it feels a little less confusing to allow a | to line up with the "keyword" that starts a conditional form, because the shape of | makes it look a little indented already. More significantly, a half-column alignment side-steps the question of mixing groups that start with | and groups that don't. And since they can't be mixed, a block that contains | groups can be simplified to an 'alts variant of 'block in parsed form.

  • A : (or |) is now required before indentation. I'm ambivalent. It often looks noisy and feels genuinely redundant to me. But a redundant : is useful as a kind of belt-and-suspenders notation to help detect earlier when indentation goes wrong — we like indentation, but don't really trust it? — and I think that's probably where the suggestion comes from. (@gus-massa and @michaelballantyne can correct me if I'm wrong.)

Extra :s are still allowed. Requiring a : before indentation is redundant and slightly pedantic, but workable. Having the parser complain when you have an extra : (because you just inserted a newline after a :, for example) seems unnecessarily pedantic. A style guide and code-formatting tool should normalize :s as well as indentation, of course.
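
To make the comma rule above concrete, here is a minimal Python sketch of splitting the tokens inside () or [] into groups. It is not taken from parse.rkt; the function name, the flat token-list input, and the allow_trailing flag are all hypothetical. The commas are consumed while splitting, so they never need to appear in the parsed form.

```python
def split_groups(tokens, allow_trailing=False):
    """Split a flat token list into comma-separated groups.

    Raises ValueError for an empty group (an "extra" comma). A single
    trailing comma is accepted only when allow_trailing is True.
    This is an illustrative sketch, not the shrubbery parser.
    """
    groups, current = [], []
    for tok in tokens:
        if tok == ",":
            if not current:
                # Leading comma, or two commas with nothing between them.
                raise ValueError("extra comma: empty group")
            groups.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        groups.append(current)
    elif groups and not allow_trailing:
        # The input ended right after a comma.
        raise ValueError("trailing comma not allowed")
    return groups


# f(a + 1, b) has two groups; the commas themselves are dropped.
parsed = split_groups(["a", "+", "1", ",", "b"])
```

With allow_trailing left at its default, an input such as a, is rejected; passing allow_trailing=True accepts it and yields the same groups as a alone.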

mflatt avatar Oct 17 '19 16:10 mflatt

I really like all of these changes, because I like the greater specificity. Thank you!

jeapostrophe avatar Oct 17 '19 20:10 jeapostrophe

Are characters used as "operators" like "+", "-" not allowed to be shown in an identifier? I see some identifiers like make_adder and color_posn in the demo, and in-list in (#lang) racket is transformed to in_list.

yfzhe avatar Oct 18 '19 02:10 yfzhe

Extra ,s are not allowed.

@mflatt does that mean that trailing comma ([1, 2, 3,]) is not allowed? I find that trailing commas are really useful in two cases: 1) writing prettifier is easier, because we can just print(arg + ', '), and more importantly 2) in line-based diff, using trailing commas allows adding arguments in new lines to touch exactly only those new lines, so the diff is not noisy. So I hope that trailing commas will be allowed.
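
The line-based diff argument can be checked with a small Python sketch using the standard difflib module. The helper name changed_lines and the sample lists are made up for this illustration. It counts the added and removed lines when one element is appended to a multi-line list, with and without trailing commas.

```python
import difflib


def changed_lines(before, after):
    """Count the added and removed lines in a line-based diff of two line lists."""
    diff = difflib.unified_diff(before, after, lineterm="", n=0)
    return sum(
        1
        for line in diff
        # Count "+"/"-" content lines, skipping the "+++"/"---" file headers.
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


# Without trailing commas, the old last line must gain a comma.
no_trailing = changed_lines(
    ["[", "  1,", "  2", "]"],
    ["[", "  1,", "  2,", "  3", "]"],
)

# With trailing commas, only the new line is touched.
with_trailing = changed_lines(
    ["[", "  1,", "  2,", "]"],
    ["[", "  1,", "  2,", "  3,", "]"],
)
```

Here no_trailing is 3 (one line removed, two added) and with_trailing is 1, which is the noise reduction described above.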

Are characters used as "operators" like "+", "-" not allowed to be shown in an identifier? I see some identifiers like make_adder and color_posn in the demo, and in-list in (#lang) racket is transformed to in_list.

@yfzhe Yeah, see https://github.com/racket/rhombus-brainstorming/pull/122#issuecomment-537505453

sorawee avatar Oct 18 '19 03:10 sorawee

So I hope that trailing commas will be allowed.

I strongly second this from the way I write Python code, where I always add trailing commas. Whenever I don't, I end up forgetting to add a comma to the formerly last row when adding a new one, which is annoying.

I have a question on how to evaluate different options. I find it difficult to even think through what goals of usability and clarity of the syntax we want to achieve - and for which subgroup of people? What are some actual studies showing what type of syntax people find more natural along whichever metric? And do you have an ideal study where you'd say "I would change my mind on doing X rather than Y if you could show me that 1st-year CS students/long-time C coders after 1 day/1 month exposure with X rather than Y are happy/faster at writing syntactically correct code/can understand the meaning of written code"? Even if we don't find such studies or can't plausibly run them, it would help clarify what trade-offs we are making.

As an example, I find it annoying to write _ instead of - in identifiers. It's one of my main syntax-related frustrations when not programming in Racket -- I find it very non-ergonomic, and it slows my typing down to the point where I consciously notice it (hitting Shift and the - button). I can see why this may confuse lots of people however and require explaining it to them. However, if after explaining it once to them they understand it and are never confused, then so what? Alternatively, if they read code such as pair-vector1-vector2 as pair - vector1 - vector2 even after some exposure to the syntax, and if we care more about readability, I would not just be happy to drop -, but in fact agree that having - in identifiers is bad. Similarly, I'd put some serious money on people being bad at parsing prefix mathematics even after quite some exposure to it -- but if you actually showed me that after 1 month of using prefix daily, people read and write it as easily as infix, I'd be surprised, but I'd be happy to grant that I am the odd one out, and I'd agree that prefix serves the goal better.

MarcKaufmann avatar Oct 18 '19 08:10 MarcKaufmann

I can see why this may confuse lots of people however and require explaining it to them

Emphasis mine. In my experience, the costs of teaching a notation dominate the costs of reading it, and both dominate the cost of writing it.

jackfirth avatar Oct 18 '19 08:10 jackfirth

That makes sense. It also highlights why my opinion as a user, who primarily writes his own code, but rarely has to reread much of it, and even less teach it, is probably tilted in the wrong direction. Which is why it's useful to spell that out - and I can be happy without '-' in identifiers.

I would add that leading to more correct and maintainable code beats initial teaching costs, and for me personally infix maths is much less likely to lead to mistakes than prefix, despite me getting both.

MarcKaufmann avatar Oct 18 '19 08:10 MarcKaufmann

The idea of allowing a trailing comma surprises me. I had not noticed the evolution on this point in other languages.

I could see allowing a trailing comma and having the standard format include a trailing comma when a closing ) or ] is written on its own line (as required in Go).

mflatt avatar Oct 18 '19 14:10 mflatt

Trailing comma support works best when there's an autoformatter involved, so it gets users better diffs without them having to actually think about where to put trailing commas and where not to.

Related, many languages now ship with a standard autoformatter. @mflatt, do you think defining some autoformatting rules as part of shrubbery notation would be a useful thing to do at this stage?

jackfirth avatar Oct 19 '19 04:10 jackfirth

I'm not sure about the last comma, but there are some examples in Python anyway:

bool(0) # ==> False (it's the number 0, that is falsy)
bool(0,) # ==> False (again the number 0, the last comma is ignored)
bool((0)) # ==> False (again the number 0, unnecessary parentheses are ignored)
bool((0,)) # ==> True (it is the tuple with one element that is zero)

Another unrelated idea for the commas is from VB. Two commas separated by a space mark an optional argument of a function that is replaced by the default value.

MsgBox("Message", , "Title")  ' The second argument is missing so use the default.
                              ' It selects the icon and buttons to display.

gus-massa avatar Oct 19 '19 20:10 gus-massa

About the spaces of indentation before |, I think that 1 or 2 is better than 0. I'm not sure if 1 is better than 2 or not. Using 2 looks like too much indentation indeed.

The problem is the editor while you are writing the program. Imagine you type

if x>0 [enter] [tab] | ...

How many spaces should [tab] move? Is it necessary to press [tab]? With 2 spaces, [tab] always adds 2 spaces.

I think the objective of adding some syntax is to fix the shortcomings of the s-expressions. It's quite abstract, so I prefer a more concrete objective of removing the special rules for indentation and square bracket in DrRacket. It looks like this notation is good enough to eliminate the rules after the program is complete, but DrRacket will need some rules for the indentation while typing the program.

I think only the L-expressions need no rules for a complete program and no rules while typing a program, but it needs to use too many continuation characters at the end of the lines to ensure that.

gus-massa avatar Oct 20 '19 12:10 gus-massa

It turns out that parse.rkt already allows a trailing comma, because it wasn't specifically guarding against that. I'll adjust the text to say that it's allowed. @gus-massa Thanks for the VB note; I wouldn't go that way, but it's good to know about the precedent.

@jackfirth Yes, I think basic formatting rules belong at this phase. I tried to specify some already, characterizing certain patterns as "standard" style or indentation, and we can add more of that.

@gus-massa I very much agree on the goal of avoiding binding-specific indentation. As a starting point in this notation, I suggest 1 space for | indentation and 2 spaces for other indentation. For editor behavior, I had imagined that [enter] after : would indent, [enter] without : would not indent, and | would automatically (un)indent without requiring [tab]. Meanwhile, [tab] would cycle through plausible indentations (potentially unindenting). But there's probably more/better that others have worked out for Python-style syntax.
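
The suggested editor behavior can be sketched as a small Python function. This is only an illustration of the rule of thumb above (1 space for |, 2 spaces after :, otherwise no change). The function name and its parameters are hypothetical, and it ignores [tab] cycling, brackets, and line continuation.

```python
def indent_after_enter(prev_line, next_starts_with_bar=False):
    """Suggest the indentation, in spaces, for the line typed after prev_line.

    Illustrative sketch only: 1 extra space for a `|` alternative,
    2 extra spaces after a line ending in `:`, otherwise no change.
    """
    base = len(prev_line) - len(prev_line.lstrip(" "))
    stripped = prev_line.strip()
    if next_starts_with_bar:
        if stripped.startswith("|"):
            return base      # line up with the previous `|` alternative
        return base + 1      # first `|` sits 1 space in from the keyword
    if stripped.endswith(":"):
        return base + 2      # a `:` opens a block indented by 2 spaces
    return base              # no `:`, so stay at the current indentation


after_colon = indent_after_enter("define f(x):")
after_plain = indent_after_enter("  g(x)")
first_bar = indent_after_enter("if x > 0", next_starts_with_bar=True)
second_bar = indent_after_enter(" | x", next_starts_with_bar=True)
```

A real editor mode would also need to unindent when a block ends, which this sketch leaves to the [tab] cycling described above.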

mflatt avatar Oct 20 '19 13:10 mflatt

In this syntax proposal and the others, it would be interesting to see how syntax for optional type annotations would fit in. It might be worth considering that sooner rather than later to avoid "painting oneself into a corner" when it comes to "Typed Rhombus".

PaulMorris avatar Dec 24 '19 13:12 PaulMorris

That's a good point, but @mflatt has already included some examples. See https://github.com/racket/rhombus-brainstorming/pull/122/files#diff-ef7ebf39fed9f2714f31edee8c398140R169 for instance. I think it would still be interesting to see examples of annotations in positions other than formal arguments.

sorawee avatar Dec 24 '19 14:12 sorawee

@mflatt Speaking of type annotation, is this a typo?

define exp(n: Integer, base: base = 2.718281828459045)

Probably it should be this, right?

define exp(n: Integer, base: Integer = 2.718281828459045)

sorawee avatar Dec 24 '19 14:12 sorawee

@sorawee the base argument doesn't look like an integer to me ...

samth avatar Dec 24 '19 16:12 samth