Compiler Implementation

Open keean opened this issue 8 years ago • 492 comments

Discussion about the ongoing implementation of the compiler.

keean avatar Sep 27 '16 11:09 keean

The parser now has an implementation of the indentation-aware parser described in this paper: https://pdfs.semanticscholar.org/cd8e/5faaa60dfa946dd8a79a5917fe52b4bd0346.pdf

Here's the implementation of the indentation parser:

    function IndentationParser(init) {
        this.indent = init
    }
    IndentationParser.prototype.get = function() {
        return this.indent
    }
    IndentationParser.prototype.set = function(i) {
        this.indent = i
    }
    IndentationParser.prototype.relative = function(relation) {
        var self = this
        return Parsimmon.custom((success, failure) => {
            return (stream, i) => {
                var j = 0
                while (stream.charAt(i + j) == ' ') {
                    j = j + 1
                }
                if (relation.op(j, self.indent)) {
                    self.indent = j
                    return success(i + j, j)
                } else {
                    return failure(i, 'indentation error: ' + j + relation.err + self.indent)
                }
            }
        })
    }
    IndentationParser.prototype.absolute = function(target) {
        var self = this
        return Parsimmon.custom((success, failure) => {
            return (stream, i) => {
                var j = 0
                while (stream.charAt(i + j) == ' ') {
                    j = j + 1
                }
                if (j == target) {
                    self.indent = j
                    return success(i + j, target)
                } else {
                    return failure(i, 'indentation error: ' + j + ' does not equal ' + target)
                }
            }
        })
    }
    IndentationParser.prototype.eq  = {op: (x, y) => {return x == y}, err: ' does not equal '}
    IndentationParser.prototype.ge  = {op: (x, y) => {return x >= y}, err: ' is not equal or greater than '}
    IndentationParser.prototype.gt  = {op: (x, y) => {return x > y}, err: ' is not greater than '}
    IndentationParser.prototype.any = {op: (x, y) => {return true}, err: ' cannot fail '}

This is what a parser using these new parser combinators looks like:

    block = Parsimmon.succeed({}).chain(() => {
        var indent = Indent.get()
        return Parsimmon.seqMap(
            Indent.relative(Indent.gt).then(statement),
            (cr.then(Indent.relative(Indent.eq)).then(statement)).many(),
            (first, blk) => {
                blk.unshift(first)
                Indent.set(indent)
                return {'blk' : blk}
            }
        )
    })

This parses a block of statements: the first line of the block must be more indented than the previous line, and the remaining lines must be indented by the same amount as the first line.
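For readers who have not used Parsimmon, the block rule can also be sketched without any library. This is only an illustration of the two indentation relations (`gt` for the first line, `eq` for the rest), not the compiler's actual parser; `indentOf` and `readBlock` are hypothetical helper names, and the sketch assumes spaces only (no tabs) and one statement per line.

```javascript
// Count leading spaces on a line (tabs are not handled in this sketch).
function indentOf(line) {
    let j = 0
    while (line.charAt(j) === ' ') {
        j = j + 1
    }
    return j
}

// Hypothetical helper: collect one block starting at lines[start].
// parentIndent is the indentation of the line that introduced the block.
function readBlock(lines, start, parentIndent) {
    if (start >= lines.length) {
        throw new Error('indentation error: expected a block')
    }
    const blockIndent = indentOf(lines[start])
    // First line: must be strictly more indented than the parent (the `gt` relation).
    if (!(blockIndent > parentIndent)) {
        throw new Error('indentation error: ' + blockIndent + ' is not greater than ' + parentIndent)
    }
    const blk = []
    let i = start
    // Remaining lines: must match the first line exactly (the `eq` relation).
    while (i < lines.length && indentOf(lines[i]) === blockIndent) {
        blk.push(lines[i].trim())
        i = i + 1
    }
    return {blk: blk, next: i}
}

const source = [
    'if x',
    '    a = 1',
    '    b = 2',
    'c = 3'
]
const result = readBlock(source, 1, indentOf(source[0]))
console.log(result) // { blk: [ 'a = 1', 'b = 2' ], next: 3 }
```

The block ends at the first line whose indentation differs from the first line of the block, which mirrors how `.many()` stops once `Indent.relative(Indent.eq)` fails.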

keean avatar Sep 27 '16 12:09 keean

@keean I will catch up with you later on the parser combinator implementation. I haven't employed them ever, so I will need to dedicate some time to that. My first priority is to write the grammar into an EBNF file and check that it is conflict-free, LL(k), and hopefully also context-free. I read that parser combinators can't check those attributes.

Also I will want to understand whether using a monadic parser combinator library forces our AST into a monadic structure and whether that is the ideal way for us to implement it. Anyway, you are rolling on implementation, so I don't want to discourage you at all. I will try to rally around one way and help code. I will need to study. My focus so far has been on nailing down the syntax and early design decisions. Btw, congrats on getting rolling so quickly on the implementation!

Btw, I hate semicolons. Any particular reason you feel you need to litter the code with them? There are only a very few ASI gotchas in JavaScript when omitting semicolons (and I think these can be checked with jslint), and they are easy to memorize, such as not leaving the rest of the line blank after a return, as this will return undefined.

Also I prefer the style of this latest code compared to what I saw before, because I don't like trying to cram too many operations on one LOC. It makes it difficult to read the code IMO.

Also, I think I would prefer to employ arrow functions as follows (we'll be porting to self-hosted later, so we'll have arrow functions as standard for any ES version), and to compromise at 3 spaces of indentation (even though I prefer 2 spaces lately):

    block = Parsimmon.succeed({}).chain(() => {
       var indent = Indent.get()
       return Parsimmon.seqMap(
          Indent.relative(Indent.gt).then(statement),
          (cr.then(Indent.relative(Indent.eq)).then(statement)).many(),
          (first, blk) => {
             blk.unshift(first)
             Indent.set(indent)
             return {'blk' : blk}
          }
       )
    })

Also I would horizontally align as follows because I love pretty code, which is easier to read:

IndentationParser.prototype.eq  = {op: eq(x, y) => {return x == y}, err: ' does not equal '              }
IndentationParser.prototype.ge  = {op: ge(x, y) => {return x >= y}, err: ' is not equal or greater than '}
IndentationParser.prototype.gt  = {op: gt(x, y) => {return x >  y}, err: ' is not greater than '         }
IndentationParser.prototype.any = {op: gt(x, y) => {return true  }, err: ' cannot fail '                 }

I may prefer:

IndentationParser.prototype.eq  = { op: eq(x, y) => {return x == y},
                                   err: ' does not equal '              }
IndentationParser.prototype.ge  = { op: ge(x, y) => {return x >= y},
                                   err: ' is not equal or greater than '}
IndentationParser.prototype.gt  = { op: gt(x, y) => {return x >  y},
                                   err: ' is not greater than '         }
IndentationParser.prototype.any = { op: gt(x, y) => {return true  },
                                   err: ' cannot fail '                 }

Above you are implicitly making the argument again that we should have the ability to name inline functions (without let) in our programming language. Note this would be an alternative solution to the ugly syntax for the case where we need to specify the return type, but afaics we can't unify around (x, y) => x == y without the prefixed name, unless we don't use parentheses for anonymous product (tuple) types, and still remain LL(k). Any idea how ES6 is parsing their arrow functions? LR grammar? Alternatively you would be writing that in our language:

let eq(x, y) => x == y
let ge(x, y) => x >= y
let gt(x, y) => x >  y
let gt(x, y) => true
IndentationParser.prototype.eq  = {op: eq, err: ' does not equal '              }
IndentationParser.prototype.ge  = {op: ge, err: ' is not equal or greater than '}
IndentationParser.prototype.gt  = {op: gt, err: ' is not greater than '         }
IndentationParser.prototype.any = {op: gt, err: ' cannot fail '                 }

Which would have helped you catch the copy+paste typo that duplicated the gt name. The only reason you are adding the redundant naming above is for browser debugging stack traces, correct?
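As a rough illustration in plain ES2015 rather than in the proposed language: an arrow function bound with `const` picks up the binding's name through the standard function-name inference, and declaring the same `const` name twice is an early SyntaxError. The `relations` object below is a hypothetical stand-in for the prototype properties, not code from the compiler.

```javascript
// Each comparison gets its own const binding. Redeclaring a const name
// (for example a second `const gt`) is a SyntaxError, so a copy-paste slip
// is rejected before the program runs.
const eq  = (x, y) => x == y
const ge  = (x, y) => x >= y
const gt  = (x, y) => x >  y
const any = (x, y) => true

// Hypothetical stand-in for the IndentationParser.prototype relations.
const relations = {
    eq:  {op: eq,  err: ' does not equal '},
    ge:  {op: ge,  err: ' is not equal or greater than '},
    gt:  {op: gt,  err: ' is not greater than '},
    any: {op: any, err: ' cannot fail '}
}

// An arrow written inline as a property value is named after the property key,
// so every inline `op` would show up as 'op' in a stack trace.
const inline = {op: (x, y) => x == y}

console.log(relations.any.op.name) // any
console.log(inline.op.name)        // op
```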

Or (unless we change the syntax):

IndentationParser.prototype.eq  = { op: x y => x == y,
                                   err: ' does not equal '              }
IndentationParser.prototype.ge  = { op: x y => x >= y,
                                   err: ' is not equal or greater than '}
IndentationParser.prototype.gt  = { op: x y => x >  y,
                                   err: ' is not greater than '         }
IndentationParser.prototype.any = { op: x y => true,
                                   err: ' cannot fail '                 }

shelby3 avatar Sep 27 '16 14:09 shelby3

The main reason to use function is backwards compatibility, not all browsers support => yet.

With regards to our syntax, function definition should be an expression, so you should be able to include it inline in the object declaration. I think we would end up with something like this:

data Relation = Relation { op : (A, A) : Bool, err : String }

let eq = Relation { op: eq(x, y) => x == y, err: ' does not equal ' }

keean avatar Sep 27 '16 15:09 keean

@keean wrote:

The main reason to use function is backwards compatibility, not all browsers support => yet.

I know. That is why I wrote:

Also, I think I would prefer to employ arrow functions as follows (we'll be porting to self-hosted later so we'll have arrow functions as standard to any ES version

I had already explained we will get backwards compatibility for free, and by not putting function we are more compatible with the way it will be written in our language when we port over.

Who can't run our compiler in a modern browser in the meantime? This is only alpha.

Please re-read my prior comment, as I added much to the end of it.

shelby3 avatar Sep 27 '16 15:09 shelby3

Regarding semi-colons, Douglas Crockford in "JavaScript: The Good Parts" recommends always using semi-colons explicitly because JavaScript's semi-colon insertion can result in the code not doing what you intended.

keean avatar Sep 27 '16 15:09 keean

I think you are right about '=>' for functions, as it is running in Node which supports them, however, I don't think porting will be that straightforward, as we won't directly support prototypes etc.

keean avatar Sep 27 '16 15:09 keean

@keean wrote:

because JavaScript's semi-colon insertion can result in the code not doing what you intended.

Did you not read what I wrote?

There are only a very few ASI gotchas in JavaScript (and these I think can be checked with jslint) with not including semicolons and these are easy to memorize, such as not having the rest of the line blank after a return as this will return undefined.

http://benalman.com/news/2013/01/advice-javascript-semicolon-haters/

shelby3 avatar Sep 27 '16 15:09 shelby3

Regarding semi-colons:

... the specification is clear about this. JavaScript’s syntactic grammar specifies that semi-colons are required. Omitting semi-colons results in invalid JavaScript code. That code won’t throw (thanks to ASI), but it’s still invalid.

keean avatar Sep 27 '16 15:09 keean

Semicolons won't help you here:

return
   some long shit;

You have to know the rules, whether you use semicolons or not. That is why I am happy we are going to use a Python style indenting.

Semicolons are training wheels that don't protect against every failure.
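A small sketch of the case above, with made-up function names. A line break directly after `return` ends the statement, so the expression on the next line is never evaluated, whether or not it ends in a semicolon. Wrapping the value in parentheses that open on the `return` line is one common way to continue onto the next line.

```javascript
function brokenReturn() {
    return
        1 + 2;   // never evaluated: ASI ends the statement right after `return`
}

function parenthesizedReturn() {
    // The opening parenthesis on the `return` line keeps the statement open.
    return (
        1 + 2
    )
}

console.log(brokenReturn())        // undefined
console.log(parenthesizedReturn()) // 3
```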

shelby3 avatar Sep 27 '16 15:09 shelby3

Also jshint wants you to put them in, and I am using jshint as part of the build process.

jshint catches the above error :-)

keean avatar Sep 27 '16 15:09 keean

JSHint can be configured to allow ASI. And I think it will still warn you about ambiguous implicit cases, if I am not mistaken (it should).

shelby3 avatar Sep 27 '16 15:09 shelby3

Without semi-colons JSHint cannot recognise the above error because you might mean:

return;
some long stuff

or

return some long stuff;

keean avatar Sep 27 '16 15:09 keean

Bottom line: if you have something at the start of a line which could possibly be a line continuation, then check to make sure you have made it unambiguous.

That is the simple golden rule and it applies whether using semicolons or not. That is not complicated. One simple rule.
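To make that rule concrete, here is a sketch of one well-known case (the function names are made up for illustration): a line that starts with `[` is joined to the previous line when the result is valid syntax, so no semicolon is inserted. A leading `;` on the continuation-looking line is one way to make the intent unambiguous.

```javascript
function joinedLines() {
    const a = [10, 20, 30]
    // No semicolon is inserted after `a`, because the next line starts with `[`
    // and `a[1, 2]` is valid syntax. The two lines parse as
    // `const b = a[1, 2].forEach(...)`, that is `(30).forEach(...)`.
    const b = a
    [1, 2].forEach((n) => n)
    return b
}

function guardedLines() {
    const a = [10, 20, 30]
    const b = a
    ;[1, 2].forEach((n) => n)   // the leading `;` removes the ambiguity
    return b
}

try {
    joinedLines()
} catch (e) {
    console.log(e instanceof TypeError) // true
}
console.log(guardedLines()) // [ 10, 20, 30 ]
```

The same check applies to lines that start with `(`, which would otherwise be read as a call on the previous line's expression.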

shelby3 avatar Sep 27 '16 15:09 shelby3

JavaScript was never designed to be used without semi-colons... let's design our new language not to require them, but I don't see any point in fighting JavaScript... We will emit the semi-colons into JS :-)

keean avatar Sep 27 '16 15:09 keean

@keean wrote:

Without semi-colons JSHint cannot recognise the above error because you might mean:

It should be warning that the case is ambiguous. I can't be faulted for the JSHint programmers being derelict (if they are, did not confirm).

shelby3 avatar Sep 27 '16 15:09 shelby3

@keean wrote:

JavaScript was never designed to be used without semi-colons...

The intent isn't relevant. What is, is what is relevant. We need to know the rules whether we use semicolons or not. We are not stupid programmers who need to make ourselves feel we are more secure by not knowing the rules. I lost my training wheels 30 years ago.

Otherwise we need to find a linter that warns of all ambiguous cases with or without semicolons.

Bottom line: if you have something at the start of a line which could possibly be a line continuation, then check to make sure you have made it unambiguous.

That is the simple golden rule and it applies whether using semicolons or not. That is not complicated. One simple rule.

If JSHint isn't doing that checking, then it is derelict. Need to find a better linter.

Wonder if Douglas Crockford ever considered that. If some influential people decide that semicolons everywhere is the prescribed way, then why the heck did JS offer ASI anyway?

Perhaps he could have realized that the only sure way, is to have a linter which properly warns of every ambiguous case, whether using semicolons or not. Instead perhaps these talking heads influenced the JSHint people to not add proper checking for the ASI case? Sigh.

shelby3 avatar Sep 27 '16 15:09 shelby3

So here's what the guy that created JS thinks: https://brendaneich.com/2012/04/the-infernal-semicolon/

keean avatar Sep 27 '16 15:09 keean

It doesn't matter. It is just logic.

There you go. Crockford doesn't agree to support ASI in his tool and thus promulgates that ASI is an error:

Some argue that JSMin has a bug. Doug Crockford does not want to change JSMin, and that’s his choice.

That's right:

And (my point here), neither is newline.

Know the rules. Newline is not a statement nor expression terminator in JavaScript. Simple as that. Resolve all ambiguous cases.

It is analogous superfluous redundancy: one wouldn't write ;;;;;;; at the end of every statement or expression to make sure they got it right. They also don't need to write ; to make sure they got it right, if they are using a linter which can warn them whether the preceding expression on the prior line could be joined to the next line and thus that a semicolon or other syntax needs to be inserted to resolve the ambiguity.

shelby3 avatar Sep 27 '16 15:09 shelby3

The moral of this story: ASI is (formally speaking) a syntactic error correction procedure. If you start to code as if it were a universal significant-newline rule, you will get into trouble. A classic example from ECMA-262:

keean avatar Sep 27 '16 15:09 keean

So I don't write code with syntactic errors... I write Python without semi-colons, I write C++ with them... it doesn't bother me, I go with what the language standard says...

keean avatar Sep 27 '16 15:09 keean

The moral of this story: ASI is (formally speaking) a syntactic error correction procedure. If you start to code as if it were a universal significant-newline rule, you will get into trouble. A classic example from ECMA-262:

Then why did he put it in JavaScript? Linters should do their job correctly.

There is absolutely no reason you ever need a ; after a block { }. The } terminates the statement or expression.

shelby3 avatar Sep 27 '16 15:09 shelby3

I wish I had made newlines more significant in JS back in those ten days in May, 1995. Then instead of ASI, we would be cursing the need to use infix operators at the ends of continued lines, or perhaps \ or brute-force parentheses, to force continuation onto a successive line. But that ship sailed almost 17 years ago.

keean avatar Sep 27 '16 15:09 keean

I wish I had made newlines more significant in JS back in those ten days in May, 1995. Then instead of ASI, we would be cursing the need to use infix operators at the ends of continued lines, or perhaps \ or brute-force parentheses, to force continuation onto a successive line. But that ship sailed almost 17 years ago.

There you go. JS can't require semicolons. So why do you? Probably because we can't use a proper (non-derelict) linter, probably because JSHint probably doesn't warn of all ambiguities with 'ASI' enabled (but I didn't confirm that).

We are moving to block indenting to avoid this entire mess.

shelby3 avatar Sep 27 '16 15:09 shelby3

Okay, so conclusion, I will use '=>' for anonymous functions, but leave the ';' in for now...

Our language won't require semi-colons, just like Python does not...

keean avatar Sep 27 '16 15:09 keean

I go with what the language standard says...

The language standard says ASI is a supported feature. You have to know the rules whether you use semicolons or not. I will not repeat this again.

Let's try to find a linter which isn't brain-dead.

shelby3 avatar Sep 27 '16 15:09 shelby3

My two cents: be careful not to use ASI as if it gave JS significant newlines. And please don’t abuse && and || where the mighty if statement serves better.

The standard says it is a syntax error to omit the semi-colon.

keean avatar Sep 27 '16 15:09 keean

The standard says it is a syntax error to omit the semi-colon.

Then why does it compile? Brendan told you that JS cannot require semicolons everywhere because it breaks other things.

shelby3 avatar Sep 27 '16 15:09 shelby3

You are right. If you really can't work with the semi-colons, I will get rid of them for this project.

keean avatar Sep 27 '16 16:09 keean

Can I delete the semi-colon discussion, as it's cluttering the implementation thread... I am going to remove them.

I discovered that 'this' does not work inside functions defined with '=>'... that is a bit weird.
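The behaviour comes from arrow functions not having their own `this`: they keep the `this` of the scope in which they were written. The sketch below uses a made-up `IndentState` constructor, loosely modelled on the `IndentationParser` methods above, to show both sides of that rule.

```javascript
function IndentState(init) {
    this.indent = init
}

// A `function` method receives the instance as `this` when called as obj.get().
IndentState.prototype.get = function () {
    return this.indent
}

// An arrow function has no `this` of its own. Here it keeps the `this` of the
// enclosing scope, which is not the instance. (The guard is only there because
// the outer `this` can be undefined, for example in an ES module.)
IndentState.prototype.getArrow = () => {
    return this === undefined ? undefined : this.indent
}

// Inside a `function` method, an arrow callback sees the method's `this`,
// so a `var self = this` alias is not needed there.
IndentState.prototype.bump = function (amounts) {
    amounts.forEach((n) => { this.indent = this.indent + n })
    return this.indent
}

const state = new IndentState(4)
console.log(state.get())        // 4
console.log(state.getArrow())   // undefined
console.log(state.bump([1, 2])) // 7
```

So prototype methods still need `function`, while callbacks nested inside them can use `=>` and drop the `self` alias.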

keean avatar Sep 27 '16 16:09 keean

I don't understand why deleting my disagreement helps. You have your freedom to do it your way and I have my freedom to speak my logical and engineering disagreement.

ASI is a feature of JavaScript that can't be removed for the reason Brendan (its creator) explained. Ostensibly Douglas Crockford and other (corrupt?) people in the standards process get together and decide that they are too lazy to write tools that fully support the language (including apparently JSHint's derelict failure to offer even a flag to report ambiguous cases of ASI?), so they decide to decree that ASI is a syntax error when in fact it is required by the language and does in fact not generate a syntax error. A programmer could accidentally insert an ASI case, and the language will not generate a syntax error. It is entirely derelict and a bastardization of the language. Brendan even agrees with me if you read carefully what he is saying. He has no choice but to go along with the community of derelictness, because it is a political game. Look what happened to Brendan recently because of his personal politics. Corrupt world we live in and I will not be a supporting member of the corruption by following illogical decrees and bastardization of what is. Note I can't accuse any person or organization of corruption, because I don't know that to be the case. Can just be human nature and design-by-committee outcomes. ASI is part of JavaScript, regardless of some meaningless words inserted after the fact by some standards committee. And tools that don't support the language fully are derelict.

I am a rebel. And I will remain one. But of course, I will do what is reasonable after registering my disagreement.

Look how successful the drive to not support ASI has been. The world's most popular language apparently doesn't even have an open source linter that is compliant with the language? I am an engineer, not a politician.

I register my disagreement with kowtowing to derelict refusal to adhere to the features of the language. My point to them is: remove ASI from the language if they don't want tools to fully support it, instead of using deception and politics to influence a lack of support from tools.

Proofs by appeal to authority are less convincing to me than proofs of engineering facts.

shelby3 avatar Sep 27 '16 16:09 shelby3