language-javascript
language-javascript copied to clipboard
Autosemi after comments
Hi! Currently, there is a subtle difference in the way language-javascript deals with auto semicolons when comments are present.
Let's have a look at this code sample:
function f1() {
return // hello
4
}
function f2() {
return /* hello */
4
}
function f3() {
return /* hello
*/ 4
}
function f4() {
return
4
}
And how esprima parses it:
{
"type": "Program",
"body": [
{
"type": "FunctionDeclaration",
"id": {
"type": "Identifier",
"name": "f1"
},
"params": [],
"body": {
"type": "BlockStatement",
"body": [
{
"type": "ReturnStatement",
"argument": null
},
{
"type": "ExpressionStatement",
"expression": {
"type": "Literal",
"value": 4,
"raw": "4"
}
}
]
},
"generator": false,
"expression": false,
"async": false
},
{
"type": "FunctionDeclaration",
"id": {
"type": "Identifier",
"name": "f2"
},
"params": [],
"body": {
"type": "BlockStatement",
"body": [
{
"type": "ReturnStatement",
"argument": null
},
{
"type": "ExpressionStatement",
"expression": {
"type": "Literal",
"value": 4,
"raw": "4"
}
}
]
},
"generator": false,
"expression": false,
"async": false
},
{
"type": "FunctionDeclaration",
"id": {
"type": "Identifier",
"name": "f3"
},
"params": [],
"body": {
"type": "BlockStatement",
"body": [
{
"type": "ReturnStatement",
"argument": null
},
{
"type": "ExpressionStatement",
"expression": {
"type": "Literal",
"value": 4,
"raw": "4"
}
}
]
},
"generator": false,
"expression": false,
"async": false
},
{
"type": "FunctionDeclaration",
"id": {
"type": "Identifier",
"name": "f4"
},
"params": [],
"body": {
"type": "BlockStatement",
"body": [
{
"type": "ReturnStatement",
"argument": null
},
{
"type": "ExpressionStatement",
"expression": {
"type": "Literal",
"value": 4,
"raw": "4"
}
}
]
},
"generator": false,
"expression": false,
"async": false
}
],
"sourceType": "script"
}
As you can see, the parser is introducing an automatic semicolon in all the four scenarios. This behavior is consistent with modern V8 and SpiderMonkey engines.
However, let's have a look at how language-javascript handles these scenarios. If you add the following tests to ProgramParser.hs...
describe "function with autosemi on return" $ do
it "return with inline comment" $ do
testProg "function f1(){ return //hello\n 4 }" `shouldBe` "Right (JSAstProgram [JSFunction 'f1' () (JSBlock [JSReturn ,JSDecimal '4'])])"
it "return with multiline comment" $ do
testProg "function f2(){ return /* hello */\n 4 }" `shouldBe` "Right (JSAstProgram [JSFunction 'f2' () (JSBlock [JSReturn ,JSDecimal '4'])])"
it "return with multiline comment with newline" $ do
testProg "function f3(){ return /* hello\n */ 4 }" `shouldBe` "Right (JSAstProgram [JSFunction 'f3' () (JSBlock [JSReturn ,JSDecimal '4'])])"
it "return with newline" $ do
testProg "function f4(){ return \n 4 }" `shouldBe` "Right (JSAstProgram [JSFunction 'f4' () (JSBlock [JSReturn ,JSDecimal '4'])])"
...only the fourth and last test will actually produce the expected results. The other three will fail with the same error:
test/Test/Language/Javascript/ProgramParser.hs:29:13:
1) Program parser:, function with autosemi on return, return with inline comment
expected: "Right (JSAstProgram [JSFunction 'f1' () (JSBlock [JSReturn ,JSDecimal '4'])])"
but got: "Right (JSAstProgram [JSFunction 'f1' () (JSBlock [JSReturn JSDecimal '4' ])])"
To rerun use: --match "/Program parser:/function with autosemi on return/return with inline comment/"
test/Test/Language/Javascript/ProgramParser.hs:32:13:
2) Program parser:, function with autosemi on return, return with multiline comment
expected: "Right (JSAstProgram [JSFunction 'f2' () (JSBlock [JSReturn ,JSDecimal '4'])])"
but got: "Right (JSAstProgram [JSFunction 'f2' () (JSBlock [JSReturn JSDecimal '4' ])])"
To rerun use: --match "/Program parser:/function with autosemi on return/return with multiline comment/"
test/Test/Language/Javascript/ProgramParser.hs:35:13:
3) Program parser:, function with autosemi on return, return with multiline comment with newline
expected: "Right (JSAstProgram [JSFunction 'f3' () (JSBlock [JSReturn ,JSDecimal '4'])])"
but got: "Right (JSAstProgram [JSFunction 'f3' () (JSBlock [JSReturn JSDecimal '4' ])])"
To rerun use: --match "/Program parser:/function with autosemi on return/return with multiline comment with newline/"
In other words, this parser is only inserting an automatic semicolon after a newline, but not after comments. This PR fixes this issue.
I have also refactored the Lexer tests, so that they don't just inspect the tokens produced by alex, but also those sent to happy.