Unable to parse valid W3C EBNF
The grammar at https://github.com/transpect/css-tools/blob/master/ebnf-scheme/CSS3.ebnf is valid W3C EBNF, as verified with the railroad diagram generator at https://bottlecaps.de/rr/ui. This package fails to parse it and throws a "could not parse" error from /node_modules/ebnf/dist/Grammars/W3CEBNF.js:288:19.
So it looks like there are some compatibility issues. Perhaps the grammar for W3C is out of date, given the age of the package?
Additionally, this package cannot parse the EBNF grammar that railroad shows on its site:
import { Grammars } from 'ebnf';
const w3grammar = `Grammar ::= Production*
Production ::= NCName '::=' ( Choice | Link )
NCName ::= [http://www.w3.org/TR/xml-names/#NT-NCName]
Choice ::= SequenceOrDifference ( '|' SequenceOrDifference )*
SequenceOrDifference ::= (Item ( '-' Item | Item* ))?
Item ::= Primary ( '?' | '*' | '+' )*
Primary ::= NCName | StringLiteral | CharCode | CharClass | '(' Choice ')'
StringLiteral ::= '"' [^"]* '"' | "'" [^']* "'"
/* ws: explicit */
CharCode ::= '#x' [0-9a-fA-F]+
CharClass ::= '[' '^'? ( Char | CharCode | CharRange | CharCodeRange )+ ']'
Char ::= [http://www.w3.org/TR/xml#NT-Char]
CharRange ::= Char '-' ( Char - ']' )
CharCodeRange ::= CharCode '-' CharCode
Link ::= '[' URL ']'
URL ::= [^#x5D:/?#]+ '://' [^#x5D#]+ ('#' NCName)?
Whitespace ::= S | Comment
S ::= #x9 | #xA | #xD | #x20
Comment ::= '/*' ( [^*] | '*'+ [^*/] )* '*'* '*/'`;
const rules = Grammars.W3C.getRules(w3grammar);
This also fails with throw new Error('Could not parse ' + source); at the same line and position.
Hello, can you try ending the document/grammar string with a line-ending character?
Your Char production looks hosed:
Char ::= [http://www.w3.org/TR/xml#NT-Char]
(A URL doesn't belong in a bracket expression.)
@kjhughes that's straight from W3C
The RHS is clearly meant to be metadata / documentation, not an EBNF regex. The URL references this EBNF:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
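For what it's worth, the grammar quoted from railroad allows `Production ::= NCName '::=' ( Choice | Link )`, whereas the header comment in W3CEBNF.ts quoted later in this thread lists only `Choice`, so link productions like the one above may simply be unsupported here. A minimal sketch for spotting them before parsing, so they can be swapped for real definitions by hand; `findLinkProductions` is a hypothetical helper, not part of the `ebnf` package, and it assumes one production per line:

```typescript
// Hypothetical helper (not part of the `ebnf` package).
// Matches a production whose entire right-hand side is a bracketed URL,
// e.g. `Char ::= [http://www.w3.org/TR/xml#NT-Char]`.
// Assumption: each production sits on a single line.
const LINK_PRODUCTION =
  /^\s*([A-Za-z_][\w.-]*)\s*::=\s*\[([A-Za-z][A-Za-z0-9+.-]*:\/\/[^\]\s]+)\]\s*$/;

interface LinkProduction {
  name: string; // left-hand side of the production
  url: string;  // the URL found between the brackets
  line: number; // 1-based line number within the grammar string
}

function findLinkProductions(grammar: string): LinkProduction[] {
  const found: LinkProduction[] = [];
  grammar.split(/\r?\n/).forEach((text, index) => {
    const match = LINK_PRODUCTION.exec(text);
    if (match) {
      const [, name = '', url = ''] = match;
      found.push({ name, url, line: index + 1 });
    }
  });
  return found;
}
```

Run against the railroad grammar from the first post, this should report the `NCName` and `Char` productions and leave `Link` and `URL` alone, since their right-hand sides are ordinary EBNF.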
@menduz : Just tried adding a newline at the end and that seemed to do the trick!
Might it be worthwhile not to fail when the final newline character is missing?
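Until the parser tolerates a missing final newline, the workaround can be applied on the caller's side. A minimal sketch; `ensureTrailingNewline` is a hypothetical helper name, not something the package exports:

```typescript
// Hypothetical helper (not part of the `ebnf` package): returns the grammar
// source unchanged if it already ends with a newline, otherwise appends one.
function ensureTrailingNewline(source: string): string {
  return source.endsWith('\n') ? source : source + '\n';
}

// Intended use with the call from the first post (left as a comment so the
// snippet stays self-contained):
//   const rules = Grammars.W3C.getRules(ensureTrailingNewline(w3grammar));
```

This only addresses the missing-newline case reported above; a grammar that fails for another reason will still fail.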
I've tried adding a newline and am still not having any success. I've also been trying to parse https://github.com/messagetemplates/grammar/blob/master/message-template.ebnf, without success.
Yes, adding a newline at the end of the string is a great tip! Additionally, even though the parser only gives you a yes/no as to whether the grammar parsed successfully, you can quickly narrow down the problem in the playground
https://menduz.github.io/ebnf-highlighter/
by starting with just one line at a leaf of your parse tree and building your EBNF file back up from there.
e.g. does this parse?
_LETTER-OR-DIGIT ::= [A-Za-z0-9]
No. How about this?
_LETTERORDIGIT ::= [A-Za-z0-9]
No. How about now?
LETTERORDIGIT ::= [A-Za-z0-9]
Yes. So does W3C EBNF not support an NCName starting with an underscore? Well, let's look at the node-ebnf source code. This is the top of W3CEBNF.ts:
// https://www.w3.org/TR/REC-xml/#NT-Name
// http://www.bottlecaps.de/rr/ui
// Grammar ::= Production*
// Production ::= NCName '::=' Choice
// NCName ::= [http://www.w3.org/TR/xml-names/#NT-NCName]
// Choice ::= SequenceOrDifference ( '|' SequenceOrDifference )*
// SequenceOrDifference ::= (Item ( '-' Item | Item* ))?
// Item ::= Primary ( '?' | '*' | '+' )?
// Primary ::= NCName | StringLiteral | CharCode | CharClass | '(' Choice ')'
// StringLiteral ::= '"' [^"]* '"' | "'" [^']* "'"
// CharCode ::= '#x' [0-9a-fA-F]+
// CharClass ::= '[' '^'? ( RULE_Char | CharCode | CharRange | CharCodeRange )+ ']'
// RULE_Char ::= [http://www.w3.org/TR/xml#NT-RULE_Char]
// CharRange ::= RULE_Char '-' ( RULE_Char - ']' )
// CharCodeRange ::= CharCode '-' CharCode
// RULE_WHITESPACE ::= RULE_S | Comment
// RULE_S ::= #x9 | #xA | #xD | #x20
// Comment ::= '/*' ( [^*] | '*'+ [^*/] )* '*'* '*/'
That tells us to look it up here: http://www.w3.org/TR/xml-names/#NT-NCName
Click through to Name: https://www.w3.org/TR/REC-xml/#NT-Name
Click through to NameStartChar: https://www.w3.org/TR/REC-xml/#NT-NameStartChar
Oh dear, it does look to me like you're supposed to be able to start an NCName with an underscore. So it does seem a shame that node-ebnf won't parse this. But hopefully what I've been able to demonstrate about how I would isolate a fault and investigate its cause is helpful?
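To make the spec lookup concrete, here is a rough sketch of an NCName check transcribed from the NameStartChar and NameChar productions in the XML spec, with the colon removed as Namespaces in XML requires. `isNCName` is a hypothetical helper for checking rule names against the spec; it says nothing about what node-ebnf itself accepts.

```typescript
// Character ranges from XML 1.0 NameStartChar, minus ':' (an NCName is a
// Name that contains no colon).
const NAME_START =
  'A-Z_a-z\\u{C0}-\\u{D6}\\u{D8}-\\u{F6}\\u{F8}-\\u{2FF}\\u{370}-\\u{37D}' +
  '\\u{37F}-\\u{1FFF}\\u{200C}-\\u{200D}\\u{2070}-\\u{218F}\\u{2C00}-\\u{2FEF}' +
  '\\u{3001}-\\u{D7FF}\\u{F900}-\\u{FDCF}\\u{FDF0}-\\u{FFFD}\\u{10000}-\\u{EFFFF}';

// NameChar adds '-', '.', digits, #xB7 and two combining/extender ranges.
const NAME_CHAR =
  NAME_START + '\\-.0-9\\u{B7}\\u{300}-\\u{36F}\\u{203F}-\\u{2040}';

// The 'u' flag makes \u{...} escapes and astral ranges work as code points.
const NCNAME = new RegExp(`^[${NAME_START}][${NAME_CHAR}]*$`, 'u');

// Hypothetical helper: true if `name` is a valid NCName per the spec.
function isNCName(name: string): boolean {
  return NCNAME.test(name);
}
```

By this reading of the spec, `_LETTER-OR-DIGIT`, `_LETTERORDIGIT` and `LETTERORDIGIT` are all valid NCNames, so the first two being rejected in the playground looks like a limitation of the parser rather than of W3C EBNF.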