instaparse
Amplify the recommendation to use resource files whenever escaping is needed
This is an excellent implementation with great documentation. One observation: building rules inside a Clojure source file can be a fun challenge, but it becomes tedious and, more importantly, unreadable during later maintenance whenever characters need to be escaped. Consider the definition below: even the comment inside it requires escaping, not just the quotation marks and backslashes. It might be good for the README to recommend slightly more explicitly, as a rule of thumb, switching to resource files as soon as the need to escape anything arises.
;; assumes `parser` is referred in, e.g. (require '[instaparse.core :refer [parser]])
(def wikiextractor-parser
  "a parser for the output of wikiextractor (https://github.com/attardi/wikiextractor)"
  (parser
    "
    S = Entry*
    Entry = <Header> ContentAsText <Trailer> <OptionalPadding*>
    Header = '<doc' (' ' HeaderProp)* '>'
    HeaderProp = #'[^=]*' '=' '\"' #'[^\"]*' '\"' (* e.g. id=\"4030\" *)
    ContentAsText = Anychar*
    Anychar = #'(?sm).'
    Trailer = '</doc>'
    OptionalPadding = #'\\s'
    "))
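
For comparison, here is a rough sketch of the same parser with the grammar moved into a resource file, which is what I am suggesting the README recommend. The file name `wikiextractor.bnf`, its location under `resources/`, and the namespace name are my own assumptions for illustration; the grammar is read with `slurp` and passed to `parser` as a plain string, so nothing here relies on anything beyond the string form already used above.

```clojure
(ns wikiextractor.parse ; hypothetical namespace name
  (:require [clojure.java.io :as io]
            [instaparse.core :as insta]))

;; Assumed contents of resources/wikiextractor.bnf (hypothetical file).
;; No Clojure string escaping is needed, not even inside the comment:
;;
;;   S = Entry*
;;   Entry = <Header> ContentAsText <Trailer> <OptionalPadding*>
;;   Header = '<doc' (' ' HeaderProp)* '>'
;;   HeaderProp = #'[^=]*' '=' '"' #'[^"]*' '"' (* e.g. id="4030" *)
;;   ContentAsText = Anychar*
;;   Anychar = #'(?sm).'
;;   Trailer = '</doc>'
;;   OptionalPadding = #'\s'

(def wikiextractor-parser
  "a parser for the output of wikiextractor, with the grammar read from the classpath"
  ;; io/resource returns nil if the file is not on the classpath,
  ;; in which case slurp throws when this namespace is loaded.
  (insta/parser (slurp (io/resource "wikiextractor.bnf"))))
```

With the grammar in its own file, `\"` becomes `"` and `\\s` becomes `\s`, so the rules read the same way they would in any EBNF reference.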