parsimonious
Support \n etc. more easily
It's awkward to express LFs, CRs, etc. in grammars, because Python consumes the escape sequences first and replaces them with actual newline characters, which are no-ops in the grammar. It works in the grammar DSL's own grammar because they're wrapped in regexes, but that shouldn't be required. Ford's original PEG grammar supports \n \r \t ' " { } \ and some numeric escapes. We should probably go that way.
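The underlying problem can be seen in plain Python, before any grammar library gets involved: Python processes the escape first, so the grammar source never contains the two characters backslash-n unless you use a raw string (a minimal illustration, not parsimonious-specific):

```python
# Python consumes the escape before the grammar library sees the text:
rule = "newline = \"\n\""      # the rule source now contains a real LF
assert "\n" in rule and r"\n" not in rule

# A raw string keeps the two characters backslash + n intact:
raw_rule = r'newline = "\n"'
assert "\\n" in raw_rule
```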
But come on, you'll end up reinventing it anyway, just as happened with / precedence.
Yep, I want to have Ford's, or at least a superset of it.
:+1:
Is there a workaround for parsing newlines that is better than just escaping the newline character?
There might be some escaping dance you can do to get it into a Literal, or you can do what I do in grammar.py and stick it in a regex:
comment = ~r"#[^\r\n]*"
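For context, here is what that regex rule is doing, expressed directly with Python's re module (the pattern is the one from grammar.py; the sample source string is made up):

```python
import re

# Match a comment up to, but not including, the line terminator,
# so the newline itself is left for whatever rule comes next.
comment = re.compile(r"#[^\r\n]*")

source = "x = 1  # set x\ny = 2\n"
match = comment.search(source)
print(match.group())  # -> "# set x"
```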
What is the current recommended way to match \n?
After much fooling around I was able to get C-style multiline comments working with the following:
comment = ws* ~r"/\*.*?\*/"s ws*
ws = ~r"\s*"i
Is there an easier way?
That looks correct and concise. You could probably make it faster by using inverted character classes. In general, non-greedy quantifiers like *? are slow because they create a lot of backtracking. Instead you could try something like this (which matches double-quoted strings with backslash escapes) for speed:
~"u?r?\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\""is
Sorry about all the backslashes. Anyway, notice how I scan quickly ahead for anything that couldn't possibly be an ending quote or a backslash, using [^\"\\\\]*, then go looking for actual special things with the (?:\\\\.[^\"\\\\]*)*. Of course, it's not nearly as readable as your spelling.
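To make the trade-off concrete, here are the two styles side by side in plain re (the pattern names are mine; the "unrolled" one is the technique described above, stripped of the u?r? prefix for simplicity):

```python
import re

# Lazy version: short to write, but *? forces lots of backtracking.
lazy = re.compile(r'"(?:\\.|[^"])*?"', re.S)

# "Unrolled loop" version: eat plain characters fast with [^"\\]*,
# stop only to handle a backslash escape, then resume the fast scan.
unrolled = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"', re.S)

s = r'"a \"quoted\" word"'
assert lazy.match(s).group() == unrolled.match(s).group() == s
```

Both patterns accept the same strings; the unrolled one simply reaches the answer with far fewer backtracking steps on long inputs.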
Thanks, that's definitely worth knowing. I did some benchmarking to see how much comments are costing in processing time.
I started with an 85-measure bass part I'd recently transcribed that had multiple comments amounting to 38% of the total characters in the file. I made it into two larger benchmark files -- one with and one without comments -- by replicating the original 20 times. So that's 1700 measures of music -- more or less equivalent to a score in all parts for a small orchestral movement.
$ wc benchmark.tbn nocommentbenchmark.tbn
1342 13132 49229 benchmark.tbn
880 8760 30400 nocommentbenchmark.tbn
The processing time, including MIDI file creation, on my 2012 Mac Mini was ~6.5 seconds in either case. That's about 4 ms per measure. The processing overhead for the comments was just over 2%. I think I can live with that :-)
$ time tbon -q nocommentbenchmark.tbn
Processing nocommentbenchmark.tbn
Created nocommentbenchmark.mid
real 0m6.572s
user 0m6.405s
sys 0m0.163s
$ time tbon -q benchmark.tbn
Processing benchmark.tbn
Created benchmark.mid
real 0m6.717s
user 0m6.547s
sys 0m0.166s
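The per-measure and overhead figures follow from the timings above (a quick arithmetic check, using the real times reported):

```python
# Real times from the two runs above; 85 measures replicated 20 times.
with_comments, without_comments = 6.717, 6.572
measures = 85 * 20  # 1700

ms_per_measure = without_comments / measures * 1000
overhead_pct = (with_comments / without_comments - 1) * 100

print(round(ms_per_measure, 1))  # -> 3.9 (about 4 ms per measure)
print(round(overhead_pct, 1))    # -> 2.2 (just over 2%)
```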
Great! Benchmarking is always the best answer. :-)