mesabox
mesabox copied to clipboard
sh: switch to LALRPOP?
I would vastly prefer to use something like LALRPOP or pest
instead of nom
, but right now neither support arbitrary bytes as input. There is this issue for LALRPOP and this one for pest
that deal with that problem, but I'm not sure if it's fine to just say that invalid UTF-8 paths can't be given directly as arguments for now (they'd still work if a glob was given where the expanded part was invalid UTF-8). Switching to either LALRPOP or pest
should help improve the error messages and make it simpler to fix some of the current parsing issues.
As a side note, this is actually the same issue that prevented me from just using pest
in the first place.
I think it's OK to say that "invalid UTF-8 paths can't be given directly" for now and use LALRPOP or pest (pest looks good since it uses PGE for parsing). Then, let's explain our usages to the pest/LALRPOP authors and push the implementation of the "matching bytes" feature.
So, I ended up rewriting the parser without a library using iterators (retrieved from OsStr
) directly. Currently we use EncodeWide
on Windows and std::slice::Iter<u8>
on Unix. While I haven't tested on Windows yet, I believe this solution should let the parser function on Windows. We still need to figure out how to deal with forking on Windows (e.g. for subshells), but it should be possible to at least enable some sort of shell script checker on all platforms now.
I imagine the performance can be vastly improved as I just did a simple port from nom
. If this proves too difficult, we can switch to LALRPOP/Pest at a later date (once they at least support parsing u8
streams).
The rewritten parser is in the sh-parser
branch. The REPL doesn't support multi-line inputs right now, but for the most part things seem to function better than before. I am at this point considering just integrating the changes into master
as (IMO) a good REPL is less important than good scripting support.
BTW the code from sh-parser
has now been added to master
.
OK, good. Let's use the handwritten parser first and consider LALRPOP/Pest later (if we cannot handle it properly).