neotomex
Unicode inputs are not parsable with exclusion character sets
defmodule Unparsable do
  use Neotomex.ExGrammar
  @root true
  define :root, "[^a]*"
end
Unparsable.parse("pɪˈkɑːn")
This code raises the following error:
** (MatchError) no match of right hand side value: {"ɪ", "ˈkɑːn"}
(neotomex) lib/neotomex/grammar.ex:325: Neotomex.Grammar.match/3
(neotomex) lib/neotomex/grammar.ex:442: Neotomex.Grammar.match_zero_or_more/4
(neotomex) lib/neotomex/grammar.ex:157: Neotomex.Grammar.parse/2
/Users/brent/tmp/test.exs:1: Unparsable.parse/1
(elixir) lib/code.ex:813: Code.require_file/2
(mix) lib/mix/tasks/run.ex:145: Mix.Tasks.Run.run/5
(mix) lib/mix/tasks/run.ex:85: Mix.Tasks.Run.run/1
(mix) lib/mix/task.ex:331: Mix.Task.run_task/3
(mix) lib/mix/cli.ex:79: Mix.CLI.run_task/2
(elixir) lib/code.ex:813: Code.require_file/2
It appears that this can be fixed by passing "u" to Regex.compile in peg.ex.
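To illustrate why the "u" option matters, here is a minimal sketch that uses only Elixir's standard Regex module, not neotomex itself. The anchored pattern and the variable names are my own for illustration and are not taken from peg.ex; the input is the remainder of the string after the leading "p" has been consumed.

```elixir
# Remainder of the input once the leading "p" has been consumed.
input = "ɪˈkɑːn"

# Without the unicode option the pattern is matched byte by byte,
# so [^a] consumes only the first byte of the two-byte "ɪ".
{:ok, bytewise} = Regex.compile("^[^a]")
[byte_match] = Regex.run(bytewise, input)
IO.inspect(byte_match)
# The match is a lone lead byte and is not a valid UTF-8 string.
IO.inspect(String.valid?(byte_match))

# With "u" the same pattern matches one whole code point.
{:ok, unicode} = Regex.compile("^[^a]", "u")
[char_match] = Regex.run(unicode, input)
IO.inspect(char_match)
```

If the grammar then compares that one-byte match against a character-based split of the input, the two will not line up, which would be consistent with the MatchError above. That last step is my guess at the cause, not something I have confirmed in grammar.ex.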
Hi @brentspell -- thanks for reporting this along with the fix!
By any chance would you like to open a PR with the fix and a test? If so, I'll be much more responsive with getting it merged. Otherwise, I can put the diff together -- it's definitely a useful addition.
I ended up going with nimble_parsec for this parser, but if I come across another neotomex use case, I'll open a PR for this.
Makes sense -- thanks!