neotomex: Unicode inputs are not parsable with exclusion character sets

Open brentspell opened this issue 5 years ago • 3 comments

defmodule Unparsable do
  use Neotomex.ExGrammar

  @root true

  define :root, "[^a]*"
end

Unparsable.parse("pɪˈkɑːn")

This code raises the following error:

** (MatchError) no match of right hand side value: {"ɪ", "ˈkɑːn"}
    (neotomex) lib/neotomex/grammar.ex:325: Neotomex.Grammar.match/3
    (neotomex) lib/neotomex/grammar.ex:442: Neotomex.Grammar.match_zero_or_more/4
    (neotomex) lib/neotomex/grammar.ex:157: Neotomex.Grammar.parse/2
    /Users/brent/tmp/test.exs:1: Unparsable.parse/1
    (elixir) lib/code.ex:813: Code.require_file/2
    (mix) lib/mix/tasks/run.ex:145: Mix.Tasks.Run.run/5
    (mix) lib/mix/tasks/run.ex:85: Mix.Tasks.Run.run/1
    (mix) lib/mix/task.ex:331: Mix.Task.run_task/3
    (mix) lib/mix/cli.ex:79: Mix.CLI.run_task/2
    (elixir) lib/code.ex:813: Code.require_file/2

It appears that this can be fixed by passing the "u" (Unicode) option to Regex.compile in peg.ex.
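As a rough illustration of why the "u" option matters, here is a minimal standalone sketch that does not use neotomex at all. It assumes the grammar's character class is compiled into an anchored regex similar to "^[^a]"; that pattern and the variable names are illustrative and are not taken from peg.ex. Without "u", the regex engine works on bytes, so a negated class can consume only the first byte of a multi-byte UTF-8 character.

```elixir
# Hypothetical stand-in for the regex a grammar might build from "[^a]".
# The anchored pattern and variable names are assumptions for illustration.
{:ok, bytewise} = Regex.compile("^[^a]")
{:ok, unicode} = Regex.compile("^[^a]", "u")

# "ɪ" (U+026A) is encoded as two bytes in UTF-8.
input = "ɪˈkɑːn"

# Default mode: the class matches one byte, splitting the character.
[byte_match] = Regex.run(bytewise, input)

# Unicode mode: the class matches one whole code point.
[char_match] = Regex.run(unicode, input)

IO.inspect(byte_size(byte_match), label: "bytes matched without u")
IO.inspect(String.valid?(byte_match), label: "valid UTF-8 without u")
IO.inspect(char_match, label: "match with u")
```

If the parser then compares or splits the input by characters rather than bytes, a one-byte match like this no longer lines up with the input string, which would be consistent with the MatchError above.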

Jan 20 '20 21:01 brentspell

Hi @brentspell -- thanks for reporting this along with the fix!

By any chance would you like to open a PR with the fix and a test? If so, I'll be much more responsive with getting it merged. Otherwise, I can put the diff together -- it's definitely a useful addition.

Feb 14 '20 02:02 jtmoulia

I ended up going with nimble-parsec for this parser, but if I come across another neotomex use case, I'll open a PR for this.

Feb 14 '20 13:02 brentspell

Makes sense -- thanks!

Feb 14 '20 17:02 jtmoulia
