ocaml-re

[Feature Request] Unicode support

Open choeger opened this issue 11 years ago • 7 comments

At a glance, this whole library seems like a very well-thought-out piece of software (limited scope, defined solution). Unfortunately, it does not support Unicode right now, and Unicode should be the standard in this millennium. So here is my proposal: instead of using chars and strings exclusively, abstract the library over the concrete code-point and input representations. Then someone (me) could extend the library by providing suitable Unicode support. I understand that this kind of abstraction might cause some performance regressions, but it would open up a whole batch of new use cases.
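To make the proposal a bit more concrete, here is a minimal sketch of what "abstracting over the input representation" could look like. None of these names (`INPUT`, `next`, `Make`, `prefix_while`) exist in ocaml-re; they are hypothetical and only illustrate the shape of the abstraction. The UTF-8 instance assumes OCaml 4.14 or later for `String.get_utf_8_uchar`.

```ocaml
(* Hypothetical signature: how a matcher would read symbols from its input. *)
module type INPUT = sig
  type t
  type symbol

  (* [next s i] is the symbol starting at offset [i] together with the
     offset just after it, or [None] at the end of the input. *)
  val next : t -> int -> (symbol * int) option
end

(* Today's behaviour: one symbol per byte. *)
module Bytes_input = struct
  type t = string
  type symbol = char

  let next s i = if i < String.length s then Some (s.[i], i + 1) else None
end

(* One symbol per Unicode scalar value, decoded from UTF-8.
   Malformed bytes decode to U+FFFD (standard library behaviour). *)
module Utf8_input = struct
  type t = string
  type symbol = Uchar.t

  let next s i =
    if i >= String.length s then None
    else
      let d = String.get_utf_8_uchar s i in
      Some (Uchar.utf_decode_uchar d, i + Uchar.utf_decode_length d)
end

(* A stand-in for "the library": any code written against INPUT. *)
module Make (I : INPUT) = struct
  (* Number of leading symbols of [s] that satisfy [p]. *)
  let prefix_while p s =
    let rec go i n =
      match I.next s i with
      | Some (c, j) when p c -> go j (n + 1)
      | _ -> n
    in
    go 0 0
end

module B = Make (Bytes_input)
module U = Make (Utf8_input)
```

With this, `U.prefix_while` counts code points while `B.prefix_while` counts bytes over the same string. A real matcher would of course need much more than `next`, so treat this as an illustration of the idea only.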

choeger avatar Oct 15 '14 20:10 choeger

Could D. Bunzli's Uutf be used to iterate over Unicode characters? That might also help to parametrize over the input stream (string, bigarray, stream of strings, etc.) for #20 ...

c-cube avatar Dec 02 '14 09:12 c-cube

The main obstacle to making the implementation generic is that it is table-based. This works well when there are only 256 possible characters, but does not scale to the more than one million Unicode code points...

One thing that should work is to translate regular expressions defined in terms of Unicode code points into regular expressions defined in terms of bytes, and then match UTF-8 strings byte by byte.
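As a rough illustration of that translation (a sketch only, not ocaml-re code and not the approach of any existing PR), a range of code points can be split into a small set of alternatives, each being a fixed-length sequence of byte ranges. The splitting below follows the well-known UTF-8 range-splitting technique; the names `utf8_bytes`, `split` and `match_at` are made up for this example.

```ocaml
(* UTF-8 encoding of a scalar value, as a list of byte values. *)
let utf8_bytes cp =
  let cont shift = 0x80 lor ((cp lsr shift) land 0x3F) in
  if cp < 0x80 then [ cp ]
  else if cp < 0x800 then [ 0xC0 lor (cp lsr 6); cont 0 ]
  else if cp < 0x10000 then [ 0xE0 lor (cp lsr 12); cont 6; cont 0 ]
  else [ 0xF0 lor (cp lsr 18); cont 12; cont 6; cont 0 ]

(* [split lo hi] translates the code-point range [lo..hi] into a list of
   alternatives; each alternative is a sequence of inclusive byte ranges. *)
let rec split lo hi =
  if lo > hi then []
  else if lo <= 0xDFFF && hi >= 0xD800 then
    (* Surrogates are not scalar values: cut them out of the range. *)
    split lo 0xD7FF @ split 0xE000 hi
  else if hi < 0x80 then [ [ (lo, hi) ] ]
  else
    (* First split where the encoded length changes. *)
    match List.find_opt (fun m -> lo <= m && m < hi) [ 0x7F; 0x7FF; 0xFFFF ] with
    | Some m -> split lo m @ split (m + 1) hi
    | None -> (
        (* Then split until every byte position is an independent range. *)
        let rec cut i =
          if i > 3 then None
          else
            let m = (1 lsl (6 * i)) - 1 in
            let differ = lo land (lnot m) <> hi land (lnot m) in
            if differ && lo land m <> 0 then Some (lo lor m)
            else if differ && hi land m <> m then Some ((hi land (lnot m)) - 1)
            else cut (i + 1)
        in
        match cut 1 with
        | Some c -> split lo c @ split (c + 1) hi
        | None -> [ List.combine (utf8_bytes lo) (utf8_bytes hi) ])

(* Byte-by-byte matching: try each alternative at byte offset [pos] of [s]
   and return the offset just after the match, if any. *)
let match_at alts s pos =
  let rec fits i = function
    | [] -> Some i
    | (lo, hi) :: rest ->
        if i < String.length s && lo <= Char.code s.[i] && Char.code s.[i] <= hi
        then fits (i + 1) rest
        else None
  in
  List.find_map (fun seq -> fits pos seq) alts

(* U+00E0..U+00FF becomes the single byte-level sequence C3 [A0-BF]. *)
let latin1_lower = split 0xE0 0xFF
let _ = match_at latin1_lower "\xC3\xA9" 0 (* Some 2: "é" is two bytes *)
```

Each alternative only ever inspects one byte at a time, so in principle the existing 256-entry tables could be kept; how well this fits the library's internals is a separate question.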

vouillon avatar Dec 02 '14 17:12 vouillon

Any hope of having Unicode supported soon?

zoggy avatar Jan 12 '16 07:01 zoggy

I don't think @nojb or anyone else is working on it right now, but it could change if someone was motivated. ;)

Drup avatar Jan 12 '16 12:01 Drup

Surprising that it still hasn't been implemented.

XVilka avatar Feb 05 '18 06:02 XVilka

Someone needs to do it, and it's hard™ 🙂

c-cube avatar Feb 05 '18 14:02 c-cube

As far as I understand from the discussion in #48, the implementation there is viable and could be used as a basis for further work. I can rebase that PR against the current master, but unfortunately I am rather overloaded at the moment, so I cannot commit to doing the "further work" that may be necessary to get it integrated.

nojb avatar Feb 05 '18 14:02 nojb