hyperscan
\b with UTF-8
Hi,
Is there a way in Hyperscan to find a pattern where \b is followed by a non-ASCII character (such as \bö) in the input text? In my experience, \be matches where it should, but \bö does not. For instance:
\be matches "_ e_" and does not match "_ xe_"
\bö does not match "_ ö_", but it does match "_ xö_"
I get the same result irrespective of whether I use HS_FLAG_UTF8 or not; HS_FLAG_UCP gives an error. I could not find anything about \b being incompatible with Unicode in the documentation; in fact, the only place the docs mention HS not supporting UTF8 or \b is in the approximate matching section, which is irrelevant to my use-case.
Thanks!
Can confirm, UTF-8 doesn't work for me either.
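For what it's worth, the reported results are exactly what you would get if \b were evaluated byte by byte with an ASCII-only notion of "word character": ö is encoded in UTF-8 as the bytes 0xC3 0xB6, neither of which is an ASCII word character, so a boundary before ö only exists when the preceding character is a word character. Whether this is what Hyperscan does internally is an assumption on my part. The sketch below does not use Hyperscan at all; it mimics those semantics with Python's re module on bytes patterns, and ascii_b_matches is a hypothetical helper name.

```python
import re


def ascii_b_matches(pattern: str, text: str) -> bool:
    """Search the UTF-8 bytes of `text` for `pattern`.

    Both arguments are encoded to UTF-8 first. For bytes patterns,
    Python's re treats only [A-Za-z0-9_] as word characters, so \\b is
    an ASCII-only, byte-level word boundary. This is NOT Hyperscan; it
    only imitates the semantics assumed above.
    """
    return re.search(pattern.encode("utf-8"), text.encode("utf-8")) is not None


if __name__ == "__main__":
    # The four cases from the original report.
    for pat, txt in [
        (r"\be", "_ e_"),
        (r"\be", "_ xe_"),
        (r"\bö", "_ ö_"),
        (r"\bö", "_ xö_"),
    ]:
        print(f"{pat!r} on {txt!r}: {ascii_b_matches(pat, txt)}")
```

Under that assumption, the four cases give the same matches and non-matches as in the report, which suggests that the flags tried so far do not make \b Unicode-aware.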