ack3 --ignore-case does not work for ą, ę, ś, ć, ń, ó, ł, ż, ź

Not sure what level of Unicode support is expected here, but since ack is supposed to be a "better grep", I'd like to be able to search for Polish words :)

$ cat test.txt
e
E
ę
Ę
a
A
ą
Ą
$ ack -i a test.txt
a
A
$ ack -i ą test.txt
ą
$ ack -i Ą test.txt
Ą

Calling it with ack -i '\p{Uppercase_Letter}' seems to match every line, but the output contains a lot of corrupted characters, and I do not even know if copy-pasting it here makes any sense. Similarly for ack -i '\p{L}'.

I am using the latest version of ack:

$ ack --version
ack 2.12
Running under Perl 5.10.1 at /usr/bin/perl

Sep 01 '14 15:09 qbolec

I can reproduce this with ack 2.12 and Perl 5.20, too.

I think the match highlighting covers only part of the multibyte character and hence splits it into two invalid UTF-8 sequences.

Sep 01 '14 15:09 xtaran
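
To make the hypothesis above concrete, here is a minimal Python sketch of what happens when color codes are wrapped around only the first byte of a two-byte UTF-8 character. It is an illustration only, not ack's actual highlighting code; the function names and the escape sequences are assumptions chosen for the example.

```python
# Hypothetical illustration; this is NOT how ack is implemented.
ANSI_ON = b"\x1b[30;43m"   # assumed highlight-start escape sequence
ANSI_OFF = b"\x1b[0m"      # assumed highlight-reset escape sequence


def highlight_first_byte(line: bytes) -> bytes:
    """Wrap only the first byte in color codes, as a byte-oriented matcher might."""
    return ANSI_ON + line[:1] + ANSI_OFF + line[1:]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


line = "ą".encode("utf-8")            # two bytes: 0xC4 0x85
broken = highlight_first_byte(line)   # escape codes land between the two bytes

print(is_valid_utf8(line))    # True
print(is_valid_utf8(broken))  # False
```

A terminal that receives the second byte string has no complete character to draw, which would be consistent with the corrupted output reported for '\p{L}'.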

Unicode isn't really supported in any way (yet); here's a page on plans for 2.1, for which we've been considering Unicode support: https://github.com/petdance/ack2/wiki/Plans-for-2.1

Sep 01 '14 18:09 hoelzro

Isn't 2.1 < 2.12?

Sep 01 '14 19:09 xtaran

Not supporting Unicode might be fine, as long as a proper warning is displayed to the user. In my particular scenario I missed several occurrences of a particular string in our codebase just because they were in a different case, and ack never complained that it did not understand my query. I think it should emit a warning to STDERR like "you have some non-ascii characters in the pattern, results might be wrong".

Sep 01 '14 19:09 qbolec
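
The kind of check suggested above could look roughly like the following Python sketch. It only shows the idea; the function names are hypothetical, and nothing here is taken from ack.

```python
import sys


def has_non_ascii(pattern: str) -> bool:
    """Return True if the pattern contains any character outside 7-bit ASCII."""
    return any(ord(ch) > 127 for ch in pattern)


def warn_if_non_ascii(pattern: str, stream=sys.stderr) -> bool:
    """Print a warning to the given stream (STDERR by default) for non-ASCII patterns.

    Returns True if a warning was printed. Hypothetical helper, not part of ack.
    """
    if has_non_ascii(pattern):
        print(
            "warning: pattern contains non-ASCII characters; "
            "results might be wrong",
            file=stream,
        )
        return True
    return False


warn_if_non_ascii("ą")   # prints the warning
warn_if_non_ascii("a")   # prints nothing
```

Writing to STDERR keeps the warning out of the match output, so piping results into another tool still works.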

@xtaran Hah, touché... Maybe we should start calling the next generation stuff 3.0?

Sep 01 '14 21:09 hoelzro

@qbolec Thanks for the suggestion; maybe we can roll that into 2.14.

Sep 01 '14 21:09 hoelzro

Alas, the workaround I suggested on beyondgrep/ack2#565 doesn't apply here. ack '(?ui)ą' test gets just a single result too (Perl 5.18.2, US English locale). However, there's a slim chance that it will work in your PL locale; worth a try.

Jul 30 '15 20:07 n1vux

Setting up PERL_UNICODE made this work for me:

$ PERL_UNICODE=SAD ack '(?ui)ą' test.txt
ą
Ą

Jul 30 '15 20:07 hoelzro

Aha! Awesome. With that, you don't even need the (?u):

$ PERL_UNICODE=SAD ack -i 'ą' test
ą
Ą

(Edited to XREF beyondgrep/ack3#258 Unicode Support where make test results with PERL_UNICODE=SAD are reported.)

Jul 30 '15 20:07 n1vux
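
For readers wondering why PERL_UNICODE=SAD changes the result: per the perlrun documentation, S marks the standard streams, A the command-line arguments, and D the default I/O layers as UTF-8, so the pattern and the file contents reach the regex engine as characters rather than raw bytes. The Python sketch below is only an analogy for that byte-versus-character difference; it is not ack or Perl code, and the variable names are made up for the example.

```python
import re

lower = "ą"   # U+0105, UTF-8 bytes 0xC4 0x85
upper = "Ą"   # U+0104, UTF-8 bytes 0xC4 0x84
lines = [lower, upper]

# Byte-level matching: pattern and input are undecoded UTF-8.
# Case-insensitivity for bytes patterns only covers ASCII letters,
# so the two byte sequences are simply different.
byte_pat = re.compile(lower.encode("utf-8"), re.IGNORECASE)
byte_hits = [s for s in lines if byte_pat.search(s.encode("utf-8"))]

# Character-level matching: pattern and input are decoded text,
# so the engine knows that U+0104 and U+0105 are a case pair.
char_pat = re.compile(lower, re.IGNORECASE)
char_hits = [s for s in lines if char_pat.search(s)]

print(byte_hits)  # ['ą']
print(char_hits)  # ['ą', 'Ą']
```

The first result mirrors the original report (only the exact case matches); the second mirrors the output seen with PERL_UNICODE=SAD.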
