ack3
--ignore-case does not work for ą, ę, ś, ć, ń, ó, ł, ż, ź
Not sure what level of support for unicode is expected here, but since it is supposed to be a "better grep", I'd like to be able to search for Polish words :)
$ cat test.txt
e
E
ę
Ę
a
A
ą
Ą
$ ack -i a test.txt
a
A
$ ack -i ą test.txt
ą
$ ack -i Ą test.txt
Ą
Calling it with ack -i '\p{Uppercase_Letter}' seems to match every line, but the output contains a lot of corrupted characters and I do not even know if copy&pasting here makes any sense.
Similarly for ack -i '\p{L}'.
I am using the latest version of ack:
ack --version
ack 2.12
Running under Perl 5.10.1 at /usr/bin/perl
I can reproduce this with ack 2.12 and Perl 5.20, too.
I think the match highlighting does not highlight all of the multibyte character and hence breaks it into two non-valid UTF-8 characters.
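A minimal sketch of how that kind of breakage could arise, written in Python rather than taken from ack's source. It assumes the input is not decoded, so each byte is treated as one character (a Latin-1 view), and it uses made-up ANSI codes for the highlight; `highlight_first_letter` and `is_valid_utf8` are hypothetical helpers.

```python
RAW = "ą".encode("utf-8")  # b'\xc4\x85': two bytes for one character

# Assumption: undecoded input is seen as one "character" per byte.
as_byte_view = RAW.decode("latin-1")  # 'Ä' followed by the control char U+0085


def highlight_first_letter(s, start="\x1b[7m", end="\x1b[0m"):
    """Wrap the first alphabetic character of s in (hypothetical) colour codes."""
    for i, ch in enumerate(s):
        if ch.isalpha():
            return s[:i] + start + ch + end + s[i + 1:]
    return s


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Byte view: only the lead byte 0xC4 counts as a letter, so the colour codes
# land between the two bytes of the character.
broken = highlight_first_letter(as_byte_view).encode("latin-1")

# Character view: the whole character is wrapped and the output stays valid.
intact = highlight_first_letter("ą").encode("utf-8")

print(broken, is_valid_utf8(broken))
print(intact, is_valid_utf8(intact))
```

In the byte view the escape sequence ends up after 0xC4 and before 0x85, which matches the "two non-valid UTF-8 characters" description above.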
Unicode isn't really supported in any way (yet); here's a page on plans for 2.1, for which we've been considering Unicode support: https://github.com/petdance/ack2/wiki/Plans-for-2.1
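One plausible reading of the -i results reported at the top, assuming the pattern and the file contents are compared as undecoded UTF-8 bytes: case folding then only applies to ASCII letters. The sketch below uses Python's `re` module to show the same effect; it is an illustration, not ack's code.

```python
import re

text = "a\nA\ną\nĄ\n"
raw = text.encode("utf-8")

# Byte-level matching: "ą" is b'\xc4\x85' and "Ą" is b'\xc4\x84'.
# Neither byte is an ASCII letter, so IGNORECASE has nothing to fold.
byte_hits = re.findall("ą".encode("utf-8"), raw, flags=re.IGNORECASE)

# Character-level matching: the decoded text is folded with Unicode rules.
char_hits = re.findall("ą", text, flags=re.IGNORECASE)

# ASCII letters fold correctly in both views.
ascii_hits = re.findall(b"a", raw, flags=re.IGNORECASE)

print(len(byte_hits), char_hits, ascii_hits)
```

The byte-level search finds one line, like `ack -i ą test.txt` above, while the character-level search finds both ą and Ą.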
Isn't 2.1 < 2.12?
Not supporting Unicode might be a good thing, if a proper warning is displayed to the user. In my particular scenario I've missed several occurrences of a particular string in our codebase just because they were in a different case, without ack complaining that it does not understand my query. I think it should emit some warning to STDERR like "you have some non-ascii characters in the pattern, results might be wrong".
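A rough sketch of the check being proposed, in Python for illustration only. `warn_if_non_ascii` is a hypothetical helper, not an existing ack function or option, and the message text is just the wording suggested above.

```python
import sys


def warn_if_non_ascii(pattern, stream=sys.stderr):
    """Print a warning and return True if the pattern has non-ASCII characters."""
    if pattern.isascii():
        return False
    print(
        "warning: non-ASCII characters in the pattern, results might be wrong",
        file=stream,
    )
    return True


warn_if_non_ascii("ą")    # warns on STDERR
warn_if_non_ascii("abc")  # stays silent
```

Writing to STDERR keeps the warning out of the match output, so piping results to another tool is unaffected.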
@xtaran Hah, touché... Maybe we should start calling the next generation stuff 3.0?
@qbolec Thanks for the suggestion; maybe we can roll that into 2.14.
Alas, the workaround I suggested on beyondgrep/ack2#565 doesn't apply here.
ack '(?ui)ą' test gets just a single result too (perl 5.18.2, US EN locale).
However there's a slim chance that will work in your PL locale, worth a try.
Setting up PERL_UNICODE made this work for me:
$ PERL_UNICODE=SAD ack '(?ui)ą' test.txt
ą
Ą
Aha! Awesome. With that, don't even need the (?u)
PERL_UNICODE=SAD ack -i 'ą' test
ą
Ą
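For anyone who wants this setting applied on every run, here is a minimal wrapper sketch in Python. The helper names `unicode_env` and `run_ack` are hypothetical; the meaning of the flag letters follows perlrun (S: STDIN/STDOUT/STDERR are UTF-8, A: @ARGV is UTF-8, D: UTF-8 is the default layer for opened files). It assumes an `ack` executable is on PATH and does nothing otherwise.

```python
import os
import shutil
import subprocess


def unicode_env(base=None):
    """Return a copy of the environment with PERL_UNICODE=SAD added."""
    env = dict(os.environ if base is None else base)
    env["PERL_UNICODE"] = "SAD"  # S: std streams, A: @ARGV, D: default open layers
    return env


def run_ack(args):
    """Run ack with PERL_UNICODE=SAD; return its exit status, or None if ack is missing."""
    ack = shutil.which("ack")
    if ack is None:
        return None
    return subprocess.run([ack, *args], env=unicode_env()).returncode
```

Calling `run_ack(["-i", "ą", "test.txt"])` would then correspond to the command line shown above.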
(Edited to XREF beyondgrep/ack3#258 Unicode Support where make test results with PERL_UNICODE=SAD are reported.)