--ignore-case does not work for ą, ę, ś, ć, ń, ó, ł, ż, ź

Open qbolec opened this issue 11 years ago • 9 comments

Not sure what level of Unicode support is expected here, but since it is supposed to be a "better grep", I'd like to be able to search for Polish words :)

$ cat test.txt
e
E
ę
Ę
a
A
ą
Ą
$ ack -i a test.txt
a
A
$ ack -i ą test.txt
ą
$ ack -i Ą test.txt
Ą

Calling it with ack -i '\p{Uppercase_Letter}' seems to match every line, but the output contains a lot of corrupted characters and I do not even know if copy&pasting it here makes any sense. Similarly for ack -i '\p{L}'.

I am using the latest version of ack:

ack --version
ack 2.12
Running under Perl 5.10.1 at /usr/bin/perl

qbolec avatar Sep 01 '14 15:09 qbolec

I can reproduce this with ack 2.12 and Perl 5.20, too.

I think the match highlighting covers only part of the multibyte character and hence splits it into two invalid UTF-8 byte sequences.
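
A minimal Python sketch of that suspected mechanism, not ack's actual code: a byte-oriented character class matches a single byte of the two-byte character, and wrapping only that byte in highlight markers leaves invalid UTF-8 on both sides. The `<` and `>` markers are stand-ins for terminal color codes, and the byte range `[\x80-\xff]` is only an assumed approximation of how a letter class might behave on undecoded input.

```python
import re

line = "Ą".encode("utf-8")  # two bytes: b'\xc4\x84'

# A byte-level class matches only the first byte of the character.
m = re.search(rb"[\x80-\xff]", line)
matched_bytes = m.group()  # b'\xc4'

# Wrap just the matched byte in stand-in highlight markers.
highlighted = line[:m.start()] + b"<" + matched_bytes + b">" + line[m.end():]

# The two halves of the character are now separated, so decoding fails.
try:
    highlighted.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False
```

Here `highlighted` ends up as `b'<\xc4>\x84'`, which a UTF-8 terminal would render as replacement characters.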

xtaran avatar Sep 01 '14 15:09 xtaran

Unicode isn't really supported in any way (yet); here's a page on plans for 2.1, for which we've been considering Unicode support: https://github.com/petdance/ack2/wiki/Plans-for-2.1

hoelzro avatar Sep 01 '14 18:09 hoelzro

Isn't 2.1 < 2.12?

xtaran avatar Sep 01 '14 19:09 xtaran

Not supporting Unicode might be acceptable if a proper warning is displayed to the user. In my particular scenario I missed several occurrences of a particular string in our codebase just because they were in a different case, without ack complaining that it did not understand my query. I think it should emit a warning to STDERR like "you have some non-ascii characters in the pattern, results might be wrong".
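
A rough sketch of such a check, written in Python for illustration only; `warn_if_non_ascii` is a hypothetical helper and not part of ack, and the message wording is just a paraphrase of the suggestion above.

```python
import sys


def warn_if_non_ascii(pattern: str, stream=sys.stderr) -> bool:
    """Hypothetical helper: warn when the pattern has non-ASCII characters.

    Returns True if a warning was written, False otherwise.
    """
    if pattern.isascii():
        return False
    print(
        "warning: pattern contains non-ASCII characters; "
        "results might be wrong",
        file=stream,
    )
    return True
```

The check runs once on the pattern before any file is searched, so it would cost nothing per line.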

qbolec avatar Sep 01 '14 19:09 qbolec

@xtaran Hah, touché... Maybe we should start calling the next generation stuff 3.0?

hoelzro avatar Sep 01 '14 21:09 hoelzro

@qbolec Thanks for the suggestion; maybe we can roll that into 2.14.

hoelzro avatar Sep 01 '14 21:09 hoelzro

Alas, the workaround I suggested on beyondgrep/ack2#565 doesn't apply here. ack '(?ui)ą' test gets just a single result too (perl 5.18.2, US EN locale). However, there's a slim chance it will work in your PL locale, so it's worth a try.

n1vux avatar Jul 30 '15 20:07 n1vux

Setting up PERL_UNICODE made this work for me:

$ PERL_UNICODE=SAD ack '(?ui)ą' test.txt
ą
Ą
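
As far as I read perlrun, the letters in PERL_UNICODE=SAD mean: S treats STDIN, STDOUT and STDERR as UTF-8, A decodes @ARGV as UTF-8, and D makes UTF-8 the default layer for opened files. So both the pattern and the file contents reach the regex engine as characters rather than bytes. A Python analogy of that difference (a sketch, not how ack is implemented; `grep_ignore_case` is a made-up helper):

```python
import re

# Same content as test.txt above, as raw UTF-8 bytes.
data = "e\nE\nę\nĘ\na\nA\ną\nĄ\n".encode("utf-8")


def grep_ignore_case(pattern: bytes, data: bytes, decode: bool) -> list:
    """Return the lines matching pattern case-insensitively."""
    if decode:
        # Pattern and input are decoded first, so case folding sees characters.
        regex = re.compile(pattern.decode("utf-8"), re.IGNORECASE)
        return [ln for ln in data.decode("utf-8").splitlines() if regex.search(ln)]
    # Byte-level matching: only ASCII letters are case-folded.
    regex = re.compile(pattern, re.IGNORECASE)
    return [ln.decode("utf-8") for ln in data.splitlines() if regex.search(ln)]


pattern = "ą".encode("utf-8")
bytes_result = grep_ignore_case(pattern, data, decode=False)
text_result = grep_ignore_case(pattern, data, decode=True)
```

Without decoding, only the lowercase line is found; with decoding, both `ą` and `Ą` are, which mirrors the before and after behavior in this thread.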

hoelzro avatar Jul 30 '15 20:07 hoelzro

Aha! Awesome. With that, you don't even need the (?u):

PERL_UNICODE=SAD ack -i 'ą' test
ą
Ą

(Edited to XREF beyondgrep/ack3#258 Unicode Support where make test results with PERL_UNICODE=SAD are reported.)

n1vux avatar Jul 30 '15 20:07 n1vux