Fix the issue where rg.el couldn't search Unicode characters.

Open chansey97 opened this issue 2 months ago • 1 comments

A known issue with rg.el is that Unicode search is not supported on Windows.

Some related issues: https://github.com/dajva/rg.el/issues/101 https://github.com/dajva/rg.el/issues/117

The reason is that, at the moment, NTEmacs limits non-ASCII file arguments to the current codepage. See the comment in the Emacs source, quoted below: https://github.com/emacs-mirror/emacs/blob/58a7b99823c5c42161e9acf2abf6c22afd4da4cd/src/w32.c#L1648.

Running subprocesses in non-ASCII directories and with non-ASCII file arguments is limited to the current codepage (even though Emacs is perfectly capable of finding an executable program file in a directory whose name cannot be encoded in the current codepage). This is because the command-line arguments are encoded before they get to the w32-specific level, and the encoding is not known in advance (it doesn't have to be the current ANSI codepage), so w32proc.c functions cannot re-encode them in UTF-16. This should be fixed, but will also require changes in cmdproxy. The current limitation is not terribly bad anyway, since very few, if any, Windows console programs that are likely to be invoked by Emacs support UTF-16 encoded command lines.

For similar reasons, server.el and emacsclient are also limited to the current ANSI codepage for now.

Emacs itself can only handle command-line arguments encoded in the current codepage.

This patch provides a workaround: instead of passing Unicode arguments to ripgrep through Emacs, it passes them via a temporary .bat script, which is generated whenever rg-build-command is called. This allows rg.el to search across all Unicode planes (including rare scripts and emoji), rather than being restricted to a specific codepage.
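For readers unfamiliar with the technique, here is a minimal sketch of the general idea, not the code from this patch. The function name `my/rg-w32-write-proxy-script` is hypothetical, and the choice of `chcp 65001` (the UTF-8 codepage) plus a UTF-8 encoded script is an assumption about how such a proxy script could be built. The point is that the Unicode pattern travels inside the file contents, so it never has to be encoded as a process argument in the current codepage.

```elisp
;; Hypothetical sketch; the names here are illustrative and are not
;; taken from rg.el or from this patch.
(defun my/rg-w32-write-proxy-script (command)
  "Write COMMAND into a temporary .bat file and return the file name.
COMMAND is the complete ripgrep command line as a string."
  (let ((script (make-temp-file "rg-w32-" nil ".bat"))
        ;; Assumption: write the script as UTF-8 with CRLF line endings.
        (coding-system-for-write 'utf-8-dos))
    (with-temp-file script
      (insert "@echo off\n"
              ;; Assumption: switch the console to the UTF-8 codepage
              ;; so cmd.exe reads the following line as UTF-8.
              "chcp 65001 >nul\n"
              command "\n"))
    script))
```

The returned file name contains only ASCII characters as long as the temporary directory does, so it can be handed to the subprocess without hitting the codepage limitation. A real implementation would also need to escape characters that are special in batch files, such as `%`.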

P.S. This feature is disabled by default to preserve the old behavior, and can be enabled by setting rg-w32-unicode to t.
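Assuming rg-w32-unicode is an ordinary user option, as described above, enabling it from an init file might look like this:

```elisp
;; Opt in to the .bat-script workaround (off by default).
(setq rg-w32-unicode t)
```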

chansey97, May 09 '24 09:05