glob
glob copied to clipboard
Support non-UTF-8 filenames
This should resolve #23. The implementation works on both Linux and Windows for me (with the available tests). It's still very possible something broke though, so I'd appreciate it if another set of eyes could carefully check. I'm also very open to design changes if desired.
As part of the changes, I removed the methods matches_path() and matches_path_with() as they are now redundant, but I can add them back for the sake of backwards-compatibility.
I also have not modified the documentation, so there's probably some stuff that needs to change.
Could you please describe the high level solution to the problem that you implemented?
Alright, no problem. I tried to keep the logic used in the original code the same, so most all of the changes were around getting the same code to work without relying on String/&str.
Because the only characters we need to match against are the few special characters (like * and ?) and other characters that come from the input itself, I abstracted the pattern creation and matching functions to use platform-dependent iterators (EncodeWide on Windows and an iterator over a byte slice on other platforms). Additionally, anything that stored char now stores the platform-dependent type used by the iterators (u16 on Windows and u8 on other platforms), and the special characters are provided by a platform-dependent trait/struct (PlatformCharset/PatternCharset).
By doing so, we just check need to check whatever value is retrieved from the iterator against the special characters retrieved from the platform-dependent trait/struct. If the value is not a special character, we can just store it in PatternToken like we stored chars in the previous implementation.
Any feedback?
Seems force-pushing broke the diff, the branch now only has https://github.com/rust-lang/glob/pull/69/commits/eeb8a73cc1627944d4503699d36b5586462ad8fe.