Generic/LowercasedFilename: sniff doesn't handle non-ASCII characters properly

Open · rodrigoprimo opened this issue 1 year ago · 1 comment

Describe the bug

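While working on improving code coverage for the Generic.Files.LowercasedFilename sniff (#681), I noticed that it fails to properly handle file names that contain uppercase non-ASCII characters. The sniff uses strtolower() to check whether the file name is all lowercase, but strtolower() only converts ASCII letters (as of PHP 8.2 it is locale-insensitive; in earlier versions the result depended on the current locale), so multibyte characters such as É are left unchanged.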

https://github.com/PHPCSStandards/PHP_CodeSniffer/blob/26ddb35f4684760b27ad48d3c420afb2c636cc1b/src/Standards/Generic/Sniffs/Files/LowercasedFilenameSniff.php#L51
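
The difference can be seen in isolation with a minimal sketch. This is an illustration only, not the sniff's actual code: the variable names are made up, the file name is assumed to be UTF-8, and mb_strtolower() is assumed to be available via the mbstring extension. The expected values in the comments are for PHP 8.2 or later.

```php
<?php
// Hypothetical illustration; not the code used by the sniff.
// "\u{C9}" is É, written as an escape so the example does not
// depend on the encoding of this source file.
$fileName = "t\u{C9}st.php";

// strtolower() only converts ASCII letters on PHP 8.2+, so the
// two UTF-8 bytes that make up É are left untouched.
$asciiLower = strtolower($fileName);
var_dump($asciiLower === $fileName); // bool(true): no difference detected

// mb_strtolower() is Unicode-aware (requires ext-mbstring).
if (function_exists('mb_strtolower')) {
    $unicodeLower = mb_strtolower($fileName, 'UTF-8');
    var_dump($unicodeLower === "t\u{E9}st.php"); // bool(true): É became é
    var_dump($unicodeLower !== $fileName);       // bool(true): difference detected
}
```

Because the strtolower() result is identical to the input, a comparison between the two cannot flag the file name as containing uppercase characters.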

Code sample

<?php

To reproduce

Steps to reproduce the behavior:

  1. Create a file called tÉst.php with the code sample above.
  2. Run phpcs tÉst.php --standard=Generic --sniffs=Generic.Files.LowercasedFilename
  3. No error message is displayed.

Expected behavior

PHPCS should display the following error message:

----------------------------------------------------------------------------------
FOUND 1 ERROR AFFECTING 1 LINE
----------------------------------------------------------------------------------
 1 | ERROR | Filename "tÉst.php" doesn't match the expected filename "tést.php"
----------------------------------------------------------------------------------

Versions (please complete the following information)

Operating System: Ubuntu 24.04
PHP version: 8.3
PHP_CodeSniffer version: master
Standard: Generic
Install type: git clone

Please confirm

  • [x] I have searched the issue list and am not opening a duplicate issue.
  • [x] I have read the Contribution Guidelines and this is not a support question.
  • [x] I confirm that this bug is a bug in PHP_CodeSniffer and not in one of the external standards.
  • [x] I have verified the issue still exists in the master branch of PHP_CodeSniffer.

rodrigoprimo, Nov 13 '24 21:11

@rodrigoprimo Thanks for finding and reporting this issue.

While this is an interesting issue from a technical perspective, I consider it low priority unless and until end-users of PHPCS report that they are running into it.

I wonder how common it is to have non-ASCII characters in file names. I also have a gut feeling that such files may not always be portable across operating systems, but this would need to be researched and confirmed or debunked first. If that gut feeling turns out to be correct, I can imagine that non-ASCII characters in file names might deserve their own sniff (to forbid them).
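
As a rough sketch of what the check inside such a sniff might look like: the helper name hasNonAsciiBytes() is hypothetical and not part of PHP_CodeSniffer, and the file name is assumed to be in an ASCII-compatible encoding such as UTF-8 or Latin-1.

```php
<?php
// Hypothetical helper; not part of PHP_CodeSniffer.
function hasNonAsciiBytes(string $fileName): bool
{
    // Without the /u modifier, PCRE matches byte by byte, so this
    // matches any byte in the range 0x80-0xFF.
    return preg_match('/[^\x00-\x7F]/', $fileName) === 1;
}

var_dump(hasNonAsciiBytes('test.php'));      // bool(false)
var_dump(hasNonAsciiBytes("t\u{C9}st.php")); // bool(true)
```

A byte-level check like this does not need to know the encoding of the file name, which sidesteps the detection problem for the "forbid" case.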

I also wonder how we could detect this reliably: while the file contents have an encoding, I don't know how we could determine the encoding of the file name. I imagine the encoding might depend on the OS? File name encoding is a curiosity I've never dug into, so I'd be very interested to hear from someone who has and who can shed more light on this.
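
One possible approach, sketched here purely as an assumption and not as existing PHPCS behaviour, would be to apply Unicode-aware lowercasing only when the file name is a valid UTF-8 byte sequence and to fall back to the current ASCII-only behaviour otherwise. The helper name lowercaseFileName() is hypothetical, and the sketch relies on the mbstring extension when it is available.

```php
<?php
// Hypothetical helper; an assumption for discussion, not PHPCS code.
function lowercaseFileName(string $fileName): string
{
    // Only treat the name as UTF-8 if its bytes form valid UTF-8.
    if (function_exists('mb_check_encoding')
        && mb_check_encoding($fileName, 'UTF-8')
    ) {
        return mb_strtolower($fileName, 'UTF-8');
    }

    // Unknown or invalid encoding: only convert ASCII letters.
    return strtolower($fileName);
}

// Valid UTF-8 input: É (U+00C9) is lowercased to é (U+00E9).
var_dump(lowercaseFileName("T\u{C9}ST.php") === "t\u{E9}st.php");
```

This does not determine the real encoding of the file name. It only avoids corrupting names that are not valid UTF-8, so whether it is reliable enough across operating systems would still need to be researched.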

jrfnl, Nov 24 '24 18:11