
Diacritics from stream with LocalAdapter

Open phcorp opened this issue 7 years ago • 10 comments

Q | A
Bug report? | yes
Gaufrette version | 0.3.1
PHP version | 7.1.5 and 7.1.6
System | OSX Sierra and Ubuntu 16.04

I recently encountered a problem with diacritics when trying to access a file from a stream.

I set up uploads as a filesystem with a local adapter.

The following works:

file_get_contents("gaufrette://uploads/foo.txt");

The following does not work:

file_get_contents("gaufrette://uploads/baré.txt");

NB: both files exist on the filesystem, and source files are encoded in UTF-8 by default.

phcorp avatar Jun 24 '17 11:06 phcorp

@phcorp Thank you for the bug report. But as I said in the PR, I'm not sure whether this is a bug (caused by your environment) or a feature request. Could you provide the output of the locale command on your Ubuntu system? Reading this PHP bug report, it looks like it could be due to charset issues.

akerouanton avatar Jun 26 '17 20:06 akerouanton

I'm pretty sure this is caused by your charset configuration: non-ASCII characters work fine on 3v4l.org, see here.

akerouanton avatar Jun 26 '17 20:06 akerouanton

Ubuntu

LANG=fr_FR.UTF-8
LANGUAGE=
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER="fr_FR.UTF-8"
LC_NAME="fr_FR.UTF-8"
LC_ADDRESS="fr_FR.UTF-8"
LC_TELEPHONE="fr_FR.UTF-8"
LC_MEASUREMENT="fr_FR.UTF-8"
LC_IDENTIFICATION="fr_FR.UTF-8"
LC_ALL=

OSX

LANG="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_CTYPE="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_ALL=

It seems to be a problem related to the system locale, nice catch! Still, if the PR fixes that behavior without a BC break, it could be merged as a new feature.

phcorp avatar Jun 26 '17 23:06 phcorp

Is this the output of locale after you modified your system charset? Do you still have the bug?

If that's only an issue related to system charset, I'm not sure we'll merge it. Anyway, I'll wait for other opinions on the PR.

akerouanton avatar Jun 26 '17 23:06 akerouanton

I didn't modify my system charset in any way. Tip to detect the issue: the parse_url function works fine on my system until using stream.

phcorp avatar Jun 27 '17 12:06 phcorp

Tip to detect the issue: the parse_url function works fine on my system until using stream.

Mmh, what do you mean? parse_url is used in two places in the Gaufrette code: in the stream, and in the MogileFS adapter.

Do you have any code that could alter your locale? Please post the output of the following command executed on your Ubuntu server: $ php -r 'var_dump(setlocale(LC_ALL, "0"), parse_url("http://phpcorp/barré"));'

akerouanton avatar Jun 27 '17 20:06 akerouanton

I mean that I don't encounter charset problems with parse_url outside of the stream. As far as I know, I don't have any code that could alter my locale. Here is the requested output from Ubuntu Server:

Command line code:1:
string(170) "LC_CTYPE=fr_FR.UTF-8;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=C;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C"
Command line code:1:
array(3) {
  'scheme' =>
  string(4) "http"
  'host' =>
  string(7) "phpcorp"
  'path' =>
  string(7) "/barré"
}

and OSX:

Command line code:1:
string(21) "C/fr_FR.UTF-8/C/C/C/C"
Command line code:1:
array(3) {
  'scheme' =>
  string(4) "http"
  'host' =>
  string(7) "phpcorp"
  'path' =>
  string(7) "/barré"
}

phcorp avatar Jun 28 '17 01:06 phcorp

After testing again manually, I confirm that UTF-8 URLs are correctly parsed by parse_url on my system. That is not the case inside the createStream method, and I don't know why.

phcorp avatar Jun 28 '17 02:06 phcorp
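One way to narrow this down is to parse the same URL under several LC_CTYPE locales and compare the raw bytes of the resulting path. This is only a diagnostic sketch: it assumes the failure depends on the locale active when parse_url runs, which this thread suspects but does not confirm. The helper name pathBytesUnderLocale is hypothetical, the locale names are examples that may not be installed on a given machine, and the script must be saved as UTF-8.

```php
<?php
// Diagnostic sketch: does parse_url() return the same bytes under every locale?
// pathBytesUnderLocale() is a hypothetical helper, not part of Gaufrette.
function pathBytesUnderLocale(string $url, string $locale)
{
    $previous = setlocale(LC_CTYPE, '0');      // remember the current LC_CTYPE
    if (setlocale(LC_CTYPE, $locale) === false) {
        return null;                           // locale not installed here
    }
    $parts = parse_url($url);
    setlocale(LC_CTYPE, $previous);            // restore the original locale
    if (!is_array($parts) || !isset($parts['path'])) {
        return null;                           // parse_url() gave no path
    }
    return bin2hex($parts['path']);            // hex makes altered bytes visible
}

$url = 'gaufrette://uploads/baré.txt';
foreach (['C', 'fr_FR.UTF-8', 'en_US.UTF-8'] as $locale) {
    $hex = pathBytesUnderLocale($url, $locale);
    printf("%-12s %s\n", $locale, $hex === null ? '(n/a)' : $hex);
}
// Reference value: the untouched UTF-8 bytes of the expected path.
printf("%-12s %s\n", 'expected', bin2hex('/baré.txt'));
```

Any line whose hex string differs from the expected one would point at a locale under which parse_url alters the multibyte characters.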

I found many stream wrappers that ship their own implementation of parse_url, but I couldn't find any explanation of why they do so, other than charset issues:
https://searchcode.com/codesearch/view/95070302/
https://www.phpclasses.org/browse/file/20566.html
https://github.com/WPsites/WPide/blob/master/git/src/TQ/Git/StreamWrapper/PathInformation.php
https://github.com/EGroupware/egroupware/blob/master/api/src/Vfs.php

phcorp avatar Jun 28 '17 02:06 phcorp
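For illustration, a custom parser of the kind those wrappers use could look like the minimal sketch below. It is not taken from Gaufrette or from any of the linked projects, and splitStreamPath is a hypothetical name. It assumes URLs of the form scheme://domain/key and relies only on byte-wise string functions, so multibyte UTF-8 sequences pass through unchanged whatever the locale.

```php
<?php
// Hypothetical helper, not part of Gaufrette: splits "scheme://domain/key"
// using strpos()/substr() only, which never inspect or rewrite bytes.
function splitStreamPath(string $path): array
{
    $schemeEnd = strpos($path, '://');
    if ($schemeEnd === false) {
        throw new InvalidArgumentException("Not a stream URL: $path");
    }

    $scheme = substr($path, 0, $schemeEnd);
    $rest = (string) substr($path, $schemeEnd + 3);  // everything after "://"

    $slash = strpos($rest, '/');
    if ($slash === false) {
        // No key part, e.g. "gaufrette://uploads"
        return ['scheme' => $scheme, 'host' => $rest, 'path' => ''];
    }

    return [
        'scheme' => $scheme,
        'host'   => substr($rest, 0, $slash),
        'path'   => substr($rest, $slash),           // keeps the leading "/"
    ];
}

var_dump(splitStreamPath('gaufrette://uploads/baré.txt'));
```

The returned keys mirror the scheme, host and path keys of parse_url so the sketch could stand in for it in a simple case; query strings, fragments, ports and credentials are deliberately not handled.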

I pushed a branch on my fork with the test case coming from #508 (see the diff here). It works properly on my computer, so the only thing left is your locale. Sorry, but I don't think we should add a workaround for broken systems to Gaufrette; that's not what will make it better.

akerouanton avatar Sep 21 '17 13:09 akerouanton

Hello @phcorp, do you still have the problem? Or can I close this issue? Thank you for your feedback.

Zusoy avatar May 12 '23 09:05 Zusoy

Hello, I think the problem still exists, but I no longer run into it since I applied a patch manually. We couldn't find a way to make it work everywhere at the time, so it was never merged. Feel free to close the issue.

phcorp avatar May 12 '23 09:05 phcorp