processwire-requests
Retain Non-ASCII characters when uploading files
Short description of the enhancement
Make file names with non-ASCII characters possible.
Optional: Steps that explain the enhancement
- Upload a file with non-ASCII characters in its name
- The file name is preserved as is.
Current vs. suggested behavior
Current: All non-ASCII characters are stripped. Suggested: Preserve all non-ASCII characters.
Why would the enhancement be useful to users?
Asian users commonly use non-ASCII characters in file names. It would be good not to have to rename files before uploading them.
Optional: Screenshots/Links that demonstrate the enhancement
If this feature is added, it might be worth adding checks for Windows systems. On Windows, PHP versions < 7.1 do not support characters outside of the active locale. Starting with 7.1, everything should be fine.
+1 for this. Very important for international community))
Current behaviour has been very problematic for some (many) of our use cases. +1 from here too.
As a matter of fact I've got one project on my desk right now that needs uploaded files to remain as-is, since they are actually consumed by another tool that requires specific formatting. Currently I'm at a loss about how to implement this without "reinventing the wheel", i.e. creating my own file field.
I desperately need this. If someone can build it, can we raise some funds for it?
I have a working solution here, or let's say a proof of concept. Changes to the core files are needed.
Oh, this is good news. Maybe you could share a git repository that we can download and try?
Gideon
I need to do some more testing. Unfortunately, I can't create a module since a lot of methods are not hookable.
@matjazpotocnik Seems to be a perfect case for a PR.
I won't make a PR, as it would not be accepted by Ryan, and I understand why. You are looking for trouble if you want to support non-ASCII in uploads. It's not the problem making the non-ascii filenames to get uploaded and displayed in the file list, but how would that file be stored and represented on the filesystem. I would rather leave the core files intact and make a module, but then again, some methods in the core would need to be hookable, and you see how the number of feature requests is growing here... As BitPoet said, PHP 7.1 supports UTF-8 filenames on Windows regardless of the OEM codepage; we will see what that brings to the picture.
Maybe just make a PR to make the needed methods hookable?
Hmm, I will have to sleep on it (again) and maybe go in another direction that wouldn't require so much hooking.
It's not the problem making the non-ascii filenames to get uploaded and displayed in the file list, but how would that file be stored and represented on the filesystem.
Somewhat curious: what kind of problems would this cause? Personally I'd suggest making this possible at the core level, unless there's a very strong reason not to.
Make it a configurable option for all I care, but it's such a common need for non-English folks that IMHO it shouldn't be left out.
So far the only problems I can think of seem to be a) some potential for confusion regarding case (in)sensitive file handling by the OS, and b) the general idea that by filtering input extra carefully you can avoid potential issues on the output phase.
Somewhat curious: what kind of problems would this cause?
How would you like to see the file "test_漢字汉字.txt" on the file system? As "test_漢字汉字.txt" or "test_漢ĺ—汉ĺ.txt"? The first version is created on Windows using wfio, the second version is what Windows does by itself. If PHP 7.1 solves this, then we are on a good path.
Personally I'd suggest making this possible at the core level, unless there's a very strong reason not to.
I agree, but you have to address Ryan for this and that's what we are doing here :-)
The first version is created on Windows using wfio, the second version is what Windows does by itself.
I wouldn't see weird crap like that, because I don't use Windows ;)
Jokes aside, I was admittedly a bit worried that this might be a Windows-specific issue, and turns out it is. One option would be making this a configurable option and disabled by default, with proper warnings about Windows being a major jerk in this regard. That's what I'd do, anyway.
I agree, but you have to address Ryan for this and that's what we are doing here :-)
Definitely. Was just commenting what you said above, i.e. "I would rather leave the core files intact and make a module". I wouldn't :)
"...configurable option and disabled by default, with proper warnings about" any sort of incompatibility issues that might emerge. I support this :) Core support with some "shortcomings" regarding server compatibility issues is a lot better than having nothing. If ProcessWire can support most web servers out there, then that is a pretty solid start.
One option would be making this a configurable option and disabled by default, with proper warnings about
Would this configurable option be part of the input field or a generic option in config.php?
about Windows being a major jerk in this regard.
I'm not a Linux/Mac user so I can't comment on this, but from my very limited testing, Linux is no better in this regard. I suppose it has to be configured somehow to support filenames in UTF-8 (locale?)?
UTF-8 filenames are supported by all standard file systems on *nix OSes. Problems usually only arise when the shell (command line) is configured to use a non-utf8 locale or old non-utf8 applications are invoked. All halfway current versions of Apache (and, importantly, also mod_rewrite) support utf8.
A check for the combination of Windows + PHP < 7.1 together with a big red warning should IMHO be a sensible approach.
Though, to keep things simple at first, just making the necessary methods hookable through a PR and putting things into a module might still be the quicker way and let Ryan sleep easier. Once there has been some successful production testing, things could be moved into the core.
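The Windows + PHP < 7.1 check suggested above boils down to a small predicate. A minimal sketch, written in Python purely for illustration: inside ProcessWire the inputs would presumably come from PHP's `PHP_OS` and `PHP_VERSION` constants, and the function name `needs_filename_warning` is hypothetical, not part of any existing API.

```python
def needs_filename_warning(os_name: str, php_version: str) -> bool:
    """Return True for the risky combination: Windows with PHP older than 7.1."""
    # PHP_OS reports values such as "WINNT" or "Linux"; compare case-insensitively.
    # A prefix test is used so that "Darwin" is not mistaken for Windows.
    on_windows = os_name.upper().startswith("WIN")
    # Compare only major.minor, ignoring suffixes such as "-dev" in the patch part.
    major, minor = (int(part) for part in php_version.split(".")[:2])
    return on_windows and (major, minor) < (7, 1)


if __name__ == "__main__":
    for os_name, version in (("WINNT", "7.0.33"), ("WINNT", "7.1.0"), ("Linux", "5.6.40")):
        print(os_name, version, needs_filename_warning(os_name, version))
```

When the predicate is true, the admin interface could show the "big red warning" mentioned above instead of silently accepting the upload.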
I know UTF-8 filenames are supported on *nix systems, but in the testing I conducted (thx tpr), filenames were not stored in UTF-8. My guess is that you have to convince Apache+PHP that you would like to work with UTF-8 encoding. How do you do that if you are on shared hosting and don't have shell access to set up the locale (if that is what you are talking about)? My simple test was with
file_put_contents ("Árvíztűrő tükörfúrógép.txt", "data");
And I see the file:
ĂrvĂztűrĹ‘ tĂĽkörfĂşrĂłgĂ©p.txt
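Garbling of this kind is typical of UTF-8 bytes being read back through a single-byte code page. A minimal sketch in Python (used here only for illustration; the helper name `as_seen_through` is hypothetical, and Windows-1250 is an assumed code page that may not match the server used in this test):

```python
def as_seen_through(name: str, codepage: str = "cp1250") -> str:
    """Encode a name as UTF-8, then decode the bytes with a single-byte code page."""
    raw = name.encode("utf-8")
    # Bytes the code page does not define become U+FFFD instead of raising an error.
    return raw.decode(codepage, errors="replace")


if __name__ == "__main__":
    original = "Árvíztűrő tükörfúrógép.txt"
    print(as_seen_through(original))
    # The damage is reversible as long as no byte was replaced during decoding.
    print(as_seen_through("é").encode("cp1250").decode("utf-8"))
```

Each accented character turns into two characters because UTF-8 stores it as two bytes, while the code page treats every byte as a character of its own.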
The characters in Apache-generated file listings should be shown correctly if you set
AddDefaultCharset UTF-8
in httpd.conf or, in the .htaccess in the directory with the files,
IndexOptions +Charset=UTF-8
as documented in the mod_autoindex docs.
A check for the combination of Windows + PHP < 7.1 together with a big red warning should IMHO be a sensible approach.
Technically yes, but unexpected things can still happen if the site is moved to another server etc. This would be fine as an addition, as long as there's a clear warning that's always visible :)
My simple test was with ... And I see the file ...
So far I've been unable to reproduce this, seems to work just fine on a pretty out-of-the-box Ubuntu installation at least. Are you seeing strange characters in the actual filename on the disk (via shell) or in a file listing, i.e. in a browser? If it's in a file listing, could you check the file name on the disk just to make sure? :)
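One way to tell the two cases apart without relying on how a terminal or browser renders the name is to look at the raw bytes stored in the directory. A rough sketch in Python (for illustration only; `raw_names` and `is_valid_utf8` are hypothetical helpers, and the byte-level listing is assumed to run on Linux or another POSIX system):

```python
import os


def raw_names(directory: str) -> list:
    """List directory entries as raw bytes, bypassing any locale-based decoding."""
    return os.listdir(os.fsencode(directory))


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    for raw in raw_names("."):
        # A name that is not valid UTF-8 was written in some other encoding.
        print(raw, "UTF-8" if is_valid_utf8(raw) else "not UTF-8")
```

If the bytes on disk are valid UTF-8, the strange characters most likely come from the listing or the terminal rather than from the file system.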
I finally got my hands on a Linux box with root shell access: Ubuntu 16.10 with Apache 2 and PHP 7.0. I had to tweak the Apache+PHP configuration to make it work, but it looks like it's working.
In my previous tests on Linux, the server was not properly configured for UTF-8. One server was creating files in ISO-8859-1 encoding, the other in ISO-8859-2. While Windows stores the file name in UTF-16 encoding internally, it performs a conversion to the configured locale, in my case Windows-1250. Uploads work on Windows too (on IIS 8.5), but PHP 7.1 is mandatory! Attached are two recordings as proof of concept.
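The difference between these setups comes down to which characters each encoding can represent at all. A small sketch in Python (illustration only; `representable` is a hypothetical helper, and the encoding names are Python's codec names for the encodings mentioned above):

```python
def representable(name: str, encoding: str) -> bool:
    """Return True if every character of the name exists in the given encoding."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for name in ("Árvíztűrő tükörfúrógép.txt", "test_漢字汉字.txt"):
        for encoding in ("iso-8859-1", "iso-8859-2", "cp1250", "utf-8"):
            print(name, encoding, representable(name, encoding))
```

Names that cannot be represented in the active code page are the ones at risk wherever a conversion to that code page takes place, which is why only UTF-8 handles both example names.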
Changes to the core files are minimal, so I think there is no need for a module. I didn't make a PR as I think Ryan might go his own route (if at all), so I have instead created a zip file with the changed files (PW 3.0.42). If anyone wants to try this, just replace the core files; there is a readme.txt with instructions and a list of what is changed.
https://www.dropbox.com/s/h1by4bm8j49jo7o/Upload%20demo%20windows.gif
https://www.dropbox.com/s/cuu1tg7ie83li26/Upload%20demo%20linux.gif
https://www.dropbox.com/s/dduaqkd6r68m8gn/Upload%20demo.zip
Looks good. I will test it and report back.
Hi @matjazpotocnik ,
Works nicely. How about making a PR and seeing if @ryancramerdesign would like to bring it into the core?
Gideon
There are a lot of PRs already in the queue and just making another one won't help. Ryan will decide how, when and if this will find a way to the core.
Loosely related topic on the support forum: https://processwire.com/talk/topic/18354-no-lowercase-unzipped-files/. I'm still hoping that we can one day instruct ProcessWire to just keep filenames as-is. There are legitimate use cases for that.
Ping @ryancramerdesign.
Alpha proof-of-concept module for anyone interested in exploring the idea: https://github.com/Toutouwai/FieldtypeFileUnrenamed/
@Toutouwai I installed the module and it doesn't seem to have any effect. Did I miss anything?
@gideonso, maybe you didn't create a new "Files Unrenamed" field? If that's not it then sorry, I don't know. That module is just a proof-of-concept demonstration - it's not a released module that I'm providing support for I'm afraid.
@Toutouwai, that is OK. I just wrote to see if you had any idea. Let's wait for the official solution, if it comes one day.