media-manager-improvement
Normalize filenames - proposal
In issue #245 I argue that the characters allowed in filenames should stay the same as in the old MM. That does not imply that characters should simply be removed, though. Test Übergrößenträger.jpg as a filename in the old MM. I propose:
JFilterOutput::stringURLSafe($string, $language = '')
This method processes a string and replaces all accented UTF-8 characters with unaccented
ASCII-7 "equivalents"; whitespace is replaced by hyphens and the string is lowercased.
- Lowercase file names. I use the stringURLSafe method, which leaves only allowed characters. Maybe this method should be duplicated as a stringFileSafe method? URL characters have nothing to do with the filesystem, and the URL rules could change in the near future. As with the alias field in the DB tables, a counter has to be appended if the filename already exists (myfile-3.txt).
  EDIT: There is already a filter method stringUrlUnicodeSlug($string) for unicode slugs.
- Lowercase filename extensions: jpg, png, gif, bmp, txt, mpg and so on.
- Replace jpeg with jpg, and do the same for any other file type with more than one extension. Without lowercasing you get a LOT of different extensions. How do you like mylogo.pnG? I can use that on Windows but can't upload it with MM --> "Error: This file type is not supported." After a manual upload, MM handles it.
Joomla can leave it to the users, but why not help them with some normalizing?
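As a rough illustration of the rules proposed above, here is a minimal sketch in Python (not Joomla's PHP, and not how stringURLSafe is implemented). The function name normalize_filename and the EXTENSION_ALIASES and SPECIAL tables are hypothetical and only there for the example; a real implementation would need language-aware transliteration.

```python
import re
import unicodedata

# Hypothetical alias table: extensions that should collapse to one spelling.
EXTENSION_ALIASES = {"jpeg": "jpg", "mpeg": "mpg", "tiff": "tif"}

# Characters without a Unicode decomposition need an explicit mapping.
# This tiny table is an assumption for the example, not a complete list.
SPECIAL = {"ß": "ss", "æ": "ae", "ø": "o"}


def normalize_filename(name: str) -> str:
    """Transliterate to ASCII, lowercase, hyphenate, and unify the extension."""
    stem, dot, ext = name.rpartition(".")
    if not dot or not stem:
        # No extension (or a leading-dot name): treat everything as the stem.
        stem, ext = name, ""

    for src, dst in SPECIAL.items():
        stem = stem.replace(src, dst)

    # Decompose accented characters, then drop everything that is not ASCII.
    stem = unicodedata.normalize("NFKD", stem)
    stem = stem.encode("ascii", "ignore").decode("ascii").lower()

    # Anything outside a-z, 0-9 and underscore (including blanks) becomes a hyphen.
    stem = re.sub(r"[^a-z0-9_]+", "-", stem).strip("-")

    # Lowercase the extension and map aliases such as jpeg to jpg.
    ext = ext.lower()
    ext = EXTENSION_ALIASES.get(ext, ext)

    return f"{stem}.{ext}" if ext else stem


print(normalize_filename("Übergrößenträger.jpg"))
print(normalize_filename("mylogo.pnG"))
print(normalize_filename("My Holiday Photo.JPEG"))
```

Note that this simple decomposition turns Ü into U rather than Ue, which is exactly the kind of language-specific choice a per-language transliteration table would have to settle.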
I have made a relevant PR here https://github.com/joomla/joomla-cms/pull/12049
That does language transliteration for filenames. I also added a method for making a filename unique within a given folder.
About transliteration: it can be used until support for both UTF-8 / UTF-16 is solidly implemented.
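To make the "unique within a given folder" idea concrete, here is a small Python sketch. It is not the code from the PR; make_unique is a hypothetical name, and the folder listing is passed in as a plain set of names so the example stays self-contained. The comparison is case-insensitive on the assumption that the target may be a Windows filesystem.

```python
def make_unique(filename: str, existing_names: set) -> str:
    """Append -2, -3, ... to the stem until the name is not taken."""
    # Compare case-insensitively so myfile.txt and MyFile.txt count as a clash.
    taken = {name.lower() for name in existing_names}

    stem, dot, ext = filename.rpartition(".")
    if not dot or not stem:
        stem, ext = filename, ""
    suffix = f".{ext}" if ext else ""

    candidate = filename
    counter = 2
    while candidate.lower() in taken:
        candidate = f"{stem}-{counter}{suffix}"
        counter += 1
    return candidate


folder = {"myfile.txt", "myfile-2.txt"}
print(make_unique("myfile.txt", folder))
```

Starting the counter at 2 is a design choice for the example; the alias fields in the DB tables may count differently.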
@ggppdk
Great! I'll test it. I know that it works for me as I've used a method like that since Joomla 1.5.
I saw that you allow upper case letters. Personally I hate camel case file names. When I then see a JPEG at the end, then I go offside.
At the end of the day the file information should be stored in a DB table. Long name, description, alias, alt text, caption, credits and copyright, dates and a lot more as needed.
Well Joomla always had problems with recommendations and best practices.
I stumbled over a web page that very briefly describes why my normalizing is a good idea and which rules should be followed, i.e. I have some new friends ;) Michigan Technological University
I know that this is changing rapidly on the "top" level but Joomla has to be conservative.
There is a solution with joomla/joomla-cms#16878 that should be possible to add to 4.0.
It got worse with the new media manager. You remove any dashes? There are two preg_replace calls involved, with a JStringPunycode::toPunycode in between?
Can you create a PR showing how you think it should be done? If you have any questions, please let us know. Thanks.
@laoneo There is an open PR, probably only targeting 3.7. It was voted down because it did not solve the problem for fellow PHP 5.4 users in 3.7. It is B/C though: it does not break anything, but makes it possible for Greek and Chinese users to use the new MM. I have no idea how it should be implemented in 4.0. I asked for a re-evaluation for 4.0 in that PR.
I use that code myself via my own JFile::makeSafe($file) override. Well, I added some more code to get file names the way I want them. As I use a component's item titles for image names, it has to handle (translate) UTF-8 encoded text. That could be an idea for a future db-table solution in MM. I decided to use readable file names even though random numbers were on my mind for a while. I decided to keep that "link", should the db-table get "killed" somehow.
If the code gets into a Joomla library, I can try to create a PR proposal for how to use it in MM.
Can you link your pr with this issue?
The link is included above in my comment 4 days ago!
Saw that one, but I got the impression you are talking about a different issue which deals with special characters like underscore and not latin characters. But then we can close this issue in favor of https://github.com/joomla/joomla-cms/pull/16878 or not?
This issue is about characters used in file names in general. My last comment was about dashes that you now remove from the file names and the use of punycode (München = Mnchen-3ya) after removing any unicode character. This has to be fixed in MM after PR 16878 has been committed into 4.0, so I suppose it remains an issue for MM. If you have a todo list somewhere else, you can close this issue if you like.
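For anyone who wants to see the punycode effect outside of Joomla, here is a short Python sketch. It uses Python's built-in punycode codec, not JStringPunycode, and it does not reproduce the MM code path; the helper names are hypothetical. It only shows what punycode does to a name, what removing dashes afterwards does to the result, and how transliteration compares.

```python
import unicodedata


def to_punycode(stem: str) -> str:
    """Encode with Python's built-in punycode codec (no xn-- prefix)."""
    return stem.encode("punycode").decode("ascii")


def transliterate(stem: str) -> str:
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", stem)
    return decomposed.encode("ascii", "ignore").decode("ascii")


name = "München"
puny = to_punycode(name)
print(puny)                   # Mnchen-3ya
print(puny.replace("-", ""))  # Mnchen3ya, the delimiter is gone and the name cannot be decoded
print(transliterate(name))    # Munchen
```

The transliterated form stays readable, which is the argument for getting a transliteration step in before any punycode fallback.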
I'm strongly opposed to J! manipulating filenames in any way from the import source. People name their files for a reason, INCLUDING "-". It should be totally irrelevant to us what the user's filename is as long as it adheres to *nix file handling. I suppose if you want to make it a switch (like HTML verification) that would be OK, but otherwise deciding a user's filename simply because we don't like the syntax is inherently wrong!
It is not that we change the filename because we don't like it; what we did was copy the logic from version 3. There must be a reason why it is that restrictive. Personally I have no preference about the filename, but there is a history behind where we are now, and that should be taken into account when deciding where to go. If you guys feel strongly about a proper change, make a PR and we will discuss it with the JSST.
Joomla 3 does not rename files with a dash in them (not tested with the new MM).
[Screenshot: screenshotr07-13-56]
We took the logic from the JSON upload functionality of version 3, but the ordinary version 3 media manager upload feature has less strict upload logic. I have no problem reverting it, but first I want to know why this happened. History says:
- https://github.com/joomla/joomla-cms/commit/9f36b1f58bea851a63e1fbcdca2fb2ca1f271792
- https://github.com/joomla/joomla-cms/pull/9608
So I guess we should be fine here to have the less strict logic.
Can you guys please test #448, which should fix the issue? Keep in mind you need to be up to date with the dev branch, as the changes of #446 are needed.
A short summary on filenames.
Use of blanks in filenames: If you allow blanks in filenames be aware of the following, especially if you write your own JS scripts.
- Blanks are not valid in URIs and HTTP requests. In some cases, probably not all, your browser and its JavaScript engine URL-encode the strings in question, i.e. the blank is replaced with %20 "behind the scenes": http://my.server.com/images/my file.jpg becomes http://my.server.com/images/my%20file.jpg
  By the way, my%20file.jpg is a valid filename, but never use % in your filenames: you would have to encode % manually as %25.
- PHP and database table: $files = JFolder::files($path,...); If $files is not utf8mb4 encoded, you're in trouble if "extended" characters are included, e.g. Über-größen_ länge.jpg ends up in the database as ?ber-gr??en_ l?nge.jpg on a Windows system.
- JavaScript (tested on a Windows system in the new MM, among others with the name check deactivated!): renaming a file to Über-größen_ länge.jpg ends up in the filesystem as Ãœbergrößen länge.jpg
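The three pitfalls above can be reproduced in a few lines. This is a Python sketch for illustration only: it does not show what PHP, the database driver or the browser actually do internally, and the helper names are hypothetical. It only demonstrates the general mechanisms (percent-encoding, lossy conversion to a narrower charset, and UTF-8 bytes read as Windows-1252) that produce this kind of output.

```python
from urllib.parse import quote


def percent_encode_path(path: str) -> str:
    """Percent-encode a URL path; blanks become %20 and a literal % becomes %25."""
    return quote(path)


def lossy_ascii(name: str) -> str:
    """Convert to ASCII, replacing every character that does not fit with '?'."""
    return name.encode("ascii", "replace").decode("ascii")


def utf8_read_as_cp1252(name: str) -> str:
    """Simulate UTF-8 bytes being interpreted as Windows-1252 text."""
    return name.encode("utf-8").decode("cp1252")


print(percent_encode_path("/images/my file.jpg"))  # /images/my%20file.jpg
print(percent_encode_path("my%20file.jpg"))        # my%2520file.jpg
print(lossy_ascii("Über-größen_ länge.jpg"))       # ?ber-gr??en_ l?nge.jpg
print(utf8_read_as_cp1252("Ü"))                    # Ãœ
```

The second call shows why a % in a filename is trouble: the name my%20file.jpg has to travel as my%2520file.jpg, otherwise the server decodes it to a name with a blank.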
Knowing the problems above, I don't think Joomla can come up with a solution in the near future where utf8 encoded filenames are possible. Therefore please get transliterator_transliterate() (Joomla PR 16878) into J 4.0!
- Lowercase filenames and one file type: again, Windows is the main reason. myFile.jpg, myfile.jpg and myfile.JPG are the same file on Windows; on other operating systems they are 3 different files. myfile.jpeg => myfile.jpg
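A quick way to see which names would clash on a case-insensitive filesystem is to group them by their lowercased form. The Python sketch below is only an approximation (Windows has its own case-folding rules), and find_case_collisions is a hypothetical helper name.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that differ only in letter case; return groups with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}


uploads = ["myFile.jpg", "myfile.jpg", "myfile.JPG", "other.png"]
print(find_case_collisions(uploads))
# {'myfile.jpg': ['myFile.jpg', 'myfile.jpg', 'myfile.JPG']}
```

Lowercasing at upload time makes this check unnecessary, because all three names end up as the same file on every OS.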