
Allow all legal characters in filenames

Status: Open. robyngit opened this issue 1 year ago · 5 comments

Files uploaded via the editor are sometimes renamed. All characters in filenames that are not letters or digits are replaced with an underscore. This is too conservative - we should allow "-" in filenames as well.
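
To make the current behavior concrete, here is a minimal Python sketch of the rule as described above. It is a hypothetical reconstruction, not MetacatUI's actual code (MetacatUI is JavaScript), and the real logic may differ in detail, for example in how it treats the period before an extension.

```python
import re

def sanitize_as_described(filename: str) -> str:
    # Hypothetical reconstruction of the rule described in the issue:
    # every character that is not an ASCII letter or digit becomes "_".
    return re.sub(r"[^A-Za-z0-9]", "_", filename)

print(sanitize_as_described("site-A_2023.csv"))  # site_A_2023_csv
```

Applied literally, the rule turns the hyphen into an underscore, which is exactly the renaming this issue is about.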

robyngit, Nov 28 '23 19:11

We should try not to rename files if possible, and only rename them if the characters are truly illegal. It's a problem, for example, if the filename is referenced in an R or Python script, or elsewhere in the metadata text, and the names no longer match. Maybe we are being too greedy in the characters we eliminate?
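
One way to read "only rename if the characters are truly illegal" is sketched below. It assumes "truly illegal" means the characters Windows forbids in file names (< > : " / \ | ? *) plus ASCII control characters; POSIX file systems forbid only "/" and NUL, so this set is already stricter than a typical server needs. The function name and the "_" replacement are hypothetical.

```python
import re

# Assumption: "truly illegal" = characters Windows forbids in file names
# (< > : " / \ | ? *) plus ASCII control characters 0x00-0x1F.
ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

def replace_illegal_only(filename: str, replacement: str = "_") -> str:
    """Return filename unchanged unless it contains an illegal character."""
    return ILLEGAL.sub(replacement, filename)

print(replace_illegal_only("my data (v2).csv"))  # my data (v2).csv
```

Names such as 00001-02715.00001-01803 pass through untouched. Windows also reserves device names such as CON and NUL, which this sketch does not check.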

mbjones, Nov 28 '23 20:11

@mbjones I agree that we should aim to minimize renaming and only replace characters that are actually illegal or problematic across different platforms. I propose we come up with a set of allowable characters. Here's what I've got so far; input appreciated!

Characters to Allow: Universally legal

  1. Alphanumeric Characters A-Z, a-z, 0-9
  2. Hyphen -
  3. Underscore _
  4. Period .

Characters for Discussion: These characters can cause issues in some contexts

  1. Space
  2. Parentheses (), Square Brackets []
  3. Exclamation Mark !, At Symbol @, Number Sign #, Dollar Sign $, Percent Sign %
  4. Plus Sign +, Equal Sign =
  5. Comma ,
  6. Single Quote '
  7. Other characters...?
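
The proposed allowlist can be expressed as a single negated character class: anything outside it is replaced. A sketch, assuming "_" stays the replacement character and that characters still under discussion are opted in through a hypothetical extra_allowed parameter.

```python
import re

# The "universally legal" set from the list above: A-Z, a-z, 0-9, "_", ".", "-".
# "-" is escaped so it stays literal even when more characters are appended.
ALLOWED = r"A-Za-z0-9_.\-"

def sanitize_allowlist(filename: str, extra_allowed: str = "") -> str:
    # re.escape keeps characters like "(", "]" or "^" literal inside the class.
    pattern = "[^" + ALLOWED + re.escape(extra_allowed) + "]"
    return re.sub(pattern, "_", filename)

print(sanitize_allowlist("my data (v2).csv"))         # my_data__v2_.csv
print(sanitize_allowlist("my data (v2).csv", " ()"))  # my data (v2).csv
```

Keeping the "for discussion" characters out of the base set means each one can be enabled individually once the team agrees on it.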

robyngit, Nov 29 '23 17:11

What about all other non-ascii characters, in the Unicode range?
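
If the allowlist should extend to non-ASCII text, one option in Python is str.isalnum(), which is Unicode-aware. A sketch under that assumption; it also keeps combining marks (Unicode category M), because a decomposed name such as "e" followed by U+0301 would otherwise have its accent replaced. The helper names are hypothetical.

```python
import unicodedata

def is_allowed(ch: str, extra_allowed: str) -> bool:
    # Letters and digits from any script (str.isalnum is Unicode-aware),
    # combining marks (category M*) so decomposed accents survive,
    # plus the explicitly allowed punctuation.
    return (ch.isalnum()
            or unicodedata.category(ch).startswith("M")
            or ch in extra_allowed)

def sanitize_unicode_aware(filename: str, extra_allowed: str = "-_.") -> str:
    return "".join(ch if is_allowed(ch, extra_allowed) else "_" for ch in filename)

print(sanitize_unicode_aware("Größe 2024.txt"))  # Größe_2024.txt
```

Whether to also normalize names (e.g. to NFC with unicodedata.normalize) is a separate decision: normalizing changes the stored bytes, which is itself a form of renaming.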

mbjones, Nov 29 '23 17:11

Hi @robyngit I just wanted to point out that ESS-DIVE recently received a use case related to this issue ticket. Recording here for tracking purposes.

This user requested several times that we allow hyphens and periods in her file name. The file she wants to upload is named 00001-02715.00001-01803, but it gets renamed to 00001_02715.00001-01803. Her responses are quoted below, in the chronological order in which we received them:

It is important that the dataset actually be named: 00001-02715.00001-01803 with no underscores because the filename must be in that format for WRF to read.

The 00001_02715.00001-01803 file (although the first underscore in this filename should have been a dash, I’m not sure why that transferred differently).

Thanks also for looking into the naming of the binary file. I tried several times to rename it manually, but I was unsuccessful. The filename must have dashes and not underscores on both sides of the dot or it will not work.

As an FYI, I conferred with one of our team members who regularly conducts metadata and file reviews, and he said he most often sees "-" and "." characters used in file names. He didn't recall seeing other characters used. He also said this bug affects reporting-format datasets that use File Level Metadata (FLMD) files, because data contributors fill out the FLMD files with a file name that no longer matches the file name in our system.
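
The FLMD mismatch can be caught mechanically by comparing the names listed in the FLMD file with the names actually stored. A sketch, assuming the FLMD is a CSV whose file-name column is called File_Name (adjust if the reporting format uses a different header); stored_names is a hypothetical stand-in for however the stored names are retrieved, and the sample row is made up.

```python
import csv
import io
from typing import List, Set

def flmd_mismatches(flmd_csv: str, stored_names: Set[str],
                    column: str = "File_Name") -> List[str]:
    """Return names listed in the FLMD CSV that have no exact match in stored_names."""
    reader = csv.DictReader(io.StringIO(flmd_csv))
    return [row[column] for row in reader if row[column] not in stored_names]

flmd = "File_Name,File_Description\n00001-02715.00001-01803,WRF input\n"
print(flmd_mismatches(flmd, {"00001_02715.00001-01803"}))
# ['00001-02715.00001-01803']
```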

Adding @vchendrix and a reference to the Jira ticket for tracking purposes: EDSUPPORT-2670.

mburrus, May 31 '24 00:05

I agree. We should not be renaming files in any way. Any idea why we do this? It's not needed on the backend.
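
If files should never be renamed, the editor could validate instead: report any problematic characters and leave the name untouched. A sketch, assuming only the two characters POSIX forbids in a file name component ("/" and NUL) are rejected; the function name is hypothetical, and any UI messaging is left out.

```python
import re
from typing import List

# Assumption: reject only "/" (path separator) and NUL, the two characters
# POSIX forbids in a single file name component.
FORBIDDEN = re.compile(r"[/\x00]")

def find_forbidden(filename: str) -> List[str]:
    """Return the distinct forbidden characters in filename, sorted.

    An empty list means the name can be stored exactly as uploaded.
    """
    return sorted(set(FORBIDDEN.findall(filename)))

print(find_forbidden("00001-02715.00001-01803"))  # []
```

The design choice here is to reject rather than rewrite, so a name that is accepted is always stored exactly as the user typed it.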

mbjones, May 31 '24 13:05