
Files containing some upper-case Cyrillic or Greek characters are not synced properly

Open samschott opened this issue 3 years ago • 0 comments

Describe the bug Files with upper-case Cyrillic characters or an upper-case Σ in their name may not be synced correctly, which can result in erroneous conflicting copies. This is because Dropbox is internally case-insensitive and converts all paths to lower case. However, the conversion to lower case is sometimes performed "incorrectly" by Dropbox servers, following the behaviour of str.lower() from Python 2.5. Notably:

  1. Upper case Cyrillic characters may not be converted to lower case at all.
  2. An upper case Σ should be converted to a ς instead of σ when at the end of a word. However, Dropbox servers convert it to a σ.
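The final-sigma difference in item 2 can be seen directly in Python 3. The following is a minimal sketch: `lower_context_free` is a hypothetical helper that lower-cases each character in isolation, which reproduces the σ result described above. It is not the actual Dropbox conversion and does not reproduce the Cyrillic behaviour from item 1.

```python
def lower_context_free(name: str) -> str:
    """Lower-case each character on its own, ignoring its neighbours.

    Hypothetical helper for illustration only; not the Dropbox algorithm.
    """
    return "".join(ch.lower() for ch in name)


name = "ΟΔΟΣ"

# Python 3's str.lower() applies the Unicode final-sigma rule,
# so the trailing Σ becomes ς.
context_aware = name.lower()

# Lower-casing character by character has no word context,
# so the trailing Σ becomes σ.
context_free = lower_context_free(name)

print(context_aware, context_free, context_aware == context_free)
```

The two results differ only in the last character, which is enough for a path comparison to fail.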

To Reproduce Create a file with such an offending character in its file name and wait for it to upload. Then make changes to it locally. During upload sync, a conflicting copy will be created. This is because the file information is saved with its "incorrect" lower-case name in our index, as provided by the Dropbox metadata for the created file. When detecting further changes to the local file, the sync daemon will check our index for the last revision but search for the "correct" lower-case name. It won't find any previous versions and will upload the file flagged as new, resulting in Dropbox creating a conflicting copy.
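The lookup miss described above can be modelled with a toy index. This is only a sketch under stated assumptions: the dict-based `index`, `server_lower`, `record_upload` and `last_rev` are hypothetical names, not Maestral's real data structures, and `server_lower` merely stands in for the server-side conversion by lower-casing without word context.

```python
def server_lower(path: str) -> str:
    """Hypothetical stand-in for the server-side conversion (no final-sigma rule)."""
    return "".join(ch.lower() for ch in path)


# Toy index: lower-case path as reported by the server -> last known revision.
index = {}


def record_upload(path: str, rev: str) -> None:
    # The index key comes from the server metadata ("incorrect" lower case).
    index[server_lower(path)] = rev


def last_rev(path: str):
    # The local lookup uses Python 3's str.lower() ("correct" lower case).
    return index.get(path.lower())


record_upload("/Notes/ΟΔΟΣ", "rev-1")
record_upload("/Notes/Readme", "rev-2")

missing = last_rev("/Notes/ΟΔΟΣ")    # key mismatch: the file looks new
found = last_rev("/Notes/Readme")    # ASCII names are unaffected
print(missing, found)
```

Because the first lookup returns no revision, the file would be uploaded as new, which is what produces the conflicting copy.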

Expected behaviour We should replicate the lower-case function used on Dropbox servers locally by following the Python 2.5 unicode.lower() implementation. This is easier said than done:

  1. unicode.lower() is implemented in C and any Python implementation will perform much worse. This can be problematic for long file paths.
  2. unicode.lower() uses the C function towlower() internally, which behaves differently on some platforms.
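To get a rough feel for the performance concern in item 1, the built-in `str.lower()` can be timed against a pure-Python per-character loop. This is a sketch only: `lower_per_char` is a hypothetical stand-in for any pure-Python reimplementation, the path is made up, and the measured times depend on the machine and interpreter.

```python
import timeit


def lower_per_char(path: str) -> str:
    """Hypothetical pure-Python lower-casing, one character at a time."""
    return "".join(ch.lower() for ch in path)


# A made-up long path, used only for timing.
long_path = "/Some/Very/Long/Folder/Name" * 50

builtin_time = timeit.timeit(lambda: long_path.lower(), number=200)
python_time = timeit.timeit(lambda: lower_per_char(long_path), number=200)

print(f"built-in: {builtin_time:.4f}s, per-character: {python_time:.4f}s")
```

For ASCII-only paths both functions return the same string, so the comparison isolates the overhead of doing the work in Python rather than C.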

System:

  • Maestral version: 1.4.4 / 1.4.5
  • Python version: 3.9

samschott, Jun 20 '21 14:06