
Improve support for case-insensitive file systems / OSes

Status: open. mojca opened this issue 5 years ago (2 comments).

Describe the bug

Suppose that a Git repository contains file1.png and file2.PNG, where the latter exists only in earlier commits (it is no longer present in the latest commit on the master branch).

Say a developer on Linux runs the straightforward command documented in the tutorial and elsewhere to convert PNG files to LFS files:

git lfs migrate import --include="*.png"

The rewritten repository looks reasonable, with no evident problems. However, only file1.png is converted to an LFS file, while file2.PNG remains in the history as a regular file.
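To see why only file1.png is picked up, here is a minimal bash sketch that checks both names against `*.png` with case significant, which is what happened in the Linux migration described above. It assumes that bash's `[[ ... == pattern ]]` glob matching behaves like Git's pattern matching for a simple pattern such as `*.png`; `matches` is a hypothetical helper for illustration only.

```bash
# Requires bash (uses [[ ]]).
# Hypothetical helper: succeeds if PATH matches the glob PATTERN.
# Case is significant here, as in bash's default matching mode.
matches() {  # matches PATTERN PATH
  local pattern=$1 path=$2
  [[ $path == $pattern ]]
}

for f in file1.png file2.PNG; do
  if matches '*.png' "$f"; then
    echo "$f: converted to LFS"
  else
    echo "$f: left as a regular file"
  fi
done
```

Running it prints `file1.png: converted to LFS` followed by `file2.PNG: left as a regular file`.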

As soon as someone on Windows tries to check out an older commit, branch, or tag, run a bisection, etc., an unpleasant surprise awaits:

Encountered 1 file(s) that should have been pointers, but weren't:
    file2.PNG

and the only way out is to force a checkout of another commit.

According to my understanding of #2858, this is considered a broken repository on Windows.

Suggested fixes include creating a new commit that corrects the behaviour, but the latest commit is already fine. The problem appears when someone wants to work with older branches or tags, and fixing the latest commit doesn't help at all.

The "best" alternative is to rewrite the full history, but that comes with tons of headaches: everyone needs to do a clean checkout, links in tickets and pull requests are lost, and commit messages such as "revert commit abcde123456" no longer point to a valid SHA.

A proper solution would have been to use

git lfs migrate import --include="*.[pP][nN][gG]"

from the start, but that is hardly documented anywhere, and the potential problems are nearly impossible to spot until it's too late.
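Writing the bracket form by hand is error-prone. As a sketch, assuming bash 4 or later (for the `${c,,}` and `${c^^}` case conversions), a hypothetical `ci_pattern` helper can generate it from an ordinary glob. It is not part of Git or Git LFS.

```bash
# Requires bash >= 4 for ${c,,} and ${c^^}.
# Hypothetical helper: expand every ASCII letter of a glob into a bracket
# expression, e.g. "*.png" -> "*.[pP][nN][gG]". Assumes the input pattern
# contains no bracket expressions of its own.
ci_pattern() {  # ci_pattern GLOB
  local in=$1 out='' c i
  for (( i = 0; i < ${#in}; i++ )); do
    c=${in:i:1}
    if [[ $c == [A-Za-z] ]]; then
      out+="[${c,,}${c^^}]"   # letter: allow both cases
    else
      out+=$c                 # "*", ".", "/" etc. pass through unchanged
    fi
  done
  printf '%s\n' "$out"
}

pattern=$(ci_pattern '*.png')
echo "$pattern"
```

It could then be used as `git lfs migrate import --include="$(ci_pattern '*.png')"`. The helper only handles ASCII letters, which sidesteps the question of what a case-insensitive match means outside ASCII.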

I didn't test it, but I assume the opposite problem would occur when a user on Windows adds file3.PNG, which gets stored as an LFS file, and another user on Linux checks it out.

It would be great if:

  • there were a way to specify a case-insensitive pattern that is less clumsy than *.[pP][nN][gG]
  • git lfs migrate ... would at least warn the developer if the repository contains files that match the pattern case-insensitively but not case-sensitively, so that future users on Windows or macOS don't end up suffering from the issue
  • a similar warning were shown when users later try to add such files to the repository
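The warning in the second bullet can be approximated outside Git LFS. The following bash sketch is an illustration, not an existing Git LFS feature, and `case_only_matches` is a hypothetical name. It lists every path in any commit reachable from any ref with `git log --all --name-only --pretty=format:` and reports paths that match the pattern only when case is ignored. Bash glob matching stands in for Git's own pattern matching, which is an assumption that holds for simple patterns like `*.png`.

```bash
# Requires bash (uses [[ ]] and shopt).
case_only_matches() {  # case_only_matches PATTERN < list-of-paths
  local pattern=$1 path target ci cs
  while IFS= read -r path; do
    [[ -z $path ]] && continue            # git log emits blank separator lines
    # As in .gitattributes, a pattern without "/" is matched against the
    # file name only, so it applies at any directory depth.
    target=$path
    [[ $pattern == */* ]] || target=${path##*/}
    shopt -s nocasematch
    [[ $target == $pattern ]] && ci=1 || ci=0   # case-insensitive match
    shopt -u nocasematch
    [[ $target == $pattern ]] && cs=1 || cs=0   # case-sensitive match
    if (( ci && !cs )); then
      printf 'warning: %s matches %s only case-insensitively\n' "$path" "$pattern"
    fi
  done
}

# Every path that appears in any commit reachable from any ref, deduplicated.
if git rev-parse --git-dir >/dev/null 2>&1; then
  git log --all --name-only --pretty=format: | sort -u | case_only_matches '*.png'
fi
```

Git quotes unusual (for example, non-ASCII) path names in this output by default, so a real tool would also need to handle `core.quotePath` or NUL-separated output.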

To Reproduce

See above.

Expected behavior

Any one of the following options:

  • Git should never complain "Encountered 1 file(s) that should have been pointers, but weren't" on case-insensitive systems
  • all patterns could be treated as case-insensitive by default, possibly displaying a warning when something.PNG matches *.png
  • patterns are treated as case-sensitive, but a warning is displayed when migrating an existing repository or adding a new file that matches the pattern case-insensitively but not case-sensitively, so that users are at least aware of the potential consequences

mojca, Apr 27 '20 13:04

Hey,

The form of the pattern used here is specified by Git, not Git LFS. So on systems where core.ignorecase is set, they're case insensitive and on systems where it's not, they're case sensitive. As you've pointed out, this leads to unfortunate situations, but it's the way Git works.

Since Git LFS can't control where it gets invoked, the best we can do in this case is the last option. I think it should be relatively cheap to create a case-sensitive file path filter in the filter process and complain if the pattern doesn't match in a case-sensitive way.
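A per-file check along these lines could look roughly like the following bash sketch. It is an assumption-laden illustration rather than Git LFS code: it runs as a standalone function instead of inside the filter process, uses bash glob matching in place of Git's own matching, and parses `.gitattributes` naively. `check_lfs_case` is a hypothetical name.

```bash
# Requires bash (uses [[ ]] and shopt).
check_lfs_case() {  # check_lfs_case ATTRIBUTES_FILE PATH
  local attrs_file=$1 path=$2 pattern rest target ci_only=''
  while read -r pattern rest; do
    # Skip blank lines and comments; keep only lines that use the LFS filter.
    [[ -z $pattern || $pattern == '#'* ]] && continue
    [[ $rest == *filter=lfs* ]] || continue
    # A pattern without "/" is matched against the file name only.
    target=$path
    [[ $pattern == */* ]] || target=${path##*/}
    # An exact-case match also matches case-insensitively: nothing to report.
    [[ $target == $pattern ]] && return 0
    shopt -s nocasematch
    [[ $target == $pattern ]] && ci_only=$pattern
    shopt -u nocasematch
  done < "$attrs_file"
  if [[ -n $ci_only ]]; then
    printf 'warning: %s matches "%s" only case-insensitively\n' "$path" "$ci_only" >&2
    return 1
  fi
  return 0
}
```

For example, with `*.png filter=lfs diff=lfs merge=lfs -text` in `.gitattributes`, `check_lfs_case .gitattributes file2.PNG` prints a warning on stderr and returns 1, while `file1.png` passes silently.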

I don't think we want to always use a case-insensitive filter, since just the other day someone wanted to use case as the distinguishing characteristic between LFS and non-LFS files. In addition, what constitutes a case-insensitive match outside the ASCII range is a colossal nightmare: it differs by OS and, on Windows, by the version of the OS under which the file system was formatted. Trying to get that right for every file name would be problematic.

I'll leave this open to track some improved heuristics here.

bk2204, Apr 27 '20 14:04

The form of the pattern used here is specified by Git, not Git LFS. So on systems where core.ignorecase is set, they're case insensitive and on systems where it's not, they're case sensitive.

But it appears to be inconsistent on a case-insensitive OS such as Windows: sometimes the pattern is treated as case-sensitive, and sometimes as case-insensitive. So internally, Git/Git LFS is somehow contradicting itself. That's why we get the weird error "Encountered N file(s) that should have been pointers, but weren't".
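One simplified way to picture such a disagreement is to treat the `core.ignorecase` setting as a parameter of the attribute match and compare the result with the file's actual content. The bash sketch below is an illustrative model, not how Git or Git LFS implement the check: `expects_pointer`, `is_lfs_pointer` and `report` are hypothetical helpers, and a pointer is recognised only by its first line, `version https://git-lfs.github.com/spec/v1`, as defined in the Git LFS pointer specification.

```bash
# Requires bash (uses [[ ]] and shopt).
# Hypothetical helpers modelling the two decisions behind the error.

expects_pointer() {  # expects_pointer IGNORECASE PATTERN PATH
  local ignorecase=$1 pattern=$2 path=$3 rc=0
  # Simplification: match the whole path, which is enough for "*.png".
  [[ $ignorecase == true ]] && shopt -s nocasematch
  [[ $path == $pattern ]] || rc=1
  shopt -u nocasematch
  return "$rc"
}

is_lfs_pointer() {  # is_lfs_pointer FILE
  local first=''
  IFS= read -r first < "$1" || true   # first line of the file's content
  [[ $first == 'version https://git-lfs.github.com/spec/v1' ]]
}

report() {  # report IGNORECASE PATTERN PATH FILE
  if expects_pointer "$1" "$2" "$3" && ! is_lfs_pointer "$4"; then
    echo "$3 should have been a pointer, but wasn't"
  fi
}
```

With `ignorecase=false` (the view used by the Linux migration) the unconverted `file2.PNG` is never expected to be a pointer, while with `ignorecase=true` the same file is flagged, which resembles the symptom reported in this issue.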

cmcqueen, Apr 04 '25 07:04