framework icon indicating copy to clipboard operation
framework copied to clipboard

don't clobber an existing file when building the site

Open Fil opened this issue 9 months ago • 2 comments

closes #1367

Analysis of the issue:

A file is created with the name given in its reference, even if this doesn't match the actual file name (for example because we use a case-insensitive OS, or symlinks).

For example, if we have a reference in page.md to <img src=IMAGE.jpg> and a symlink from image.jpg to an actual file on disk called horse.jpg, we will create _file/IMAGE.xxxx.jpg, and reference this correctly in page.html; the build will not carry any information about the original "true" filename.

However, if several references to the same file use inconsistent names that end up being the same file on disk (e.g. _file/IMAGE.jpg and _file/image.jpg on a case-insensitive system), then we might have an issue when serving the site — if the server itself is case sensitive it will only be able to serve one of the references, and will 404 on the other(s).

Approach taken in this PR:

Thankfully, the build process is already careful to write each file once. We now enforce this: before writing a file to a destination in the output root, we check that it is not clobbering an existing file. If that is the case, then it means that the destination has not been deduplicated properly, and on a case insensitive OS this is most probably due to inconsistent casing (or maybe inconsistent encoding of unicode characters) across the site.

Since this can might problems when hosting, as described above, we now break the build. We report a "File name conflict over (found path)" with no assumption made about the nature of the error. Hopefully, if it's a user error, the cause will be visible by comparing the paths shown in the log.

logs

I have tested this with IMAGE.jpg and image.jpg, and also ÉTÉ.jpg vs été.jpg. Since it's enforced in the effects’ copyFile and writeFile methods, it's not limited to “files”—for instance, it also breaks if you are using inconsistent naming in the style front matter option: STYLE.css vs. style.css.

Notes:

  • we can only detect this problem when running build—preview doesn't know about all the files, and even if it did, I think it can't know about what the filesystem considers "the same path" without writing to it.
  • the “inverse problem” (building on a case sensitive system, then serving from a case insensitive system) is not considered here; but [in theory] if two files references correspond to the same resource (as guaranteed by the content-hash), it doesn't matter what copy we use. I haven't tested this.
  • the use of realpath to get the "true" filename is now limited to the output directory that we just created—this ensures that there are no issues with symlinks. (Also it's only used to report the error to the user, not to detect the situation.)

It's very difficult to create unit tests that will run on a case sensitive ubuntu; should I create a test to make sure that we fail on darwin (local testing) and win32 (CI testing)?

previous branches:

supersedes #1369 supersedes #1373

Fil avatar May 21 '24 10:05 Fil