Biohazrd icon indicating copy to clipboard operation
Biohazrd copied to clipboard

Handle file system case-sensitivity in TranslatedLibraryBuidler

Open PathogenDavid opened this issue 3 years ago • 2 comments

TranslatedLibraryBuilder treats file paths as case-insensitive regardless of the case-sensitivity of the file system. This was done to avoid problems arising between a difference in casing between what is provided to TranslatedLibraryBuilder, the file system, and the #include directives in the C++ source. (We regularly compare filenames for the sake of resolving which Clang cursors correspond to which input files -- or to determine if the cursor is out-of-scope.)

We do not expect well-formed C++ code to involve incorrect casing or to have multiple files with the same casing as both practices are non-portable. (The former is a warning -Wnonportable-include-path in Clang.) As such, this is not really a high priority as it's only really problematic with poorly-written C++ libraries on Linux or unusual macOS/Windows systems.

The ideal solution would be to normalize paths to their actual casing once https://github.com/dotnet/runtime/issues/14321 is realized.

I am hesitant to conditionally use case-sensitive comparisons based on the OSPlatform because case-sensitivity is an attribute of the file system, not the OS. (ext4 even allows case-sensitivity as an attribute of a directory.)

PathogenDavid avatar Aug 21 '20 03:08 PathogenDavid

This is also somewhat of an issue with MoveLooseDeclarationsIntoTypesTransformation because it uses the file paths to determine which type the loose declaration is associated with. (Although if the wrong casing was used, the generator is non-portable anyway.)

PathogenDavid avatar Oct 03 '20 07:10 PathogenDavid

Here's the scenario that the case-insensitive path comparisons in Biohazrd is trying to solve. Consider the following library with two header files:

FileA.h

#pragma once
#include "fileb.h" // <-- Notice the wrong casing here

void FunctionA();

FIleB.h

#pragma once

void FunctionB();

Now consider the user of Biohazrd requests that it builds FileA.h and FileB.h, the generated index file will look like:

#include "FileA.h"
#include "FileB.h

The cursor tree we get back from Clang will look roughly like the following:

  • TranslationUnitDecl corresponding to <>TranslatedLibraryIndexFile.cpp (the Biohazrd index file)
    • FunctionDecl for FunctionB corresponding to fileb.h
    • FunctionDecl for FunctionA corresponding to FileA.h

Note that FunctionB showed up first despite being included second in the index file. This happened because FileA.h included FileB.h as fileb.h. The pre-processed version of the file as seen by Clang looks roughly like this:

#line 3 "fileb.h"
void FunctionB();
#line 4 "FileA.h"
void FunctionA();

When we encounter the FunctionB declaration, does it correspond to FileB.h? The answer depends on whether or not the file system was case-sensitive. Clang actually provides a method for getting the real path (which we do use), but that doesn't help us in certain weird situations. (Also we prefer the path provided to us by the user, which isn't necessarily the best either -- see #67)

Problems with using case-insensitive when file system is actually case-sensitive

  • Having multiple case-different-only files is not supported. (Arguably a very rare case even when a library is Linux-only and assumes Linux == case-sensitive.)
  • Out-of-scope cursors might get considered in-scope* (When an in-scope and out-of-scope file are only case-different.)

(*We actually probably have a bug here: On debug builds I think this triggers an assert but in release builds the out-of-scope file becomes in-scope if it is encountered first and the in-scope file implicitly becomes out-of-scope.)

Problems with using case-sensitive when the file system is actually case-insensitive

  • Poorly formed libraries won't work right. (Such as a Windows or macOS-only library which has #include directives with the wrong casing.)

Problems with assuming Linux is always case-sensitive

Poorly-formed libraries can't be processed on Linux when they should be able to.

Poorly formed libraries on CI

It is conceivable that you might want to use Biohazrd on a Linux CI server to process one of the poorly-formed Windows-only libraries mentioned previously. To enable this scenario, you might use EXT4's ability to mark a directory as case-insensitive, but this assumption breaks that.

Building on WSL or Samba or NTFS

It is conceivable you might want to use Biohazrd in the context of WSL, Samba, or a NTFS drive. In all three of these scenarios the file system is case-insensitive and assuming it is would break things for poorly-formed libraries.

PathogenDavid avatar Oct 03 '20 08:10 PathogenDavid