The ContentType implementation of IContentType.isAssociatedWith is case insensitive and I find no case sensitive alternative
The ContentType implementation of IContentType.isAssociatedWith is case insensitive which means that if your content type contains .c and the presented file is a .C file ContentType.isAssociatedWith will return true.
There are 2 locations where ContentType.isAssociatedWith uses equalsIgnoreCase https://github.com/eclipse-platform/eclipse.platform/blob/f385650ab42fb7275ab164643fe2d1db5040750e/runtime/bundles/org.eclipse.core.contenttype/src/org/eclipse/core/internal/content/FileSpec.java#L53 and https://github.com/eclipse-platform/eclipse.platform/blob/f385650ab42fb7275ab164643fe2d1db5040750e/runtime/bundles/org.eclipse.core.contenttype/src/org/eclipse/core/internal/content/ContentType.java#L385
Unfortunately for me .c files are associated with c compilers and .C files are associated with C++ compilers which means CDT can not use ContentType.isAssociatedWith and chose to implemented a private method to provide the functionality. https://github.com/eclipse-cdt/cdt/blob/7c8bb9f00ef0d5e3ff493bc0c944519a3e476da2/core/org.eclipse.cdt.core/src/org/eclipse/cdt/internal/core/CContentTypes.java#L130
As I need a case sensitive implementation of IContentType.isAssociatedWith I have the option to make the private method of CDT public or have a solution in (I)ContentType. I'd prefer the second.
Note 1) I did not find any documentation that says the comparison is case insensitive. Note 2) I think file name comparison is case insensitive as well. Note 3) I can create a PR but I need some guidance on what the change should look like.
Why not use .cpp which is pretty standard and works on case insensitive file systems like on Windows?
Because it is not my call. I write a CDT extension not a cpp project.
Certainly you can't just make the comparison case-sensitive without breaking everyone relying on that not being the case-sensitive. Given that Window's file system is case-insensitive, relying on case seems like a poor design strategy in the first place. (You have to ask yourself, why has this not been needed/implementation for the past 18 years?)
At best I could image specifying an additional option for whether to be case-sensitive or not, with not being the default...
Rely on some characters at the end of a name to get the file type is already bad but distinguish them by their case is really bad I can't imagine why one would choose that by gcc explicitly list that case with special explanation:
https://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/Overall-Options.html
file.C C++ source code which must be preprocessed. Note that in .cxx, the last two letters must both be literally x. Likewise, .C refers to a literal capital C.
I think the only solution would be to add an option if a filetype should be handled case sensitive or not.
Also note that Windows is not unable to store the case, it just by default treat them the same file so you can't create two files that differ in case but of course you can create:
- plain.c
- cplussplus.C
in the same directory without a problem!
You have to ask yourself, why has this not been needed/implementation for the past 18 years?
Seems someone made a pull request 19 years ago https://github.com/eclipse-platform/eclipse.platform/commit/8cee0a021f8eb1a3834e802e6e4e9278933117e7
As I said CDT made a workaround probably more than 10 years ago. I guess they tried to have this changed to.
Given that Window's file system is case-insensitive, relying on case seems like a poor design strategy in the first place.
Windows file system is not case insensitive. If you save a file named AaBB.cC and you do dir you will get AaBB.cC. In the windows file system in a folder all files must be different case insensitive.
Solutions I see to this problem
- Document that IContentType.isAssociatedWith is case in sensitve
- add a method IContentType.isAssociatedWithTrict which is case sensitive
- Add a flag to IContentType itself to specify it as case sensitive
On Windows, if I do this in gitbash:
$touch a.c a.C && ls a*
it only shows:
a.c
That's because the file system is case insensitive:
Yes, you can have a.c or a.C (case preserving) but you cannot have both a.c and a.C in the file system because they are treated as having the same name.
I haven't looked closely, but can't such a default (to do 3.) be "hidden" in the org.eclipse.core.internal.content.FileSpec implementation?
@jonahgraham
Do you have an opinion about this?
Microsoft windows is case preserving but not case sensitive seems a correct description.
On Windows, if I do this in gitbash:
$touch a.c a.C && ls a*
it only shows:
a.c
exactly but if you do rm a.* $touch a.C a.c && ls a*
it will show:
a.C
TL;DR
AFAICT the bug here is that https://github.com/eclipse-platform/eclipse.platform/commit/8cee0a021f8eb1a3834e802e6e4e9278933117e7 should have added to content type extension point documentation that the patterns are case insensitive.
I would be happy for a pull request that added that documentation.
It would be reasonable to add a feature request to allow defining case-sensitive extensions to content type definitions, but I am not sure who (else) needs it.
The rest of this is a trip down memory lane, but fundamentally feels off-topic the the issue, but on-topic to where the discussion has gone, hence I made it collapsible so that you don't all have to see it.
Many years ago (40+) a decision was made that uppercase extensions in C universe is treated as non-C files that need pre-processing (.C needed pre processing from the new C with classes language to c, uppercase S means preprocessing before being passed as lowercase s to the assembler, similar for fortran (F vs f)). IMHO that was a silly decision, but one that has been around forever. In practice uppercase C is rarely used as using case sensitivity to distinguish between names is silly and the alternatives of using cpp, cc, cxx, etc makes more sense.
This discussion/issue doesn't have to do with Windows really. It has to do with what extensions mean. In many contexts extensions are case insensitive, e.g. .jpg vs. JPG should be treated the same. I don't know for sure if this is what Rafael was doing nearly 20 years ago with content types in https://github.com/eclipse-platform/eclipse.platform/commit/8cee0a021f8eb1a3834e802e6e4e9278933117e7 - but it is an educated guess.
Note the issue that @jantje has raised doesn't affect CDT as he points out - CDT has had proprietary handling of content types since early days of CDT (~17 years ago .C vs .c support was added)
CDT has a more complete and complex type system than is available in Eclipse Platform, allowing users to add arbitrary file patterns as different content types, and allows different projects to have the same file name patterns for different content types (or vice-versa). (FWIW this feature of CDT is very much used in the field - ask if you want more details)
The above mentioned feature request would not be useful for CDT itself (although perhaps it would be for @jantje's extension) as an additional particularity of CDT's handling is that .h (lower case only) can sometimes map to C++ content type and sometimes to C content type. See this code and related methods for the details.
This ycombinator discussion reminded me that .CPP and .cpp are both understood by GCC as C++ files.
@jonahgraham Thanks for the trip down memory lane.