linguist
Font files are classified as Scala or HolyC
Font source files included in repositories are being misidentified as Objective-J/HolyC/Scala.
URL of the affected repository:
https://github.com/ValveSoftware/Proton/tree/proton_5.0
https://github.com/adobe-fonts/source-han-mono
Expected language:
none
Detected language:
Scala/HolyC/SuperCollider/etc
As you've already found, this happens because the .sc extension is only associated with those three languages; it isn't associated with any font "language", nor is it explicitly ignored. As a result, detection falls through to the heuristics and then on to the classifier, but since Linguist doesn't know anything about the font "language" you're expecting, it'll never find it.
As Linguist relies upon community contributions to address such things, we'd welcome a PR that either adds the language or ignores the files.
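To make the fall-through concrete, here is a toy Python model of the behaviour described above. It is only a sketch and not Linguist's actual implementation (Linguist is written in Ruby); the table contents, `IGNORED_EXTENSIONS`, `classify`, and `detect` are all hypothetical names made up for illustration.

```python
# Toy model of extension-based detection with fall-through.
# NOT Linguist's real code; every table and function name here is hypothetical.

# Hypothetical, simplified extension table: which languages claim each extension.
EXTENSION_TABLE = {
    ".sc": ["Scala", "SuperCollider"],
    ".hc": ["HolyC"],
}

# Hypothetical stand-in for "explicitly ignored" extensions.
IGNORED_EXTENSIONS = set()


def classify(candidates, content):
    """Stand-in for the classifier: it can only choose among known candidates."""
    for name in candidates:
        if name.lower() in content.lower():
            return name
    # Nothing matched, so it still falls back to one of the known languages.
    return candidates[0]


def detect(filename, content):
    """Return a language name, or None if the file is ignored or unknown."""
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext in IGNORED_EXTENSIONS:
        return None
    candidates = EXTENSION_TABLE.get(ext, [])
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    return classify(candidates, content)


# A font file with a .SC suffix can only ever come back as a known language.
print(detect("SourceHanMono.SC", "%!PS-Adobe font data"))  # prints "Scala"
```

In this toy model, the two fixes mentioned above correspond to either adding a new entry for the font format to the table or adding the extension to the ignore set.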
Oh, this applies to the other extensions too.
This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.
A month is a long time? Yes of course it's still an issue. Come on, bot. Am I going to have to do this every month?
Yes, or you can submit a PR to add support 😉. After all, Linguist relies almost exclusively on community contributions.
This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.
https://github.com/github/linguist/issues/4870#issuecomment-650159950
This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.
Asdf
@aeikum Could you enlighten us on what .SC files are? I googled "SC font file", but it (naturally) brings up results for small-cap variants of actual typefaces.
I believe they are variants of the font files for different languages. So they're different file types, despite the same final extension. See here: https://github.com/adobe-fonts/source-han-mono/tree/master/Heavy/OTC
J - Japanese, K - Korean, SC - Simplified Chinese, TC - Traditional Chinese. Not sure about HC.
Hong Kong and Taiwan have different variations of traditional Chinese, so TC and HC probably stand for "Taiwan Chinese" and "Hong Kong Chinese", respectively.
Anyway. From what I see in source-han-mono/Heavy/OTC, the .HC files are a mix of PostScript (Type 1 fonts, essentially specialised PostScript programs), OpenType feature definitions, and some ad hoc-looking format for CID font metadata. Since most of these files are too large to be indexed or displayed on GitHub, it's gonna be difficult to accurately determine how many repositories use .HC and .TC as file extensions (see CONTRIBUTING.md if you're unsure why that's relevant).
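For anyone who wants to survey a local clone rather than rely on GitHub search, a rough Python sketch along these lines could bucket files by extension and leading bytes. The extension set, the `sniff` and `tally` names, and the 512-byte read size are assumptions chosen for illustration; the only content check is the conventional `%!` PostScript header, so OpenType feature files and other text formats all land in the same bucket.

```python
import os
from collections import Counter

# Hypothetical set of extensions to inspect; adjust as needed.
TARGET_EXTENSIONS = {".sc", ".hc", ".tc"}


def sniff(head: bytes) -> str:
    """Very rough guess at the content kind from the first bytes of a file."""
    if head.startswith(b"%!"):
        return "postscript"  # PostScript files conventionally begin with "%!"
    if b"\x00" in head:
        return "binary"
    return "other-text"


def tally(root: str) -> Counter:
    """Count (extension, guessed kind) pairs under a local directory tree."""
    counts = Counter()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            if ext not in TARGET_EXTENSIONS:
                continue
            with open(os.path.join(dirpath, name), "rb") as fh:
                counts[(ext, sniff(fh.read(512)))] += 1
    return counts
```

Calling `tally("path/to/clone")` returns a `Counter` keyed by pairs such as `(".sc", "postscript")`, which gives a first impression of how many files with a given extension are font sources rather than program code.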
@aBARICHELLO Aside from the two repositories you linked to, how many others have you encountered where this is obviously an issue?
Note that there was quite some discussion about font files: https://github.com/github/linguist/issues/2516
That issue's from early/mid 2015. Barely any of it is relevant anymore — several missing font formats were added by yours truly, and being the resident font-nerd, I'm always eager to add support for a font format. 😉
@aBARICHELLO Aside from the two repositories you linked to, how many others have you encountered where this is obviously an issue?
Advanced search returned hundreds of results for .SC files being identified as Scala. .HC is harder to find.
Well, if you or anybody else is interested, here's an unsorted stash of .sc files harvested from search results. They don't appear to have much in common, however... 😕
This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.
Yup