Font files are classified as Scala or HolyC

Open abarichello opened this issue 5 years ago • 18 comments

Font source files included in repositories are being misidentified as Objective-J/HolyC/Scala.

URL of the affected repository:

https://github.com/ValveSoftware/Proton/tree/proton_5.0
https://github.com/adobe-fonts/source-han-mono

Expected language:

none

Detected language:

Scala/HolyC/SuperCollider/etc

abarichello avatar May 26 '20 13:05 abarichello

As you've already found, this happens because the .sc extension is associated only with those three languages; it isn't mapped to any font "language", nor is it explicitly ignored. As a result, detection falls through to the heuristics and then on to the classifier, and since Linguist knows nothing about the font "language" you're expecting it to find, it will never find it.

As Linguist relies upon community contributions to address such things, we'd welcome a PR that either adds the language or ignores the files.
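To illustrate the fall-through described above, here is a minimal sketch of an extension → heuristic → classifier cascade. It is a toy model, not Linguist's actual code: the tables, the regexes, the token lists, the `detect` and `classify` names, and the file name `features.SC` are all assumptions made up for the example.

```python
import os
import re

# Hypothetical extension table; NOT Linguist's real data.
EXTENSION_TABLE = {
    ".sc": ["Scala", "SuperCollider"],
    ".hc": ["HolyC"],
}

# Hypothetical content heuristics used to narrow the candidates.
HEURISTICS = {
    "Scala": re.compile(r"^\s*(object|class|trait|import)\s+\w", re.M),
    "SuperCollider": re.compile(r"\bSynthDef\b|\.play\b"),
}

# Toy stand-in for a statistical classifier: count a few typical tokens.
TOKENS = {
    "Scala": ["def ", "val ", "=>"],
    "SuperCollider": ["SynthDef", "{ |", ".ar("],
}


def classify(content, candidates):
    """Always returns one of the candidates, even when nothing matches."""
    scores = {lang: sum(content.count(t) for t in TOKENS.get(lang, []))
              for lang in candidates}
    # max() keeps the first candidate on a tie, so an all-zero score
    # still yields a language rather than "none".
    return max(candidates, key=lambda lang: scores[lang])


def detect(filename, content, ignored=frozenset()):
    ext = os.path.splitext(filename)[1].lower()
    if ext in ignored:
        return None  # explicitly ignored: never classified
    candidates = EXTENSION_TABLE.get(ext, [])
    if len(candidates) <= 1:
        return candidates[0] if candidates else None
    matched = [lang for lang in candidates
               if lang in HEURISTICS and HEURISTICS[lang].search(content)]
    if len(matched) == 1:
        return matched[0]
    # Heuristics were inconclusive: fall through to the classifier.
    return classify(content, matched or candidates)


font_source = "%!PS-Adobe-3.0 Resource-CIDFont\n"
print(detect("features.SC", font_source))                   # a known language, never a font type
print(detect("features.SC", font_source, ignored={".sc"}))  # None once the extension is ignored
```

In this model the result is always drawn from the languages registered for the extension, which is why the only two ways out are the ones named above: register a new language for the extension, or ignore the files.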

lildude avatar May 26 '20 13:05 lildude

Oh, this applies to the other extensions too.

lildude avatar May 26 '20 13:05 lildude

This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.

stale[bot] avatar Jun 26 '20 04:06 stale[bot]

A month is a long time? Yes of course it's still an issue. Come on, bot. Am I going to have to do this every month?

aeikum avatar Jun 26 '20 12:06 aeikum

> A month is a long time? Yes of course it's still an issue. Come on, bot. Am I going to have to do this every month?

Yes, or you can submit a PR to add support 😉. After all, Linguist relies almost exclusively on community contributions.

lildude avatar Jun 26 '20 12:06 lildude

This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.

stale[bot] avatar Jul 26 '20 13:07 stale[bot]

https://github.com/github/linguist/issues/4870#issuecomment-650159950

aeikum avatar Jul 27 '20 13:07 aeikum

This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.

stale[bot] avatar Aug 29 '20 09:08 stale[bot]

Asdf

aeikum avatar Aug 31 '20 11:08 aeikum

@aeikum Could you enlighten us on what .SC files are? I googled "SC font file" but it (naturally) brings up results for small-cap variants of actual typefaces.

Alhadis avatar Aug 31 '20 11:08 Alhadis

I believe they are variants of the font files for different languages. So they're different file types, despite the same final extension. See here: https://github.com/adobe-fonts/source-han-mono/tree/master/Heavy/OTC

J - Japanese, K - Korean, SC - Simplified Chinese, TC - Traditional Chinese. Not sure about HC.

aeikum avatar Aug 31 '20 11:08 aeikum

> SC - Simplified Chinese, TC - Traditional Chinese. Not sure about HC

Hong Kong and Taiwan have different variations of traditional Chinese, so TC and HC probably stand for "Taiwan Chinese" and "Hong Kong Chinese", respectively.

Anyway. From what I see in source-han-mono/Heavy/OTC, the .HC files are a mix of PostScript (Type 1 fonts, essentially specialised PostScript programs), OpenType feature definitions, and some ad hoc-looking format for CID font metadata. Since most of these files are too large to be indexed or displayed on GitHub, it's gonna be difficult to accurately determine how many repositories use .HC and .TC as file extensions (see CONTRIBUTING.md if you're unsure why that's relevant).
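For anyone who wants to sort such files by what they actually contain, a rough content sniffer might look like the sketch below. It is only an illustration under stated assumptions: the function name `sniff_font_source` and the returned labels are made up, and the checks rely on the usual header conventions (`%!PS-AdobeFont` or `%!FontType1` for Type 1 fonts, a `%!PS-Adobe-... Resource-CIDFont` first line for CID-keyed fonts, `languagesystem` or `feature xxxx {` statements for OpenType feature files). It has not been checked against the files in that directory, and it does not attempt to recognise the ad hoc CID metadata format.

```python
import re

# OpenType feature files typically contain "languagesystem DFLT dflt;"
# or "feature liga { ... } liga;" statements.
FEATURE_RE = re.compile(
    r"^\s*(languagesystem\s+\w{1,4}\s+\w{1,4}\s*;|feature\s+\w{1,4}\s*\{)",
    re.M,
)


def sniff_font_source(text):
    """Guess the kind of font source from its content (labels are hypothetical)."""
    head = text.lstrip()[:256]
    # Check the Type 1 markers first: "%!PS-AdobeFont" also starts with "%!PS-Adobe".
    if head.startswith("%!PS-AdobeFont") or head.startswith("%!FontType1"):
        return "PostScript Type 1 font"
    if head.startswith("%!PS-Adobe") and "CIDFont" in head.splitlines()[0]:
        return "CID-keyed PostScript font"
    if head.startswith("%!"):
        return "PostScript"
    if FEATURE_RE.search(text):
        return "OpenType feature file"
    return "unknown"


print(sniff_font_source("%!PS-Adobe-3.0 Resource-CIDFont\n%%BeginResource\n"))
print(sniff_font_source("languagesystem DFLT dflt;\nfeature vert {\n} vert;\n"))
```

A check like this keys on content rather than on the extension, which matters here because the same suffix covers several unrelated formats.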

@aBARICHELLO Aside from the two repositories you linked to, how many others have you encountered where this is obviously an issue?

Alhadis avatar Aug 31 '20 12:08 Alhadis

Note that there was quite some discussion about font files: https://github.com/github/linguist/issues/2516

smola avatar Aug 31 '20 17:08 smola

That issue's from early/mid 2015. Barely any of it is relevant anymore — several missing font formats were added by yours truly, and being the resident font-nerd, I'm always eager to add support for a font format. 😉

Alhadis avatar Aug 31 '20 17:08 Alhadis

> @aBARICHELLO Aside from the two repositories you linked to, how many others have you encountered where this is obviously an issue?

Advanced search returned hundreds of results for .SC files being identified as Scala. .HC is harder to find.

abarichello avatar Aug 31 '20 21:08 abarichello

Well, if you or anybody else is interested, here's an unsorted stash of .sc files harvested from search results. They don't appear to have much in common, however... 😕

Alhadis avatar Sep 16 '20 22:09 Alhadis

This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.

stale[bot] avatar Dec 25 '20 13:12 stale[bot]

Yup

aeikum avatar Dec 28 '20 12:12 aeikum