Add Source Engine formats (.vcd, .fgd, .vmf, .vmt, .qc)

Open moofemp opened this issue 3 years ago • 14 comments

Add five Source Engine formats which are children of Valve Data Format (.vdf): Valve Choreography Data (.vcd), Valve Hammer Definition File (.fgd), Valve Map Format (.vmf), Valve Material Type (.vmt), Valve Model Source (.qc)

Description

Background on Valve/Source Engine data types: https://developer.valvesoftware.com/wiki/Notepad%2B%2B_VDF_languages, which links to grammars for Notepad++: https://github.com/ReverendV92/NotepadPP-VDF-Languages

Out of these formats:

  • VCD files are usually made with FacePoser and VMF files are usually made with Hammer, though both are plaintext and can be written with a text editor
  • FGD, VMT, and QC files are usually written with a text editor

I used the official Source Engine branding colour for Valve Map File and created my own colour scheme for the rest: https://moofemp.com/image/source-colour-scheme.png

This also includes the addition of the Verilog .vcd format to avoid conflict/misidentification with Valve Choreography .vcd.
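
To illustrate the .vcd conflict, here is a rough Python sketch of a content check that could tell the two apart. It is not Linguist's actual heuristic (Linguist's heuristics are not written this way), and the function name `guess_vcd_language` and both patterns are assumptions for illustration: Verilog value change dumps are assumed to declare sections with `$` keywords such as `$timescale` or `$var`, and FacePoser scenes are assumed to begin with a `// Choreo version` comment or contain `actor`/`channel`/`event` blocks.

```python
import re

# Assumed marker for Verilog value change dumps: section keywords that start with "$".
VERILOG_VCD = re.compile(
    r"^\s*\$(date|version|timescale|scope|var|enddefinitions)\b",
    re.MULTILINE,
)

# Assumed markers for Valve choreography scenes: the header comment written by
# FacePoser, or a line that opens an actor/channel/event block.
VALVE_VCD = re.compile(
    r"^\s*(//\s*Choreo version|(actor|channel|event)\s)",
    re.MULTILINE,
)


def guess_vcd_language(text: str) -> str:
    """Return a best-guess language name for the contents of a .vcd file."""
    if VERILOG_VCD.search(text):
        return "Verilog"
    if VALVE_VCD.search(text):
        return "Valve Choreography Data"
    return "unknown"
```

Checking the Verilog markers first is an arbitrary choice here; the two sets of markers are not expected to appear in the same file.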

Checklist:

  • [x] I am adding a new language.
    • [x] The extension of the new language is used in hundreds of repositories on GitHub.com.
      • Search results for each extension:
        • https://github.com/search?q=extension%3Avcd+path%3Ascenes%2F&type=Code&ref=advsearch&l=&l=
        • https://github.com/search?q=extension%3Afgd&type=Code
        • https://github.com/search?q=extension%3Avmf&type=Code
        • https://github.com/search?q=extension%3Avmt&type=Code
        • https://github.com/search?q=extension%3Aqc&type=Code
        • https://github.com/search?q=extension%3Avcd+verilog&type=Code
    • [x] I have included a real-world usage sample for all extensions added in this PR:
      • Sample source(s):
        • VMT samples from Source SDK 2013 content; FGD, VCD, VMF, and QC examples from maplabstemplate content
        • https://steamdb.info/app/243730/
        • https://github.com/ValveSoftware/source-sdk-2013
        • https://mega.nz/file/qstTSYCS#EHqMU9TUfCuXZJq8oBEEsTNoqb1o7qSQISOJaooZ1JQ
      • Sample license(s): SOURCE 1 SDK LICENSE https://github.com/ValveSoftware/source-sdk-2013/blob/master/LICENSE
      • Verilog VCD samples:
        • https://github.com/arvin2079/verilog-proj/blob/44c4d00d566a2f5e3fd26745f78b2c2e1d2f3b88/verilog%20implementations/alu_control/testbench/test.vcd
        • https://github.com/arvin2079/verilog-proj/blob/44c4d00d566a2f5e3fd26745f78b2c2e1d2f3b88/verilog%20implementations/control/testbench/test.vcd
        • License: GNU GPL v3 / MIT https://github.com/arvin2079/verilog-proj/blob/main/README.md
    • [ ] I have included a syntax highlighting grammar: https://github-lightshow.herokuapp.com/
    • [ ] I have updated the heuristics to distinguish my language from others using the same extension.

moofemp avatar Jun 08 '21 19:06 moofemp

A 114K line file is too much for a sample (mapbase_demo01)

Replaced with instance_tutorial_button from maplabstemplate content (325 lines) https://github.com/github/linguist/pull/5413/commits/18dafeb3573cbae40722afdd768691c6d972c4ee

moofemp avatar Jun 08 '21 19:06 moofemp

I feel like these should all just be added as new extensions to the VDF entry since they're not a different language. Linguist does sometimes differentiate between different uses of the same language, but I'm not sure that's the best option here as there are quite a few.

It seems to me that's what the group field is for in languages.yml? I defined all of the Source Engine languages with group: Valve Data Format thinking that's for cases like this, although I may have misinterpreted.

These formats aren't interchangeable though; a VMT and QC file for example are distinguishable at a glance due to the differing structure and keywords.

moofemp avatar Jun 09 '21 04:06 moofemp

It seems to me that's what the group field is for in languages.yml? I defined all of the Source Engine languages with group: Valve Data Format thinking that's for cases like that, although I could have misinterpreted. These formats aren't interchangeable though; a VMT and QC file for example are distinguishable at a glance due to the differing structure and keywords.

Group is for variants of a language, like JSX for JavaScript. So if there are indeed differences in the language format then it may be warranted but if it's just a different extension for VDF it should go under it.

Nixinova avatar Jun 09 '21 04:06 Nixinova

Group is for variants of a language, like JSX for JavaScript. So if there are indeed differences in the language format then it may be warranted but if it's just a different extension for VDF it should go under it.

I'm not quite sure what qualifies as "a different extension for VDF" - the syntax for each is the same, but the use case, structure, and library of keywords differ. It's worth mentioning that the Notepad++ VDF plugins I linked have separate languages for CFG, FGD, VMT, QC, and "misc VDF" (likely because VCD and VMF aren't usually edited in a plaintext editor). I would consider each one a distinct child of VDF.

moofemp avatar Jun 09 '21 04:06 moofemp

but the use case, structure, and library of keywords differ

I think if that's the case having child languages would fit.

Nixinova avatar Jun 09 '21 06:06 Nixinova

I've finally got the time to look at this properly and I'm conflicted about adding all of these as their own languages when the structure is very similar, bar a few, and they're all being added as data and then grouped. The result will be that none of the languages will appear in the sidebar unless an override is implemented, and then it will only show up as "Valve Data Format" which when clicked will only cause Search to return the .vdf files. You're also using the same grammar for all, so I assume @Nixinova's grammar supports all of these too.

I'll have to think about this a bit more before making a final decision.

In the meantime, a few observations:

  1. .vcd is quite a popular extension for Verilog so this extension and at least two samples will need to be added to Verilog
  2. The .fgd sample included is also too big. Please replace it with a smaller sample.
  3. You've added .lin and .prt extensions without samples or links to search results. These extensions appear to be quite commonly used for a variety of content with no common pattern that I can see from a quick look. I think it's best that we do not include these extensions as it will introduce a lot of incorrect classification, unless you can identify the other languages and want to add them as part of this PR.
  4. You've added the .vmx extension without samples or a link to the search results. This extension is also very commonly used for VMware configuration files, which are popular enough for inclusion in Linguist, so we'd need to add support for them now too.
  5. You've added the .qci extension without a sample or links to search results. This extension doesn't appear popular enough for inclusion right now either.

lildude avatar Jul 12 '21 09:07 lildude

I've finally got the time to look at this properly and I'm conflicted about adding all of these as their own languages when the structure is very similar, bar a few, and they're all being added as data and then grouped. The result will be that none of the languages will appear in the sidebar unless an override is implemented, and then it will only show up as "Valve Data Format" which when clicked will only cause Search to return the .vdf files. You're also using the same grammar for all, so I assume @Nixinova's grammar supports all of these too.

I'll have to think about this a bit more before making a final decision.

In the meantime, a few observations: [...]

I've fixed observations 2-5 https://github.com/github/linguist/pull/5413/commits/23627f2038b8693e48dc89a5ce47be5730a99b4a https://github.com/github/linguist/pull/5413/commits/7d8f65dfebe641693084e27710bd5c023652174a https://github.com/github/linguist/pull/5413/commits/f46989356609984de8d36d56ef782c35f9a8c255, though I'm not sure how I should approach resolving the rest of this comment.

(I don't really know why I included .lin/.prt/.vmx in the first place as that's compile stuff and is best ignored.)

I will admit that I don't entirely know what I'm doing with contributing here and the mentioned Linguist behaviour does not seem desirable. I chose "data" because it seemed most appropriate (these are more or less various scripts for a game engine SDK) and I grouped them because the syntax is the same and they are for the same engine. What would be the correct settings in this case?

The reason I opened this PR is because it was bothering me that my Source Engine mod was being identified as primarily Batch, with some ReScript (probably from .res, which I believe is a misidentified Resource file) and Squirrel. It would look better if the language was identified as, well, a Source Engine mod. Would it then be better to identify some of these formats as "programming" instead of "data" and possibly ungroup them?

moofemp avatar Jul 12 '21 12:07 moofemp

though I'm not sure how I should approach resolving the rest of this comment.

🤔 I thought no. 1 was clear. To put it into other words, you need to

  1. find two samples of Verilog .vcd files
  2. add them to the Verilog samples directory
  3. add .vcd to the list of Verilog extensions in languages.yml

All of this needs to happen in this PR.

As for the data vs programming type decision: it really depends on how the files are generated and used. If they're almost always used to tell something else what to do or to populate something else, it's commonly considered data and Linguist should reflect this as peeps often don't consider this "programming code". XML and JSON are such examples.

If they're written and used/compiled as you would a programming language like Ruby or C, then these files should be considered as programming.

I have no idea which is appropriate in this case as I have no idea about the language. Maybe @Nixinova has some input as the grammar author.

lildude avatar Jul 19 '21 10:07 lildude

Can you try to find smaller Verilog files which aren't quite so full of ones and zeros 😄

lildude avatar Jul 19 '21 13:07 lildude

🤔 I thought no. 1 was clear. To put it into other words, you need to [...]

~~My bad, I've done this now: https://github.com/github/linguist/pull/5413/commits/7c30b61d9edaac1175959cb9fe08611fac5cfd54~~

Can you try to find smaller Verilog files which aren't quite so full of ones and zeros 😄

Is this better? https://github.com/github/linguist/pull/5413/commits/83e15132494890a654838fc6ab1eba4bf0fa9294

As for the data vs programming type decision: it really depends on how the files are generated and used. If they're almost always used to tell something else what to do or to populate something else, it's commonly considered data and Linguist should reflect this as peeps often don't consider this "programming code". XML and JSON are such examples.

If they're written and used/compiled as you would a programming language like Ruby or C, then these files should be considered as programming.

It's hard to describe in terms other than "scripts for a game engine"... hopefully this helps?

  • VCD: Program-generated choreography scene that is executed by the game at runtime
  • FGD: Manually written list of game entities, which is used by the SDK tools
  • VMF: (usually) Program-generated game level, which is compiled before being run by the game (essentially the core format of Source Engine singleplayer mod development)
  • VMT: Manually written material/texture information that is read by the game engine at runtime
  • QC: Manually written model information which is compiled by the SDK tools, then read by the game engine at runtime

I believe this means FGD and VMT are definitely data, although I'm not sure about the others.
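
As a rough sketch only, the rule of thumb quoted above can be applied mechanically to the five formats. The `Format` class, the two boolean fields, and the "hand-written and compiled means programming" reading are all assumptions made for illustration; the property values are transcribed from the list above, and this is one possible reading rather than a decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Format:
    name: str
    hand_written: bool  # usually written by hand rather than generated by a tool
    compiled: bool      # compiled before the game reads it


# Values taken from the descriptions in the list above.
FORMATS = [
    Format("VCD", hand_written=False, compiled=False),
    Format("FGD", hand_written=True, compiled=False),
    Format("VMF", hand_written=False, compiled=True),
    Format("VMT", hand_written=True, compiled=False),
    Format("QC", hand_written=True, compiled=True),
]


def suggested_type(fmt: Format) -> str:
    """Assumed reading of the rule: 'programming' only if hand-written and compiled."""
    return "programming" if fmt.hand_written and fmt.compiled else "data"


for fmt in FORMATS:
    print(f"{fmt.name}: {suggested_type(fmt)}")
```

Under this reading only QC would come out as programming, which is consistent with FGD and VMT being data but still leaves the others open to discussion.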

moofemp avatar Jul 19 '21 14:07 moofemp

Extension .qc has a lot of these files - they seem similar as they're related to Quake but are they the same language?

Nixinova avatar Jul 19 '21 21:07 Nixinova

Extension .qc has a lot of these files - they seem similar as they're related to Quake but are they the same language?

That one I honestly don't know - I distinctly remember hearing somewhere that Source QC is derived from Quake C (hence the alias I put in the language definition), but now the closest mention I can find to it is at https://developer.valvesoftware.com/wiki/Talk:QC.

Edit: Oop, I also just found this, where a user dismisses the Quake C relationship as "fake news": https://developer.valvesoftware.com/w/index.php?title=QC&type=revision&diff=214733&oldid=205871

I see there's already a "Quake" language; would this PR need a definition for Quake C to distinguish it from Source QC? I see similarities in the definitions at the top of the file, though I've never seen Source QC with functions like that.

moofemp avatar Jul 19 '21 21:07 moofemp

I see there's already a "Quake" language; would this PR need a definition for Quake C to distinguish it from Source QC?

If QuakeC is a separate language and it meets the min file requirements then yes.

Nixinova avatar Feb 21 '22 21:02 Nixinova