linguist
Add common systemd extensions to INI
Add a “systemd unit” language for common file extensions related to systemd and podman. This reuses the existing .ini syntax highlighting rules and reclassifies the most common file extension, .service, which is currently classified as “desktop”.
Description
systemd unit files are used by systemd and podman for configuration. Their syntax is essentially the same as the old DOS/Windows .ini syntax. The most common related file extension, .service, is currently classified as “desktop”. This PR reclassifies .service as a new “systemd unit file” language and adds the most common systemd and podman file extensions.
This fixes #7161 .
@vorburger apologies — I noticed that you had opened PR #7163 after I had created this one.
Checklist:
- [x] I am adding a new extension to a language.
- [x] The new extension is used in hundreds of repositories on GitHub.com
- [x] I have included a real-world usage sample for all extensions added in this PR:
- Sample source(s):
- 2 samples per new extension included. Each file has a link to the original at the top.
- Sample license(s):
- GPL (from the systemd project) and Apache 2.0 (from the podman project)
- [ ] I have included a change to the heuristics to distinguish my language from others using the same extension.
- not needed
This appears to overlap with https://github.com/github-linguist/linguist/pull/7163 and, like that PR, I think this should really be rolled into INI. As you're moving the .service extension, @Alhadis's comment might be worth considering: take this opportunity to roll Desktop into INI too, and add all of these to INI rather than creating a new language. We normally only create a new language for nearly identical formats if they need a different grammar. This isn't the case here… you're straight-out duplicating INI with a different name.
Sounds like a plan. Updated to:
- Merge desktop into INI
- Add the below extensions to INI:
The addition of .conf reconciles this PR and #7163, so this PR now covers all extensions that hit the usage minimums between the two.
.conf is faaar too generic and isn't necessarily INI or related to systemd. If you want to add the extension, I recommend adding it to generic.yml and adding a heuristic, but this might be very difficult as the heuristic needs to be bulletproof so it doesn't catch other languages.
Take 3! Added .conf to generic.yml. Heuristic boils down to: a .conf INI file must have both:
- A line of the form [section] (either commented out or not)
- A line of the form key = value (either commented out or not)
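As a rough illustration of the "must have both" rule, here is a minimal Python sketch. The two regexes and the name `looks_like_ini` are hypothetical stand-ins written for this example; they are not the patterns committed in the PR, and `re.MULTILINE` is assumed so that `^` and `$` anchor at line boundaries.

```python
import re

# Hypothetical patterns for illustration only; the PR's real regexes are not
# quoted in this thread. Both allow an optional leading comment character.
SECTION = re.compile(r"^[#;]?[ \t]*\[[^\]\n]+\][ \t]*$", re.MULTILINE)
KEY_VALUE = re.compile(r"^[#;]?[ \t]*[^#;=\[\s][^=\n]*=", re.MULTILINE)


def looks_like_ini(text: str) -> bool:
    """Return True only if the text has a [section] line AND a key = value line."""
    return bool(SECTION.search(text)) and bool(KEY_VALUE.search(text))


print(looks_like_ini("[Unit]\nDescription=Foo\n"))           # True
print(looks_like_ini("server {\n    listen 80;\n}\n"))       # False
print(looks_like_ini("# [section]\n# key = value\n"))        # True
```

A file with only one of the two line shapes is rejected, which is the narrowing effect described in this comment.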
Requiring both narrows down the list a bit vs the last commit — looks like it eliminates more false negatives than false positives, but that’s probably where we’d want to lean. Latest counts:
Regarding the heuristics for the .conf extension, with the current configuration, there are potentially file types that are similar to INI that could be mixed-in with INI (false positives). For instance, in these search results (44k+ files), there are files that have at least a section and a key-value pair, but they include other lines with non-INI syntax as well.
Now, some of these files might benefit from being highlighted as an INI file since a lot of their content matches the usual INI syntax, but if we want to avoid mixing formats here, we could include the following rule after the 2 patterns currently used to match INI format.
- negative_pattern: '^[^#;=\[]+[^ \n=]$'
This regex simply tries to match any non-valid INI syntax, i.e. a row that is neither a comment, an empty row, a section nor a key-value pair. Adding this would mean that any file that contains an invalid line would not be considered INI.
Notes on the regex
First term:
/[^#;=\[]+/
This simply matches any character except the comment characters ("#" and ";") as well as "=" and "[", which are needed for key-value pairs and sections.
Second term:
/[^ \n=]/
This is just to make sure that we don't match: a) a row with only whitespace; b) two empty rows (without \n in there, we would match \n\n); c) a key-value pair without the value specified.
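To make the behaviour of the proposed negative pattern concrete, here is a small Python sketch. The regex is copied from the suggestion above; the helper name `has_non_ini_line`, the sample strings, and the use of `re.MULTILINE` (so `^` and `$` anchor per line) are assumptions of this example, not part of the PR.

```python
import re

# Regex taken verbatim from the suggested negative_pattern.
# re.MULTILINE is an assumption made here to get per-line anchoring.
NEGATIVE = re.compile(r"^[^#;=\[]+[^ \n=]$", re.MULTILINE)


def has_non_ini_line(text: str) -> bool:
    """Return True if some line is not a comment, section, or key-value pair."""
    return NEGATIVE.search(text) is not None


# Hypothetical sample inputs.
unit = "[Unit]\nDescription=Example\n\n; comment\n[Service]\nExecStart=/bin/true\n"
mixed = "[global]\nkey = value\ninclude other.conf\n"

print(has_non_ini_line(unit))   # False: every line is valid INI
print(has_non_ini_line(mixed))  # True: "include other.conf" is flagged
```

In this sketch, a key with an empty value (`key=`), a spaces-only row, and consecutive empty rows are not flagged, matching cases a) to c) in the notes.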
Agreed on both the negative pattern and using the existing key_equals_value pattern. Made both changes. Now at ~414k .conf candidates for anyone scoring at home.
It's looking good for the .conf extension! For the others, I'd say a heuristic is probably also warranted, especially .target.
The heuristics could be the exact same as the ones for .conf or the ones used in the search queries for each extension.
Sounds good! Since all of the files (including the existing ones) are “just” INI files, I included all of the ones listed in languages.yml in the heuristic. If we have file format name collisions with those extensions, we should also ignore those.
Woah 😳, when I said "the others", I only meant the ones that are being added in this PR 😅. To keep the scope of this PR more manageable, I'd suggest only including heuristics for those new extensions. Also note that some of the existing ones like .url and .pro already have their own heuristic and we don't want to interfere with that.
I got a little excited there! 😉 Switched to only the new file extensions in the heuristic.
Looking at the Desktop Entry Specification, there could be an argument made that it should remain a separate entry from the generic INI language. I'd suggest keeping it as a separate language, but adding it to the INI group instead (which might have been what @Alhadis meant by his comment).
 desktop:
+  group: INI
   type: data
   extensions:
   - ".desktop"
   - ".desktop.in"
-  - ".service"
   tm_scope: source.desktop
   ace_mode: text
   language_id: 412
An example of where the Desktop Entry format is different from the generic INI format is with Localized values for keys (see an example file). These aren't valid in the generic INI format since [ and ] can't appear in the key.
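A short Python sketch of that difference follows. Both regexes are hypothetical simplifications written for this example: `GENERIC_KEY` stands in for a generic INI rule that forbids brackets in keys, and `LOCALIZED_KEY` approximates the `Key[locale]=value` form of Desktop Entry files. Neither is taken from Linguist or from the INI grammar.

```python
import re

# Hypothetical "generic INI" key rule: brackets are not allowed in the key.
GENERIC_KEY = re.compile(r"^[^\[\]=#;\s][^\[\]=]*=")
# Approximation of a Desktop Entry localized key: Key[locale]=value.
LOCALIZED_KEY = re.compile(r"^[A-Za-z0-9-]+\[[^\]]+\]\s*=")

plain = "Name=Text Editor"
localized = "Name[fr]=Éditeur de texte"

print(bool(GENERIC_KEY.match(plain)))        # True
print(bool(GENERIC_KEY.match(localized)))    # False: "[" appears in the key
print(bool(LOCALIZED_KEY.match(localized)))  # True
```

Under these assumptions, a localized key fails the generic rule while remaining a well-formed Desktop Entry line, which is the case for keeping the two languages separate but grouped.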
Oh yes. Nice find @DecimalTurn. I think your suggestion makes sense with that in mind.
Generic extensions are supposed to be used when the filetype in question has an unambiguous pattern to differentiate it from clearly-unrelated files with the same extension. INI files, already being a poorly-standardised file format with innumerable variations in syntax, can't be identified based on patterns that aren't likely to occur in other ad-hoc configuration formats.
This means that adding .conf as an INI extension—generic or not—is not only going to lead to inconsistent language (mis)classification, it'll also result in ugly-looking highlighting for INI-ish formats that don't conform to the syntax expected of the INI highlighting grammar. In other words, the same problem alluded to by @DecimalTurn:
An example of where the Desktop Entry format is different from the generic INI format is with Localized values for keys (Example file). These aren't valid in the generic INI format since [ and ] can't appear in the key.
I know it's tempting to have .conf files highlighted as INI, but the margin for error here isn't worth the prettier-looking files (which authors can always identify as INI using a .gitattributes file).