linguist
Add 6502 assembly language
Description
Checklist:
- [x] I am adding a new language.
- [x] The extension of the new language is used in hundreds of repositories on GitHub.com.
- Search results for each extension:
- https://github.com/search?utf8=%E2%9C%93&type=Code&ref=searchresults&q=extension%3AS+LDY+NOT+nothack
- https://github.com/search?utf8=%E2%9C%93&type=Code&ref=searchresults&q=extension%3AASM+LDY+NOT+nothack
- https://github.com/search?utf8=%E2%9C%93&type=Code&ref=searchresults&q=extension%3AINC+LDY+NOT+nothack
- [x] I have included a real-world usage sample for all extensions added in this PR:
- Sample source(s):
- https://github.com/MonstersGoBoom/Kickassembler-Modules/blob/main/APUltra.asm
- https://github.com/commanderx16/x16-rom/blob/master/dos/dir.s
- https://github.com/VincentFoulon80/vixx/blob/master/routines/game.asm
- https://github.com/commanderx16/x16-rom/blob/master/dos/fat32/lib.inc
- https://github.com/gridbugs/mos6502/blob/master/test-assets/instr_test-v5/source/common/macros.inc
- https://github.com/VincentFoulon80/vixx/blob/master/main.asm
- https://github.com/MJoergen/x16-assembly-tutorial/blob/master/Episode_7/math.s
- https://github.com/MJoergen/x16-assembly-tutorial/blob/master/Episode_7/tennis.inc
- https://github.com/commanderx16/x16-rom/blob/master/dos/vera.inc
- Sample license(s):
- BSD (2-clause): dir.s, lib.inc, vera.inc
- MIT: APUltra.asm, game.asm, macros.inc, main.asm, tennis.inc, video.s
- Public Domain: math.s
- [x] I have included a syntax highlighting grammar: https://github-lightshow.herokuapp.com/?utf8=%E2%9C%93&scope=from-url&grammar_format=auto&grammar_url=https%3A%2F%2Fgithub.com%2FElektron72%2F6502-generic-grammar%2Fblob%2Fmain%2Fgrammars%2F6502.cson&grammar_text=&code_source=from-url&code_url=https%3A%2F%2Fgithub.com%2Fcommanderx16%2Fx16-rom%2Fblob%2Fmaster%2Fdos%2Fdir.s&code=
- [x] I have updated the heuristics to distinguish my language from others using the same extension.
I moved macros.inc because it is a 6502 assembly include file, and I was worried that leaving it in the general "Assembly" directory would confuse the classifier. Please let me know if I should move it back to its original location.
> I moved macros.inc because it is a 6502 assembly include file
Are you 100% sure about this? I ask because that file has been part of the repo for over 7 years without being flagged as problematic. I know nothing about Assembly, so it's possible we've been lucky, but I want to be sure as this will influence the classifier, which interestingly hasn't detected this file incorrectly based on your changes.
Talking of which, your changes are tripping up the classifier, suggesting there's room for improvement:
[...]
29 6502 Assembly/tennis.inc BAD (HTML)
30 6502 Assembly/vera.inc BAD (PHP)
[...]
645 Unix Assembly/hello.s BAD (6502 Assembly)
Each of these failures means the classifier detected the file as the language in brackets instead of the language dictated by the folder holding the file.
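For anyone reading along, here is a rough sketch of how those failure lines break down. It is a hypothetical Python helper, not part of Linguist; the line layout is inferred only from the lines quoted in this thread, and the name `parse_failure` is made up for illustration.

```python
import re

# Hypothetical pattern: "<index> <expected language>/<file> BAD (<detected>)".
# Inferred from the failure lines quoted in this thread, not from Linguist.
FAILURE = re.compile(r"^(\d+)\s+(.+?)/([^/]+)\s+BAD\s+\((.+)\)$")

def parse_failure(line):
    """Return (index, expected, filename, detected), or None if no match."""
    match = FAILURE.match(line.strip())
    if match is None:
        return None
    index, expected, filename, detected = match.groups()
    return int(index), expected, filename, detected

for line in (
    "29 6502 Assembly/tennis.inc BAD (HTML)",
    "645 Unix Assembly/hello.s BAD (6502 Assembly)",
):
    index, expected, filename, detected = parse_failure(line)
    print(f"{filename}: expected {expected!r}, classifier said {detected!r}")
```

So the first two failures are 6502 samples being mistaken for something else, while the last one is the reverse: an existing sample being pulled into the new language.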
> 29 6502 Assembly/tennis.inc BAD (HTML)
> 30 6502 Assembly/vera.inc BAD (PHP)
Both of these appear to be misclassified because the files are quite simple: they look like variable assignments in other languages, similar to things like href=foo in HTML, so they may not be very good representative samples of "6502 Assembly".
> 645 Unix Assembly/hello.s BAD (6502 Assembly)
I'm not sure about this one as I don't know Assembly, but either this file really is "6502 Assembly" or your samples may not be "6502 Assembly"-specific enough, kinda like how .h files need specific syntax to differentiate whether they're specific for C, C++ or C#.
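To make the "specific enough" point concrete, here is a minimal sketch of the kind of check a heuristic might perform. It is plain Python, not Linguist's heuristics format, and everything in it (the function name, the threshold, the sample strings) is an assumption for illustration. It counts lines that start with a 6502 load/store mnemonic; other CPUs reuse some of these mnemonics, so a match is only a hint.

```python
import re

# 6502 load/store mnemonics, optionally preceded by a "label:".
# Assumption: a couple of such lines is enough to call a file 6502-like.
# Other instruction sets share some of these names, so this is a hint only.
MNEMONIC = re.compile(
    r"^[ \t]*(?:\w+:)?[ \t]*(?:lda|ldx|ldy|sta|stx|sty)\b",
    re.IGNORECASE | re.MULTILINE,
)

def looks_like_6502(source, threshold=2):
    """True if at least `threshold` lines begin with a 6502 load/store."""
    return len(MNEMONIC.findall(source)) >= threshold

# Made-up samples, not the files from this PR.
SAMPLE_6502 = "start:\n    ldy #$00\nloop: lda ($20),y\n    sta $0400,y\n"
SAMPLE_X86 = "movl $4, %eax\nmovl $1, %ebx\nint $0x80\n"
SAMPLE_EQUATES = "SCREEN_WIDTH = 40\nSCREEN_HEIGHT = 30\n"

print(looks_like_6502(SAMPLE_6502))
print(looks_like_6502(SAMPLE_X86))
print(looks_like_6502(SAMPLE_EQUATES))
```

The last sample is the interesting one: a file made up only of assignments contains nothing 6502-specific for a pattern like this to latch onto, which is consistent with such include files being weak samples.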
When you've got a mo @Alhadis, I'd appreciate your 👀 on the regexes here. 🙇
Yeah, it definitely shouldn't be classifying hello.s as 6502 Assembly. I need to improve my heuristics, and remove those non-representative examples that probably can't even be classified. Sorry for submitting this PR without noticing these issues.
After re-examining this pull request, I've come to the conclusion that, when I originally submitted this, I severely underestimated the complexity of the changes necessary to add 6502 Assembly to Linguist, and that classifying these files to the level of accuracy necessary for this project is beyond my skill level. I apologize for being unable to complete these changes.
@Elektron72 Don't apologise. Assembly languages are perhaps the most challenging and problematic of candidates that Linguist has ever had to deal with. Aside from the generic, often-conflicting file-extensions, writing a reliable heuristic isn't always possible (or a good idea), and even with craploads of samples, classification accuracy can still be flaky.
So, don't fret. 😉 We appreciate the fact that you offered to contribute in the first place.