Add 6502 assembly language

Open Elektron72 opened this issue 3 years ago • 5 comments

Description

Checklist:

  • [x] I am adding a new language.
    • [x] The extension of the new language is used in hundreds of repositories on GitHub.com.
      • Search results for each extension:
        • https://github.com/search?utf8=%E2%9C%93&type=Code&ref=searchresults&q=extension%3AS+LDY+NOT+nothack
        • https://github.com/search?utf8=%E2%9C%93&type=Code&ref=searchresults&q=extension%3AASM+LDY+NOT+nothack
        • https://github.com/search?utf8=%E2%9C%93&type=Code&ref=searchresults&q=extension%3AINC+LDY+NOT+nothack
    • [x] I have included a real-world usage sample for all extensions added in this PR:
      • Sample source(s):
        • https://github.com/MonstersGoBoom/Kickassembler-Modules/blob/main/APUltra.asm
        • https://github.com/commanderx16/x16-rom/blob/master/dos/dir.s
        • https://github.com/VincentFoulon80/vixx/blob/master/routines/game.asm
        • https://github.com/commanderx16/x16-rom/blob/master/dos/fat32/lib.inc
        • https://github.com/gridbugs/mos6502/blob/master/test-assets/instr_test-v5/source/common/macros.inc
        • https://github.com/VincentFoulon80/vixx/blob/master/main.asm
        • https://github.com/MJoergen/x16-assembly-tutorial/blob/master/Episode_7/math.s
        • https://github.com/MJoergen/x16-assembly-tutorial/blob/master/Episode_7/tennis.inc
        • https://github.com/commanderx16/x16-rom/blob/master/dos/vera.inc
      • Sample license(s):
        • BSD (2-clause): dir.s, lib.inc, vera.inc
        • MIT: APUltra.asm, game.asm, macros.inc, main.asm, tennis.inc, video.s
        • Public Domain: math.s
    • [x] I have included a syntax highlighting grammar: https://github-lightshow.herokuapp.com/?utf8=%E2%9C%93&scope=from-url&grammar_format=auto&grammar_url=https%3A%2F%2Fgithub.com%2FElektron72%2F6502-generic-grammar%2Fblob%2Fmain%2Fgrammars%2F6502.cson&grammar_text=&code_source=from-url&code_url=https%3A%2F%2Fgithub.com%2Fcommanderx16%2Fx16-rom%2Fblob%2Fmaster%2Fdos%2Fdir.s&code=
    • [x] I have updated the heuristics to distinguish my language from others using the same extension.

Elektron72, Dec 26 '21 01:12

I moved macros.inc because it is a 6502 assembly include file, and I was worried that leaving it in the general "Assembly" directory would confuse the classifier. Please let me know if I should move it back to its original location.

Elektron72, Feb 10 '22 19:02

I moved macros.inc because it is a 6502 assembly include file

Are you 100% sure about this? I ask because that file has been part of the repo for over 7 years without being flagged as problematic. I know nothing about Assembly, so it's possible we've just been lucky, but I want to be sure, as this will influence the classifier, which, interestingly, hasn't misclassified this file after your changes.

Talking of which, your changes are tripping up the classifier, which suggests there's room for improvement:

[...]
29 6502 Assembly/tennis.inc BAD (HTML)
30 6502 Assembly/vera.inc BAD (PHP)
[...]
645 Unix Assembly/hello.s BAD (6502 Assembly)

Each of these failures means the classifier detected the file as the language in brackets instead of the language dictated by the folder holding the file.
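To make that reading concrete, here is a minimal sketch that splits one of the failure lines quoted above into the expected language (the folder) and the detected language (the brackets). The line shape is assumed from the excerpt only, and `parse_failure` is a hypothetical helper, not part of Linguist's test suite.

```python
import re

# Assumed line shape, taken from the excerpt in this thread:
#   <index> <expected language>/<file name> BAD (<detected language>)
FAILURE = re.compile(r"^(\d+)\s+(.+)/(\S+)\s+BAD\s+\((.+)\)$")


def parse_failure(line):
    """Return the parts of a classifier failure line, or None if it does not match."""
    match = FAILURE.match(line.strip())
    if match is None:
        return None
    index, expected, filename, detected = match.groups()
    return {
        "index": int(index),
        "expected": expected,   # language dictated by the sample folder
        "file": filename,
        "detected": detected,   # language the classifier chose instead
    }


if __name__ == "__main__":
    print(parse_failure("645 Unix Assembly/hello.s BAD (6502 Assembly)"))
```

Read this way, the last line says that a Unix Assembly sample was pulled towards 6502 Assembly, while the first two say that 6502 Assembly samples were pulled towards HTML and PHP.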

lildude, Feb 11 '22 09:02

29 6502 Assembly/tennis.inc BAD (HTML)
30 6502 Assembly/vera.inc BAD (PHP)

Both of these appear to fail because the files are quite simple and look like variable assignments in other languages, similar to things like href=foo in HTML, so they may not be very good representative samples of "6502 Assembly".

645 Unix Assembly/hello.s BAD (6502 Assembly)

I'm not sure about this one as I don't know Assembly, but either this file really is "6502 Assembly" or your samples may not be "6502 Assembly"-specific enough, kinda like how .h files need specific syntax to differentiate whether they're specific for C, C++ or C#.
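As a rough illustration of what "6502 Assembly"-specific syntax could mean, the sketch below counts lines that pair a 6502 load/store mnemonic with a `$`-prefixed hex operand. The pattern, the `looks_like_6502` name, and the `min_hits` threshold are all assumptions made for this example; this is not the heuristic submitted in the PR, nor one taken from Linguist.

```python
import re

# Hypothetical pattern (an assumption for illustration only): an optional
# label, a 6502 load/store mnemonic, then an immediate (#$..) or absolute
# ($....) hexadecimal operand.
MOS6502_LINE = re.compile(
    r"^\s*(?:\w+:)?\s*(?:LD[AXY]|ST[AXY])\s+#?\$[0-9A-Fa-f]{2,4}\b",
    re.IGNORECASE | re.MULTILINE,
)


def looks_like_6502(source, min_hits=2):
    """Return True when at least min_hits lines match the 6502-style pattern."""
    return len(MOS6502_LINE.findall(source)) >= min_hits


if __name__ == "__main__":
    mos = "start:  LDA #$01\n        STA $9F20\n        LDY #$00\n"
    unix = "movl $4, %eax\nmovl $1, %ebx\nint $0x80\n"
    constants_only = "VERA_ADDR = $9F20\nVERA_DATA = $9F23\n"
    print(looks_like_6502(mos), looks_like_6502(unix), looks_like_6502(constants_only))
```

Note that an include file made up only of constant assignments matches nothing here, which is consistent with the point above that such files are poor representative samples: they contain little that is specific to 6502 Assembly.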

lildude, Feb 11 '22 10:02

When you've got a mo @Alhadis, I'd appreciate your 👀 on the regexes here. 🙇

lildude, Feb 11 '22 10:02

Yeah, it definitely shouldn't be classifying hello.s as 6502 Assembly. I need to improve my heuristics and remove those non-representative samples, which probably can't be classified reliably at all. Sorry for submitting this PR without noticing these issues.

Elektron72, Feb 11 '22 14:02

After re-examining this pull request, I've concluded that when I originally submitted it, I severely underestimated the complexity of the changes necessary to add 6502 Assembly to Linguist, and that classifying these files to the level of accuracy this project requires is beyond my skill level. I apologize for being unable to complete these changes.

Elektron72, Dec 19 '22 19:12

@Elektron72 Don't apologise. Assembly languages are perhaps the most challenging and problematic candidates that Linguist has ever had to deal with. Aside from the generic, often-conflicting file extensions, writing a reliable heuristic isn't always possible (or a good idea), and even with craploads of samples, classification accuracy can still be flaky.

So, don't fret. 😉 We appreciate the fact that you offered to contribute in the first place.

Alhadis, Dec 19 '22 19:12