Add language: Mojo

Open lattner opened this issue 1 year ago • 17 comments

Add a new programming language to Linguist, named Mojo, for file extensions .mojo and .🔥. You can learn more about Mojo at https://docs.modular.com/mojo/.

Description

Mojo is a new programming language developed by me (Chris Lattner) and my amazing team at Modular.

Because it's a brand new language, it does not meet linguist's requirements for broad use on GitHub. The team and I thought that we'd send this anyway, to see if an exception could be made, based on the success and widespread use on GitHub of some of my other programming languages (LLVM IR, MLIR, Swift). If that's not possible, no worries, and thanks for your time!

Checklist:

  • [x] I am adding a new language.
    • [ ] ~The extension of the new language is used in hundreds of repositories on GitHub.com.~ As explained above, this is a new language without extensive usage yet.
    • [x] I have included a real-world usage sample for all extensions added in this PR:
      • Sample source(s):
        • Matmul.mojo is adapted from https://docs.modular.com/mojo/notebooks/Matmul.html
        • Bool.🔥 is adapted from https://docs.modular.com/mojo/notebooks/BoolMLIR.html
      • Sample license(s):
    • [x] I have included a syntax highlighting grammar: https://github.com/modularml/mojo-syntax
    • [x] I have added a color
      • Hex value: #ff4c1f
      • Rationale: This color invokes the 🔥 emoji that is used for the language's branding, and as a supported file extension.
    • [ ] ~I have updated the heuristics to distinguish my language from others using the same extension.~ No other language in this repository uses the .mojo or .🔥 file extensions, so I did not add a heuristic.

lattner avatar May 02 '23 20:05 lattner

The team and I thought that we'd send this anyway, to see if an exception could be made, based on the success and widespread use on GitHub of some of my other programming languages (LLVM IR, MLIR, Swift).

Unfortunately, we can't as it sets a precedent we don't want to fight each and every time someone develops "the next greatest thing, promise on my hamster's life". Given the popularity you've mentioned, and the hype around AI, it shouldn't take long to reach the levels we require though.

lildude avatar May 03 '23 07:05 lildude

Makes sense, thank you for the consideration!

lattner avatar May 03 '23 17:05 lattner

An emoji file extension. I've seen everything now…

Can't stop the innovation! :)

lattner avatar May 07 '23 01:05 lattner

Hey all! First off thanks for your reviews since we first submitted this pull request. Originally Chris wrote:

Because it's a brand new language, it does not meet linguist's requirements for broad use on GitHub.

But I believe we may now meet those requirements. The contribution guidelines here say to look for 200 unique :user/:repo repositories, but that due to GitHub search being in flux, temporarily the requirement documented here is "at least 2000 files per extension indexed in the last year (the number you see at the top of the search results)." In a search for files named *.mojo here, I see 1.1k results -- and there are a few .🔥 results as well.

Also, not sure if it's worth mentioning, but as of this patch, Vim now recognizes the .mojo and .🔥 extensions.

So, would it be possible to take another look at this and its "Pending Popularity" tag? Thanks in advance! 🙏 @lildude @Alhadis

modocache avatar Sep 13 '23 00:09 modocache

No change. It still doesn't look popular enough, especially when you consider that mojo-lang owns most of them, so excluding them drops things quite considerably. And most definitely not the emoji extension. You might want to consider removing that extension from this PR as it will only delay things. Users can always implement an override until such time as it's popular enough for addition in the future.

lildude avatar Sep 13 '23 00:09 lildude

And further refinement shows a lot of those remaining files are XML, which whilst not popular enough for inclusion right now, would need to be explicitly added to XML in this PR too to ensure they remain correctly identified as XML (the extension takes precedence so this PR as it stands would cause all those files to be identified as Mojo and incorrect syntax highlighting will apply).
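To illustrate the kind of content check that would be needed once two languages share the .mojo extension, here is a minimal sketch. The patterns and the function name `classify_mojo_file` are hypothetical and for illustration only; they are not Linguist's actual heuristics, which are defined in its own configuration.

```python
import re

# Hypothetical patterns, chosen only for illustration.
# XML: the first non-whitespace text is an XML declaration, a DOCTYPE, or a tag.
XML_PATTERN = re.compile(r"^\s*<(\?xml|!DOCTYPE|[A-Za-z])")
# Mojo: some line starts with a common declaration or import keyword.
MOJO_PATTERN = re.compile(
    r"^\s*(fn|def|struct|trait|alias|from\s+\S+\s+import|import)\b",
    re.MULTILINE,
)


def classify_mojo_file(content: str) -> str:
    """Guess whether the content of a .mojo file is XML or Mojo source."""
    if XML_PATTERN.search(content):
        return "XML"
    if MOJO_PATTERN.search(content):
        return "Mojo"
    return "unknown"


print(classify_mojo_file('<?xml version="1.0"?>\n<root/>'))
print(classify_mojo_file("fn main():\n    print('hello')\n"))
```

Checking for XML first reflects the concern above: files that are already XML should stay XML even after the extension is claimed by another language.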

Current results if we exclude mojo-lang and XML.

lildude avatar Sep 13 '23 00:09 lildude

Gotcha, thanks for the quick & thorough response! I'll use those searches to keep tabs on things, and in the meantime update this PR to handle XML with a .mojo extension.

And most definitely not the emoji extension. You might want to consider removing that extension from this PR as it will only delay things.

I see, so I should consider the popularity of the .mojo extension as separate from the popularity of the .🔥 extension. That makes sense, thanks!

modocache avatar Sep 13 '23 00:09 modocache

Thanks for your help! I pushed an update here with heuristics for Mojo vs. XML, and removed .🔥.

I'll keep an eye on the search link you shared and ping this thread when we cross the popularity threshold. Right now I see 1.1k public Mojo files that are not XML 😃

modocache avatar Oct 24 '23 02:10 modocache

This Google Sheet is a great way to view Mojo popularity: https://docs.google.com/spreadsheets/d/1mmS5xwRrtBIZubdEIrUpsGzZxX2mcqcc74tBjDfo6Lo/edit?usp=sharing (updated daily) A quick look shows:

  • there are currently about 310 unique (non-fork) public Mojo repos.
  • the current rate of new Mojo file creation is about 3k a year: (1588 files / ~6.5 months since launch) × (12 months/year) ≈ 2932 files/year
  • sum of forks for all unique repos is over 2.3k
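
The annualized rate in the list above can be reproduced with a short calculation. This is only a sketch of the arithmetic; the function name `annualized_rate` is made up for illustration, and the inputs are the figures quoted in the list.

```python
def annualized_rate(file_count: int, months_elapsed: float) -> float:
    """Scale a count observed over `months_elapsed` months to a 12-month rate."""
    if months_elapsed <= 0:
        raise ValueError("months_elapsed must be positive")
    return file_count / months_elapsed * 12


# 1588 files over roughly 6.5 months since launch (figures from the list above).
rate = annualized_rate(1588, 6.5)
print(round(rate))  # 2932
```

Note that this is a linear extrapolation, so it assumes the rate of new files stays constant over the year.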

djkelleher avatar Nov 21 '23 19:11 djkelleher

Update: currently 4479 Mojo🔥 files across ~355 confirmed Mojo🔥 repos 🎉🎉

djkelleher avatar Dec 03 '23 14:12 djkelleher

@lildude I think we're good to go

tairov avatar Dec 03 '23 15:12 tairov

Hah, thank you to all of our community members pinging this -- it does indeed look like we're above the 2k source file threshold! Reviewers, please take a look at your earliest convenience, thanks!

modocache avatar Dec 03 '23 19:12 modocache

Hah, thank you to all of our community members pinging this -- it does indeed look like we're above the 2k source file threshold! Reviewers, please take a look at your earliest convenience, thanks!

Not quite. Three users are having an undue influence on the figures with one repo accounting for over 1000 of the results. Excluding them and forks brings things down quite a bit.

lildude avatar Dec 05 '23 08:12 lildude

Aha, gotcha -- thanks! Sorry for all the pings, the idea of getting highlighted on GitHub is very exciting 🤩

I'll try to keep an eye on the search, and comment here once it looks like we've reached 2k, excluding any users with an outsized number of files. Thanks for all your help!

modocache avatar Dec 05 '23 14:12 modocache

I'll try to keep an eye on the search, and comment here once it looks like we've reached 2k, excluding any users with an outsized number of files.

Please don't. PRs are only merged when I'm getting close to making a new release and I will check all usage of pending PRs at that time.

lildude avatar Dec 05 '23 14:12 lildude

Ah OK, thanks for the clarification, sorry if I missed that -- no pinging it is. Thanks!

modocache avatar Dec 05 '23 14:12 modocache

I'll try to make this my last annoying comment 🙂

I think the count shown in the sheet is more accurate than that query. There are more non-fork Mojo repos that show up when you leave off the fork:false parameter, and also more that show up using different queries and API endpoints, as well as a few that seem to be not indexed at all.

The way to get the most accurate count seemed to be to try a bunch of queries (via HTTP API), use the fork boolean in the JSON responses to filter out forks, then check the file content/extension.

Everything in the sheet has been gone through by multiple people and the file count as of today is 4585 with one outlier of 2579. Excluding outlier: 4585-2579=2006 files
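
A rough sketch of that counting approach, for anyone who wants to reproduce it: merge the results of several queries, drop forks using the `fork` boolean, de-duplicate by repository name, then subtract the outlier. This assumes each result is a list of repository objects carrying `fork` and `full_name` fields, as in GitHub REST API responses. The repository names and the per-repo breakdown below are hypothetical; only the totals (4585, 2579, 2006) and the 2000-file threshold come from this thread.

```python
from typing import Iterable

THRESHOLD = 2000  # files per extension, as quoted earlier in this thread


def non_fork_repos(responses: Iterable[list[dict]]) -> dict[str, dict]:
    """Merge several query results, dropping forks and duplicate repos."""
    unique: dict[str, dict] = {}
    for repos in responses:
        for repo in repos:
            if repo.get("fork"):
                continue  # skip forks
            unique[repo["full_name"]] = repo  # same repo from two queries counts once
    return unique


def total_excluding(counts: dict[str, int], outliers: set[str]) -> int:
    """Sum file counts, leaving out repos treated as outliers."""
    return sum(n for name, n in counts.items() if name not in outliers)


# Hypothetical responses from two different queries.
query_a = [
    {"full_name": "user-a/big-repo", "fork": False},
    {"full_name": "user-b/small-repo", "fork": False},
    {"full_name": "user-x/forked-repo", "fork": True},
]
query_b = [
    {"full_name": "user-b/small-repo", "fork": False},
    {"full_name": "user-c/other-repo", "fork": False},
]
repos = non_fork_repos([query_a, query_b])

# Hypothetical per-repo Mojo file counts that add up to the sheet's total of 4585.
file_counts = {
    "user-a/big-repo": 2579,
    "user-b/small-repo": 1200,
    "user-c/other-repo": 806,
}
counts = {name: file_counts[name] for name in repos}
total = total_excluding(counts, outliers={"user-a/big-repo"})
print(total, total >= THRESHOLD)  # 2006 True
```

Checking the file content/extension of each candidate file is left out here, since that step depends on how the files are fetched.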

djkelleher avatar Dec 05 '23 15:12 djkelleher

@lildude - Just wanted to check if we need to do anything else here? Want to make sure you have everything you need from us. Thanks so much again

iamtimdavis avatar Mar 12 '24 15:03 iamtimdavis

Almost... I can't update the PR to merge in master as maintainers haven't been granted write perms on this PR and can't approve CI to run for some reason.

Usage looks good though.

lildude avatar Mar 12 '24 16:03 lildude

Almost... I can't update the PR to merge in master as maintainers haven't been granted write perms on this PR and can't approve CI to run for some reason.

Amazing! Sorry, small parse error on my end for this sentence -- do I need to, or can I, update write permissions on this PR? Or are you saying that for some reason write permissions don't seem to have been granted by some automated process, and it's not something I can fix?

modocache avatar Mar 12 '24 20:03 modocache

Almost... I can't update the PR to merge in master as maintainers haven't been granted write perms on this PR and can't approve CI to run for some reason.

Amazing! Sorry, small parse error on my end for this sentence -- do I need to, or can I, update write permissions on this PR? Or are you saying that for some reason write permissions don't seem to have been granted by some automated process, and it's not something I can fix?

It's something you do when creating the PR and can be updated on current PRs. See the docs here.

lildude avatar Mar 13 '24 08:03 lildude

NM. You can't because your fork is owned by an org. It's only an option for personally owned forks.

lildude avatar Mar 13 '24 08:03 lildude

Thank you @lildude !

lattner avatar Mar 13 '24 12:03 lattner