Refactor `Linguist::Repository` to isolate Rugged usage
Description
The goal of this change is to add flexibility to how repository data is accessed by `Linguist::Repository` and `Linguist::LazyBlob`, allowing users to easily configure an alternative to Rugged.
Internally, `Linguist::Repository` and `Linguist::LazyBlob` use Rugged to read Git repository data, including diff, attribute, and blob information. While this works for most repositories, it has limits:
- Rugged/libgit2 can lag behind feature support in Git (e.g. reftable, previously SHA-256).
- Rugged is a Git API, which makes using Linguist with other SCMs challenging.
The approach taken here is to replace the `Rugged::Repository` instance in the `Linguist::Repository` with a new `Linguist::Source::Repository` instance. The "source" repository contains functions wrapping what were previously Rugged operations (diff, attribute lookup, etc.). Users can then write their custom implementations of those functions and pass their `Linguist::Source::Repository` into `Linguist::Repository` to use them seamlessly.
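As a rough illustration of that approach, here is a minimal, self-contained sketch. Everything in it is hypothetical: the `SourceSketch` namespace, the method names (`load_blob`, `attributes_for`, `changed_paths`), and the in-memory backend are stand-ins chosen for the example and are not the actual interface added by this PR.

```ruby
# Hypothetical stand-in namespace; these are NOT the real Linguist classes.
module SourceSketch
  # Abstract interface: each operation raises until a subclass overrides it.
  class Repository
    def load_blob(_blob_id)
      raise NotImplementedError, "#{self.class}#load_blob is not implemented"
    end

    def attributes_for(_path, _names)
      raise NotImplementedError, "#{self.class}#attributes_for is not implemented"
    end

    def changed_paths(_old_commit, _new_commit)
      raise NotImplementedError, "#{self.class}#changed_paths is not implemented"
    end
  end

  # Toy implementation backed by plain hashes instead of a real SCM.
  class InMemoryRepository < Repository
    def initialize(blobs:, attributes: {}, snapshots: {})
      @blobs = blobs           # blob id => content
      @attributes = attributes # path => { attribute name => value }
      @snapshots = snapshots   # commit id => { path => blob id }
    end

    def load_blob(blob_id)
      @blobs.fetch(blob_id)
    end

    # Returns a hash with one entry per requested name (nil when unset).
    def attributes_for(path, names)
      found = @attributes.fetch(path, {})
      names.to_h { |name| [name, found[name]] }
    end

    # Paths added, removed, or modified between two snapshots, sorted.
    def changed_paths(old_commit, new_commit)
      old_tree = @snapshots.fetch(old_commit, {})
      new_tree = @snapshots.fetch(new_commit, {})
      (old_tree.keys | new_tree.keys).reject { |path| old_tree[path] == new_tree[path] }.sort
    end
  end
end
```

A caller only ever talks to the base-class methods, so swapping Rugged for another backend would, under these assumptions, mean writing one subclass and passing an instance of it in.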
This isn't intended to be a breaking change, so there are a few extra things done to avoid compatibility issues with existing usage:
- If a `Rugged::Repository` is passed in as the first argument to either the `Linguist::Repository` or `LazyBlob` initializer, it is wrapped in a `Linguist::Source::RuggedRepository` internally.
- `GIT_ATTR_OPTS` & `GIT_ATTR_FLAGS` are Rugged-specific so they're moved to `RuggedRepository`, but the `LazyBlob` constants are not removed and instead point to their `RuggedRepository` counterparts.
- `current_tree` and `read_index` don't make sense for non-Rugged repos (the former returns a Rugged tree instance, the latter is specific to how Rugged needs to look up attributes). They raise `NotImplementedError` with a message referencing deprecation only if called on a non-Rugged repository instance; otherwise they behave the same way as before.
- A `method_missing` implementation is added to `Linguist::Source::RuggedRepository` to delegate any unmatched method calls to the internal `Rugged::Repository` instance (in case users are calling `Linguist::Repository.repository` directly).
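The compatibility measures above can be sketched roughly as follows. This is a self-contained illustration, not the code in this PR: the `CompatSketch` namespace, `FakeRuggedRepository` (a stand-in so the example runs without the rugged gem), the constant's value, the return value of `read_index`, and the "wrap anything that is not already a source" rule are all assumptions made for the example.

```ruby
# Hypothetical stand-in namespace; these are NOT the real Linguist classes.
module CompatSketch
  # Stand-in for Rugged::Repository so the sketch has no gem dependency.
  class FakeRuggedRepository
    def path
      "/tmp/example/.git/"
    end
  end

  module Source
    class Repository; end

    class RuggedRepository < Repository
      # Illustrative value only; the constant now lives on the Rugged wrapper.
      GIT_ATTR_OPTS = { priority: [:index], skip_system: true }.freeze

      def initialize(rugged)
        @rugged = rugged
      end

      # Forward anything the wrapper does not define to the wrapped object.
      def method_missing(name, *args, &block)
        return @rugged.public_send(name, *args, &block) if @rugged.respond_to?(name)
        super
      end

      # Keep respond_to? consistent with the delegation above.
      def respond_to_missing?(name, include_private = false)
        @rugged.respond_to?(name) || super
      end
    end
  end

  class LazyBlob
    # The old constant name stays and points at the relocated one.
    GIT_ATTR_OPTS = Source::RuggedRepository::GIT_ATTR_OPTS
  end

  class Repository
    attr_reader :repository

    def initialize(repo)
      # Assumption: anything that is not already a source gets wrapped.
      @repository = repo.is_a?(Source::Repository) ? repo : Source::RuggedRepository.new(repo)
    end

    # Only meaningful for the Rugged-backed source.
    def read_index
      unless @repository.is_a?(Source::RuggedRepository)
        raise NotImplementedError, "read_index is deprecated and only supported for Rugged-backed repositories"
      end
      :index_loaded # placeholder for the Rugged-specific behavior
    end
  end
end
```

Note that `NotImplementedError` descends from `ScriptError`, not `StandardError`, so a bare `rescue` in calling code will not catch it; callers would need to rescue it by name.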
The only possible compatibility issue I can imagine is if a user does some kind of type check on `Linguist::Repository.repository` (previously it was a `Rugged::Repository`, now it'll be a `Linguist::Source::RuggedRepository`). That seems highly unlikely, though, and should be a simple fix if needed.
The commits on this branch are organized to be atomic and incrementally reviewable:
- Commit 1 adds the generic `Linguist::Source::Repository` and `Linguist::Source::Diff` interfaces, with all methods raising `NotImplementedError` to ensure they are overridden by a subclass implementation.
- Commit 2 adds a Rugged implementation of `Linguist::Source::Repository` matching existing usage in `compute_stats` and `Linguist::LazyBlob`.
- Commit 3 updates `Linguist::Repository` to use a `Linguist::Source::Repository` instead of a `Rugged::Repository` to read repository content.
- Commit 4 adds the `method_missing` implementation to `RuggedRepository`.
Checklist:
- [ ] I am adding a new extension to a language.
  - [ ] The new extension is used in hundreds of repositories on GitHub.com
    - Search results for each extension:
      - https://github.com/search?type=code&q=NOT+is%3Afork+path%3A*.FOOBAR+KEYWORDS
  - [ ] I have included a real-world usage sample for all extensions added in this PR:
    - Sample source(s):
      - [URL to each sample source, if applicable]
    - Sample license(s):
  - [ ] I have included a change to the heuristics to distinguish my language from others using the same extension.
- [ ] I am adding a new language.
  - [ ] The extension of the new language is used in hundreds of repositories on GitHub.com.
    - Search results for each extension:
      - https://github.com/search?type=code&q=NOT+is%3Afork+path%3A*.FOOBAR+KEYWORDS
  - [ ] I have included a real-world usage sample for all extensions added in this PR:
    - Sample source(s):
      - [URL to each sample source, if applicable]
    - Sample license(s):
  - [ ] I have included a syntax highlighting grammar: [URL to grammar repo]
  - [ ] I have added a color
    - Hex value: `#RRGGBB`
    - Rationale:
  - [ ] I have updated the heuristics to distinguish my language from others using the same extension.
- [ ] I am fixing a misclassified language
  - [ ] I have included a new sample for the misclassified language:
    - Sample source(s):
      - [URL to each sample source, if applicable]
    - Sample license(s):
  - [ ] I have included a change to the heuristics to distinguish my language from others using the same extension.
- [ ] I am changing the source of a syntax highlighting grammar
  - Old: [URL to grammar repo]
  - New: [URL to grammar repo]
- [ ] I am updating a grammar submodule
- [x] I am adding new or changing current functionality
  - [x] I have added or updated the tests for the new or changed functionality.
- [ ] I am changing the color associated with a language
  - [ ] I have obtained agreement from the wider language community on this color change.
    - [URL to public discussion]
    - [Optional: URL to official branding guidelines for the language]