
Refactor `Linguist::Repository` to isolate Rugged usage

Open vdye opened this issue 1 year ago • 0 comments

Description

The goal of this change is to add flexibility to how repository data is accessed by Linguist::Repository & Linguist::LazyBlob, allowing users to easily configure an alternative to Rugged.

Internally, Linguist::Repository and Linguist::LazyBlob use Rugged to read Git repository data, including diff, attribute, and blob information. While this works for most repositories, it has limits:

  • Rugged/libgit2 can lag behind feature support in Git (e.g. reftable, previously SHA-256).
  • Rugged is a Git API, which makes using Linguist with other SCMs challenging.

The approach taken here is to replace the Rugged::Repository instance in the Linguist::Repository with a new Linguist::Source::Repository instance. The "source" repository contains functions wrapping what were previously Rugged operations (diff, attribute lookup, etc.). Users can then write their custom implementations of those functions and pass their Linguist::Source::Repository into Linguist::Repository to use them seamlessly.
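The pattern described above might be sketched roughly as follows. This is an illustration under stated assumptions, not the code on this branch: the `SourceSketch` module, the `load_blob` and `attributes_for` methods, `InMemoryRepository`, and `Consumer` are all hypothetical names chosen so the example runs without Linguist or Rugged installed.

```ruby
# Hypothetical sketch: names are illustrative and are not Linguist's API.
module SourceSketch
  # Abstract "source" repository. Each operation raises until a subclass
  # overrides it.
  class SourceRepository
    def load_blob(blob_id)
      raise NotImplementedError, "#{self.class} must implement load_blob"
    end

    def attributes_for(path)
      raise NotImplementedError, "#{self.class} must implement attributes_for"
    end
  end

  # A custom backend that reads from plain hashes instead of Git.
  class InMemoryRepository < SourceRepository
    def initialize(blobs, attributes = {})
      @blobs = blobs
      @attributes = attributes
    end

    def load_blob(blob_id)
      @blobs.fetch(blob_id)
    end

    def attributes_for(path)
      @attributes.fetch(path, {})
    end
  end

  # Stand-in for a consumer such as Linguist::Repository. It only talks to
  # the source interface, never to a specific backend.
  class Consumer
    def initialize(source)
      @source = source
    end

    def blob_size(blob_id)
      @source.load_blob(blob_id).bytesize
    end
  end
end

source = SourceSketch::InMemoryRepository.new({ "abc123" => "puts 'hi'\n" })
consumer = SourceSketch::Consumer.new(source)
consumer.blob_size("abc123")
```

Because the consumer depends only on the abstract interface, swapping in a different backend (another SCM, or a Git implementation other than Rugged) should not require changes to the consumer itself.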

This isn't intended to be a breaking change, so there are a few extra things done to avoid compatibility issues with existing usage:

  • If a Rugged::Repository is passed in as the first argument to either the Linguist::Repository or LazyBlob initializer, it is wrapped in a Linguist::Source::RuggedRepository internally.
  • GIT_ATTR_OPTS & GIT_ATTR_FLAGS are Rugged-specific so they're moved to RuggedRepository, but the LazyBlob constants are not removed and instead point to their RuggedRepository counterparts.
  • current_tree and read_index don't make sense for non-Rugged repos (the former returns a Rugged tree instance, the latter is specific to how Rugged needs to look up attributes). When called on a non-Rugged repository instance, they raise NotImplementedError with a message referencing deprecation; on a Rugged-backed instance they behave the same way as before.
  • A method_missing implementation is added to Linguist::Source::RuggedRepository to delegate any unmatched method calls to the internal Rugged::Repository instance (in case users are calling Linguist::Repository.repository directly).

The only possible compatibility issue I can imagine is if a user does some kind of type check on Linguist::Repository.repository (previously it was a Rugged::Repository, now it'll be a Linguist::Source::RuggedRepository). That seems highly unlikely, though, and should be a simple fix if needed.
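The compatibility measures listed above could look something like the following. This is only a sketch under assumptions: `FakeRugged` stands in for Rugged::Repository so the example runs without the rugged gem, and `CompatSketch`, `RuggedSource`, `Consumer`, and the `ATTR_FLAGS` value are hypothetical placeholders rather than the names or values used on this branch.

```ruby
# Hypothetical sketch: FakeRugged stands in for Rugged::Repository and all
# names and values here are illustrative only.
module CompatSketch
  class FakeRugged
    def path
      "/tmp/example/.git"
    end
  end

  class SourceRepository
    # Legacy helper that has no meaning for generic backends.
    def current_tree
      raise NotImplementedError,
            "current_tree is deprecated for non-Rugged repositories"
    end
  end

  class RuggedSource < SourceRepository
    ATTR_FLAGS = [:placeholder_flag].freeze # placeholder value

    def initialize(rugged)
      @rugged = rugged
    end

    def current_tree
      :rugged_tree # placeholder for the backend-specific tree object
    end

    # Forward unknown calls to the wrapped instance so callers that
    # expected the raw object keep working.
    def method_missing(name, *args, &block)
      return super unless @rugged.respond_to?(name)
      @rugged.public_send(name, *args, &block)
    end

    def respond_to_missing?(name, include_private = false)
      @rugged.respond_to?(name, include_private) || super
    end
  end

  class Consumer
    # Old constant kept as an alias to its new home.
    ATTR_FLAGS = RuggedSource::ATTR_FLAGS

    attr_reader :repository

    def initialize(repo)
      # Wrap a raw backend object; accept anything else as-is.
      @repository = repo.is_a?(FakeRugged) ? RuggedSource.new(repo) : repo
    end
  end
end

legacy = CompatSketch::Consumer.new(CompatSketch::FakeRugged.new)
legacy.repository.path
```

Defining `respond_to_missing?` alongside `method_missing` keeps `respond_to?` checks on the wrapper consistent with what it actually delegates. A strict type check such as `is_a?` against the raw class would still fail on the wrapper, which is the residual compatibility risk noted above.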


The commits on this branch are organized to be atomic and incrementally reviewable:

  • Commit 1 adds the generic Linguist::Source::Repository and Linguist::Source::Diff interfaces, with all methods raising NotImplementedError to ensure they are overridden by a subclass implementation.
  • Commit 2 adds a Rugged implementation of Linguist::Source::Repository matching existing usage in compute_stats and Linguist::LazyBlob.
  • Commit 3 updates Linguist::Repository to use a Linguist::Source::Repository instead of a Rugged::Repository to read repository content.
  • Commit 4 adds the method_missing implementation to RuggedRepository.

Checklist:

  • [ ] I am adding a new extension to a language.

    • [ ] The new extension is used in hundreds of repositories on GitHub.com
      • Search results for each extension:
        • https://github.com/search?type=code&q=NOT+is%3Afork+path%3A*.FOOBAR+KEYWORDS
    • [ ] I have included a real-world usage sample for all extensions added in this PR:
      • Sample source(s):
        • [URL to each sample source, if applicable]
      • Sample license(s):
    • [ ] I have included a change to the heuristics to distinguish my language from others using the same extension.
  • [ ] I am adding a new language.

    • [ ] The extension of the new language is used in hundreds of repositories on GitHub.com.
      • Search results for each extension:
        • https://github.com/search?type=code&q=NOT+is%3Afork+path%3A*.FOOBAR+KEYWORDS
    • [ ] I have included a real-world usage sample for all extensions added in this PR:
      • Sample source(s):
        • [URL to each sample source, if applicable]
      • Sample license(s):
    • [ ] I have included a syntax highlighting grammar: [URL to grammar repo]
    • [ ] I have added a color
      • Hex value: #RRGGBB
      • Rationale:
    • [ ] I have updated the heuristics to distinguish my language from others using the same extension.
  • [ ] I am fixing a misclassified language

    • [ ] I have included a new sample for the misclassified language:
      • Sample source(s):
        • [URL to each sample source, if applicable]
      • Sample license(s):
    • [ ] I have included a change to the heuristics to distinguish my language from others using the same extension.
  • [ ] I am changing the source of a syntax highlighting grammar

    • Old: [URL to grammar repo]
    • New: [URL to grammar repo]
  • [ ] I am updating a grammar submodule

  • [x] I am adding new or changing current functionality

    • [x] I have added or updated the tests for the new or changed functionality.
  • [ ] I am changing the color associated with a language

    • [ ] I have obtained agreement from the wider language community on this color change.
      • [URL to public discussion]
      • [Optional: URL to official branding guidelines for the language]

vdye, Oct 16 '24 17:10