brew icon indicating copy to clipboard operation
brew copied to clipboard

rubocops: refactor all formula cops to use a single-pass force class

Open Bo98 opened this issue 3 years ago • 3 comments

The performance of brew style in Homebrew/core has regressed quite noticeably as the number of cops and the size of the tap both grow. In CI, it takes about 4 minutes which, while could be worse, is noticeable, particularly when given the full actual test routine of some formulae is quicker than that.

The way formula cops worked before is that the base class would hook to on_class, check if its a Formula subclass, and invoke audit_formula. Subclasses of the base FormulaCop would implement this method, be given the full AST of the formula class and be expected to individually traverse through this AST. This meant that each individual formula cop basically did its own walking of formula file ASTs. On top of this, some helper functions like find_every_method_call_by_name were used which in themesleves traverse the entire AST looking for every instance of a particular method call, so each individual formula cop could traverse the AST multiple times. AST traversal isn't particularly slow in isolation, but it adds up when you multiply the 6000 formula files, 54 cops and the multiple calls to find_every_method_call_by_name etc in each cop.

Instead, I've used Rubocop's "force" feature which allows you to specify a class, which is separately invoked from the standard AST cop traversal, to read over a file's source and invoke hooks in cops that opt-in. I created a FormulaForce class (see formula_force.rb) which, when it detects a valid formula file, traverses through the formula AST and invokes on_formula_ hooks on all cops that use it. The FormulaCop base class automatically opts-in all subclasses. A basic traversal of the hello formula would look something like:

  • on_formula_source
  • on_formula_file
  • on_formula_class
  • on_formula_desc
  • on_formula_url
  • on_formula_sha256
  • on_formula_license
  • on_formula_bottle
    • on_formula_sha256
  • on_formula_conflicts_with
  • on_formula_install
    • Generic method calls like on_formula_send, on_formula_if
  • on_formula_test
    • on_formula_send etc.
  • on_formula_class_end
  • on_formula_source_end

This traversal happens once for all formula cops combined, with each hook call itself traversing for cops that implement that hook.

The Formula AST component precendence list is used to determine which method/block calls have their own hooks. Nodes in the AST that are not on this list are instead sent through generic hooks like on_formula_send.

The result is that all formula cops now collectively run single-pass over the formula file. There are some exceptions, most notably the ComponentOrder cop due to its complexity.

Since this required a major refactor of all cops, one other change I did at the same time was remove the node_equals? helper function. It was highly inefficient as it parsed Ruby code every time it was run. Instead, calls directly to this were replaced by value and type checks of the AST node. The parameter_passed? helper function no longer uses node_equals?, and the replacement implementation no longer supports non-basic-literal types, but this was never actually needed in practice.

Using the "extreme case" of a brew style homebrew/core run with parallelisation disabled and a clean RuboCops cache, it took six and a half minutes on my machine before this change. Now it takes just under five minutes. This is probably the best we'll get, and will in practice be much faster than that because parallelisation is usually enabled and caches shouldn't be empty on anything beyond the first run. For CI, I also hope to make better use of caching by storing the RuboCop cache in actions/cache.

As can happen as a result of a refactor, there are a couple known behavioural changes. I've avoided touching any existing tests, apart from one (minor output change), but there are still some new offences in Homebrew/core. These were not intentional, but on review I felt they were actually behavioural improvements rather than regressions:

  • The enforcement for git URLs to have a fixed revision now applies to resources.
  • url :head style checking for livechecks now properly supports head URLs located inside head do blocks.
  • A redundant revision 0 is now checked.

Bo98 avatar May 03 '22 03:05 Bo98

Review period will end on 2022-05-04 at 03:56:19 UTC.

BrewTestBot avatar May 03 '22 03:05 BrewTestBot

Using the "extreme case" of a brew style homebrew/core run with parallelisation disabled and a clean RuboCops cache, it took six and a half minutes on my machine before this change. Now it takes just under five minutes. This is probably the best we'll get, and will in practice be much faster than that because parallelisation is usually enabled and caches shouldn't be empty on anything beyond the first run.

I'm still 👍🏻 merging this PR but, just for future optimisations, I'm unsure whether this level of performance increase is worth the level of refactoring?

MikeMcQuaid avatar May 03 '22 08:05 MikeMcQuaid

Review period ended.

BrewTestBot avatar May 04 '22 06:05 BrewTestBot

Yeah it was something I experimented with a couple cops and ended up preferring how it looked API wise anyway. Only took a couple hours to do it all.

I’m happy with 25% improvement, since it adds up over time.

On 3 May 2022, at 09:35, Mike McQuaid @.***> wrote:

 @MikeMcQuaid approved this pull request.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.

Bo98 avatar Oct 11 '22 08:10 Bo98

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

github-actions[bot] avatar Nov 02 '22 00:11 github-actions[bot]