brew Reconsider a better API to replace direct `Homebrew::Livecheck::Strategy` calls within formulae and casks

Verification

[X] This issue's title and/or description do not reference a single formula e.g. brew install wget. If they do, open an issue at https://github.com/Homebrew/homebrew-core/issues/new/choose instead.

Provide a detailed description of the proposed feature

21 casks in Homebrew/cask & Homebrew/cask-versions contain direct calls to Homebrew::Livecheck::Strategy methods.

An additional 8 formulae use Homebrew::Livecheck::Strategy.page_content (which at least is a better API compared to the rest).

It's not a lot thankfully, but I don't think these are great because:

It is private API, which we avoid in these repos as we don't do deprecation periods for such API or guarantee any effort to ensure backwards compatibility (e.g. parameter changes).
The API to call some of them (e.g. Homebrew::Livecheck::Strategy::ExtractPlist.find_versions) is horrific in these contexts, requiring CaskLoader.load to load the cask itself again.

What is the motivation for the feature?

Seeing CaskLoader.load("winzip") inside winzip itself, which is problematic and causes issues.

How will the feature be relevant to at least 90% of Homebrew users?

It realistically won't given its scoped to livecheck only which is maintainer-only, but an improved API will have better maintainability and avoids bugs associated by easy misuse, particularly when things change in brew without the knowledge these exist. A formula/cask not reloading itself is an example of one such assumption that brew makes, unless you do so in a very specific way (i.e. using __FILE__, and even that isn't guaranteed to work universally if loading from a contents string is used).

What alternatives to the feature have been considered?

If we prefer it this way, we could just mark them as public API, as long as other improvements are made to avoid CaskLoader.

Nov 08 '23 02:11 Bo98

@Bo98 There are 18 casks that called Homebrew::Livecheck::Strategy::* for a variety of reasons. Is this a similar concern to recalling CaskLoader.load(*)? If so further improvement should be considered here.

At present, these are used in scenarios where the livecheck API needs to be re-used for a subsequent pass at usually loading page-content from a url that needs to be dynamically generated from a prior regex search.

Nov 08 '23 02:11 bevanjkay

@Bo98 There are 18 casks

Ah whoops, missed some are not in submodules. Corrected the search query now.

Nov 08 '23 02:11 Bo98

At present, these are used in scenarios where the livecheck API needs to be re-used for a subsequent pass at usually loading page-content from a url that needs to be dynamically generated from a prior regex search.

So I guess the API improvement that could be done here is a way to specify a series of strategies rather than being restricted to just the one strategy.

Nov 08 '23 02:11 Bo98

@Bo98 Is it worth updating some of the offending casks to at least use CaskLoader.load(__FILE__) in the interim?

Nov 08 '23 02:11 bevanjkay

Probably yeah. It's necessary for them to at least work mostly properly without HOMEBREW_NO_INSTALL_FROM_API.

I believe __FILE__ won't work in formulae due to the weird way they're loaded, but that doesn't matter for casks (currently).

Nov 08 '23 03:11 Bo98

So I guess the API improvement that could be done here is a way to specify a series of strategies rather than being restricted to just the one strategy.

An idea of what that could look like:

  livecheck do
    url "https://www.winzip.com/en/download/"
    regex(/href=.*?winzipmacedition[._-]?v?(\d+)\.dmg/i)
    strategy :page_match do |page, regex|
      major_version = page[regex, 1]
      next if major_version.blank?

      "https://download.winzip.com/winzipmacedition#{major_version}.dmg"
    end.then_strategy :extract_plist
    # or alternatively:
    end
    strategy :extract_plist 
  end

  livecheck do
    url "https://chromiumdash.appspot.com/fetch_releases?channel=Stable&platform=Mac"
    regex(/(\d+\.\d+\.\d+\.\d+)/i)
    strategy :json do |json|
      # Find the v8 commit hash for the newest Chromium release version
      v8_hash = json.max_by { |item| Version.new(item["version"]) }.dig("hashes", "v8")
      next if v8_hash.blank?

      "https://chromium.googlesource.com/v8/v8.git/+/#{v8_hash}"
    end.then_strategy :page_match do |page, regex|
      page.scan(regex).map(&:first)
    end
  end

Nov 08 '23 03:11 Bo98

In terms of how the DSL should look, would it be too confusing to just have multiple strategy blocks and then just evaluate them in the order they were declared. That would probably be the simplest option.

  livecheck do
    url "https://chromiumdash.appspot.com/fetch_releases?channel=Stable&platform=Mac"
    regex(/(\d+\.\d+\.\d+\.\d+)/i)
    strategy :json do |json|
      # Find the v8 commit hash for the newest Chromium release version
      v8_hash = json.max_by { |item| Version.new(item["version"]) }.dig("hashes", "v8")
      next if v8_hash.blank?

      "https://chromium.googlesource.com/v8/v8.git/+/#{v8_hash}"
    end
    strategy :page_match do |page, regex|
      page.scan(regex).map(&:first)
    end
  end

Nov 08 '23 04:11 apainintheneck

In terms of how the DSL should look, would it be too confusing to just have multiple strategy blocks and then just evaluate them in the order they were declared. That would probably be the simplest option.

This is a nice idea.

I'm fine with either DSL but agreed that this should happen sooner rather than later and is desirable, thanks @Bo98!

Nov 10 '23 08:11 MikeMcQuaid

Yeah, my idea doesn't really make much sense since it doesn't address passing of information from the first strategy to the second one which is needed here.

Are we sure that this only requires a new DSL or would it also mean changing how livecheck works internally as well? It now seems to me like something that's not worth the effort. Maybe just add a linter check for the CaskLoader.load(__FILE__) thing and revisit this if this pattern starts getting used in more casks in the future?

Nov 23 '23 07:11 apainintheneck

To elaborate a little more we can't really have a separate or chained method because the matches from the first strategy block are often needed to build the URL for the second strategy block. It might be possible to have a nested strategy method though which could do the trick. It would essentially work like strategy.find_versions(*).fetch(:match) and we could call it strategy_matches.

Using the ibabel cask as an example.

Before

livecheck do
  url :homepage
  regex(%r{href=.*?/uploads/(\d+)/(\d+)/iBabel\.zip}i)
  strategy :page_match do |page, regex|
    match = page.match(regex)

    cask = CaskLoader.load(__FILE__)
    download_url = "https://macinchem.org/wp-content/uploads/#{match[1]}/#{match[2]}/iBabel.zip"
    app_version = Homebrew::Livecheck::Strategy::ExtractPlist.find_versions(cask: cask,
                                                                            url:  download_url)[:matches].values.max
    next if app_version.blank?

    "#{app_version.to_s.split(",").first},#{match[1]},#{match[2]}"
  end
end

After

livecheck do
  url :homepage
  regex(%r{href=.*?/uploads/(\d+)/(\d+)/iBabel\.zip}i)
  strategy :page_match do |page, regex|
    match = page.match(regex)

    download_url = "https://macinchem.org/wp-content/uploads/#{match[1]}/#{match[2]}/iBabel.zip"
    app_version = strategy_matches(:extract_plist, url: download_url).values.max
    next if app_version.blank?

    "#{app_version.to_s.split(",").first},#{match[1]},#{match[2]}"
  end
end

The uses of Strategy.page_content could then be replaced with a nested :page_match block and the uses of Strategy.page_headers could be replaced with a nested :header_match block. Still not sure if it's really worth the added complexity though.

Nov 23 '23 08:11 apainintheneck

Yeah, my idea doesn't really make much sense since it doesn't address passing of information from the first strategy to the second one which is needed here.

Not sure I understand the problem you're describing really. Each return value can replace the original URL.

url = (initial url, e.g. stable.url)
return nil if strategies.empty?

strategies.each do |strategy|
  url = run_strategy(url) # internally does its comparisons etc then calls DSL block
end
version = url # last url is version

Seem doable, though I guess it could be argued that it looks "weird" having differing return values.

It might be possible to have a nested strategy method though which could do the trick. It would essentially work like strategy.find_versions(*).fetch(:match) and we could call it strategy_matches.

That's fine too if sufficiently documented and it works for all the cases described. Might look better too but I've not checked all the scenarios.

Nov 23 '23 12:11 Bo98

Coming to this a little late, so pardon the information dump.

I went through Homebrew::Livecheck::Strategy references in strategy blocks and, after refactoring a few, we're currently at 21 uses of #page_content, 3 uses of #page_headers, and the 4 ExtractPlist#find_versions instances.

Use of #page_content and #page_headers in strategy blocks is simply for making an additional HTTP request, so we're not using another strategy in 24 out of 28. #page_content and #page_headers aren't likely to notably change in the future (if they do, additional parameters will be optional), so it may be fine to just make those public.

I have some stashed work to allow us to call #page_content and #page_header in livecheck blocks without the preceding module name (Homebrew::Livecheck::Strategy). The basic idea is to delegate references to page_content and page_headers in the livecheck block DSL to Homebrew::Livecheck::Strategy. If we made those references public, would that help address that part of this issue?

I wasn't previously aware of the four casks with strategy blocks calling ExtractPlist#find_versions but I agree that's not great. Three of those four casks follow a straightforward pattern of fetching a page, extracting information from that page, and using it to generate the URL to use with ExtractPlist#find_versions.

The other cask (ibabel) fetches a page, extracts information from that page, and then uses it in both the archive URL and the version string that the strategy block returns. In this scenario, it would be necessary to pass multiple variables to the next strategy block (the url, and some date strings). As @apainintheneck noticed above, simply passing a url string won't be sufficient.

It's technically possible to support successive strategy blocks if we used a hash as the return type. Mapping the values to #find_versions args is straightforward but passing additional values to the next strategy block is more challenging. strategy blocks expect specific arguments (mostly implicitly), so they would need to be modified to handle variable arguments in a more explicit manner (more like what Sparkle does). Namely, the second strategy block argument is an optional regex, so being able to pass another argument when a regex isn't passed would require a different approach.

Even if it's technically feasible, the mental model of multiple strategy blocks isn't the easiest to understand, so I'm not sure that particular approach would be worth the notable implementation hurdles.

A public method like the #strategy_matches example above that can be used to call a strategy's #find_versions method should be fairly easy to implement and simple to work with in #strategy blocks by comparison. ExtractPlist aside, it's simple/terse enough that it may be fine to replace some of the use of #page_content/#page_headers in strategy blocks but there may still be some instances where we need those methods (something to research).

If that's an agreeable solution, I'm happy to work on an implementation. I'm not sure that we can avoid using CaskLoader#load in the strategy block (due to the context) but I'll think on it.

Nov 23 '23 16:11 samford

CaskLoader#load is only needed because we don't have access to the cask in the strategy block scope, right? I wonder if we could evaluate that block in a module or class scope that has that information already.

Nov 23 '23 18:11 apainintheneck

CaskLoader#load is only needed because we don't have access to the cask in the strategy block scope, right? I wonder if we could evaluate that block in a module or class scope that has that information already.

That's correct. At the moment, the issue is that ExtractPlist#find_versions is called in a strategy block, which doesn't have access to the Cask object that ExtractPlist#find_versions requires. When a url is provided to that method, a copy of the Cask object is modified to use the provided URL.

strategy blocks are typically executed in a strategy's #versions_from_* method. In this case, ExtractPlist#versions_from_items.

We could pass cask from #find_versions into #versions_from_items and rework ExtractPlist strategy block argument parsing to be able to handle a cask argument (maybe in place of items?). Thinking about it some more, that may be our only option. Something like #strategy_matches would be called in a strategy block, so it's the same issue as above (i.e., we would need to pass a Cask object into that method to then pass it to ExtractPlist#find_versions).

The tricky bit is that we would have to use strategy :extract_plist to make that work (these currently use strategy :page_match, to handle the first request) and we would have to sort out a way to avoid unnecessarily executing the strategy twice (e.g., special-casing a strategy block with a cask argument). Something to think about once I dig into it more and start experimenting.

Nov 23 '23 18:11 samford

CaskLoader#load is only needed because we don't have access to the cask in the strategy block scope, right? I wonder if we could evaluate that block in a module or class scope that has that information already.

This seems to make a lot more sense than trying to reload the cask inside blocks that live inside a cask (which I think is a smell).

I've personally found it very surprising the few times I discovered (then forgot, and then re-discovered) that livecheck blocks (or, at least, blocks inside livecheck blocks) have a scope that's different from the formula.

Nov 25 '23 03:11 carlocab

The only reason we need to access the cask within the cask is because of the current API. With a better public API like strategy_matches then all the current use cases disappear. I still don't think there should be any reference to Homebrew::Livecheck::... internals.

If there's another use case separate from that though, then it makes sense to maybe change what self points to (in addition to the other changes).

Nov 25 '23 03:11 Bo98

For service blocks we already make the formula available to allow us to get path information from it (https://github.com/Homebrew/brew/pull/13705). In this case, we don't even need to expose it as a method but just have it exist in the same scope so that we can use it in the internal strategy_matches method.

Nov 25 '23 19:11 apainintheneck

For service blocks we already make the formula available to allow us to get path information from it (#13705). In this case, we don't even need to expose it as a method but just have it exist in the same scope so that we can use it in the internal strategy_matches method.

This seems like a good model to emulate here 👍🏻

Nov 27 '23 12:11 MikeMcQuaid