
URLTextSearcher failing in LibreOffice recipe

Open mtconleyuk opened this issue 6 years ago • 6 comments

This morning I'm getting the following error from AutoPkg when running my LibreOffice recipe:-

No match found on URL: https://www.libreoffice.org/download/libreoffice-fresh/?type=mac-x86_64

The LibreOffice download recipe uses URLTextSearcher with the re_pattern string

?P<DOWNLOAD_URL>download.documentfoundation.org/libreoffice/stable/[\d\.]+/mac/x86_64/LibreOffice_(?P<version>[\d\.]+)_MacOS_x86-64.dmg

I downloaded the page at the url supplied to URLTextSearcher:-

https://www.libreoffice.org/download/libreoffice-fresh/?type=mac-x86_64

and did a search on the text using BBEdit and the essentials from the above regular expression, and the pattern matches several times in the page source, the first match being

download.documentfoundation.org/libreoffice/stable/6.2.0/mac/x86_64/LibreOffice_6.2.0_MacOS_x86-64.dmg

which looks fine to me. Not sure why URLTextSearcher is failing.
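
For reference, the same check can be done outside AutoPkg with Python's `re` module. This is a minimal sketch, assuming the recipe's pattern is wrapped in parentheses so that `DOWNLOAD_URL` forms a valid named group (the parentheses are not shown in the pattern as quoted above). The `sample_page` string is a hypothetical stand-in for the saved page source.

```python
import re

# Assumption: the full pattern is wrapped in parentheses so that
# DOWNLOAD_URL is a valid named group.
RE_PATTERN = (
    r"(?P<DOWNLOAD_URL>download.documentfoundation.org/libreoffice/stable/"
    r"[\d\.]+/mac/x86_64/LibreOffice_(?P<version>[\d\.]+)_MacOS_x86-64.dmg)"
)

# Hypothetical stand-in for the downloaded page source.
sample_page = (
    '<a href="https://download.documentfoundation.org/libreoffice/stable/'
    '6.2.0/mac/x86_64/LibreOffice_6.2.0_MacOS_x86-64.dmg">Download</a>'
)

# re.search returns the first match, or None when nothing matches.
match = re.search(RE_PATTERN, sample_page)
result = match.groupdict() if match else None
print(result)
```

If a check like this succeeds against the saved page source while AutoPkg still reports no match, the page AutoPkg fetched may differ from the one saved by hand.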

mtconleyuk avatar Feb 15 '19 14:02 mtconleyuk

The download page now redirects to https://www.libreoffice.org/download/download/ without preserving the ?type=mac-x86_64 query. Visiting without a user agent seems to default to MS Windows x86, i.e. the equivalent of visiting https://www.libreoffice.org/download/download/?type=win-x86&version=6.2.0&lang=en-US.

So, the search URL needs to be changed to https://www.libreoffice.org/download/download/?type=mac-x86_64.

This is addressed in #104 .

eenblam avatar Feb 15 '19 19:02 eenblam

Looks like the text searcher is working. However, I noticed that it always picks the first match, which at the time of writing is release 6.2.2, marked as a pre-release.

From looking at the source of https://www.libreoffice.org/download/download/ it looks like 6.2.2 is just the first match for the regex.

aschwanb avatar Mar 21 '19 15:03 aschwanb

Yep, I'm not sure what to do about this. The recipe works for now but it always grabs the "fresh" release. Their current download page doesn't use the fresh/still vocabulary at all to distinguish between the downloads and both of the current versions have stable in their URL.

hjuutilainen avatar Mar 22 '19 07:03 hjuutilainen

Likewise. The only thing I can think of at the moment is writing a custom URL provider which takes all the matching versions (three at the time of writing) and returns one of them depending on the argument set in the recipe.

However, that seems like an overly complicated solution.
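
To make the idea concrete, the selection logic might look roughly like the sketch below. It is plain Python rather than an actual AutoPkg processor. The function name `pick_release`, its `which` argument, and the values `"lowest"` and `"highest"` are hypothetical, and the pattern is assumed to be the recipe's pattern wrapped in parentheses.

```python
import re

# Assumption: same pattern as the recipe, wrapped in parentheses.
RE_PATTERN = re.compile(
    r"(?P<DOWNLOAD_URL>download.documentfoundation.org/libreoffice/stable/"
    r"[\d\.]+/mac/x86_64/LibreOffice_(?P<version>[\d\.]+)_MacOS_x86-64.dmg)"
)


def version_key(version):
    """Turn '6.2.2' into (6, 2, 2) so versions sort numerically."""
    return tuple(int(part) for part in version.split(".") if part)


def pick_release(page_source, which="highest"):
    """Return (version, url) for the lowest or highest matching version.

    `which` is a hypothetical argument that a recipe could set.
    """
    # Collect every match, de-duplicated by version string.
    found = {
        m.group("version"): m.group("DOWNLOAD_URL")
        for m in RE_PATTERN.finditer(page_source)
    }
    if not found:
        raise ValueError("No match found in page source")
    versions = sorted(found, key=version_key)
    if which == "highest":
        chosen = versions[-1]
    elif which == "lowest":
        chosen = versions[0]
    else:
        raise ValueError(f"Unknown selection: {which!r}")
    return chosen, found[chosen]
```

Sorting by version only helps if the ordering says something about the release type. With a pre-release listed alongside the others, `"highest"` would still return the pre-release.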

aschwanb avatar Mar 22 '19 14:03 aschwanb

Christian Lohmaier, a LibreOffice developer, suggested on freenode that this link would be more reliable http://download.documentfoundation.org/libreoffice/stable/

He said the lower version number should always be the still version.
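
A rough sketch of what scraping that listing could look like, under the assumption that the index page links each version directory as `href="7.1.3/"` (the actual markup of the listing is not shown in this thread). The download path layout is taken from the recipe's pattern quoted earlier. `still_and_fresh` is a hypothetical helper name.

```python
import re

# Assumption: the directory index links each version as href="7.1.3/".
VERSION_DIR = re.compile(r'href="(\d+(?:\.\d+)+)/"')

# Path layout taken from the recipe's re_pattern quoted earlier in the thread.
URL_TEMPLATE = (
    "http://download.documentfoundation.org/libreoffice/stable/"
    "{v}/mac/x86_64/LibreOffice_{v}_MacOS_x86-64.dmg"
)


def still_and_fresh(index_html):
    """Return (still, fresh) version strings from a directory listing.

    Follows the suggestion that the lower version number is 'still'.
    """
    versions = sorted(
        set(VERSION_DIR.findall(index_html)),
        key=lambda v: tuple(int(p) for p in v.split(".")),
    )
    if not versions:
        raise ValueError("No version directories found")
    return versions[0], versions[-1]
```

The download URL for either stream can then be built with `URL_TEMPLATE.format(v=version)`, for example using the first element of the returned tuple for the still release.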

dhmoore avatar Apr 09 '19 13:04 dhmoore

So this problem is cropping up again, with version 7.1.4 vs 7.1.3 being listed on the Release Notes. That said, it does look like scraping http://download.documentfoundation.org/libreoffice/stable/ could be a valid and easy solution.

https://www.libreoffice.org/download/download/ does only have two versions listed and it would seem that still is always older than fresh. Either could provide a source for both streams.

Or I guess we could consider this the mostly_fresh stream since there's the delay between release and being posted on the Release Notes.

PeetMcK avatar Jun 11 '21 21:06 PeetMcK