git-scm.com
git-scm.com copied to clipboard
Find a way to prevent broken book images
As reported on #1390 , book images get broken from time to time.
I see 2 possible solutions:
- download book images to this repo (we may need to update them from time to time?)
- actively check for broken images (cron rake task to check links are not broken?)
Took a stab at this one in pull request #1392. A simple rake task that will parse each img
element from _ext_books.erb
and if the URL cannot be retrieved, a warning is printed to the console. A cron job could be setup to execute this rake task
One problem with checking via rake task is that these are all run via Heroku's scheduler, which doesn't provide any mechanism for feedback (i.e., to tell us that some links are broken). After searching a bit, the usual proposed solution seems to be using the exception-notifier gem to send an email. But that implies configuring the site to send email, which is notoriously annoying.
I wonder if we could instead have it open an issue in this repository. That would require setting up a bot GitHub account, but once done, it would be pretty reliable.
Just downloading the images into the repo does side-step all of this, but it might be nice to get to the root of the problem. In particular:
-
we could be doing a full-on link check, not just checking for broken image links
-
we currently get no feedback from the existing cron jobs (e.g., to build the book or manpages)
I was addressing the case of a successful request with a non-image body and ran into some URLs returning a 301. We could follow the eventual location from the response header, but then there is the potential risk of infinite redirects. Adding all that logic seems a bit overkill for this solution, seeing as like @peff mentioned, this isn't a likely long term solution.
It might make more sense here to punt and take the low-tech solution of downloading the images to the repo and updating the src
attributes. A more automated is definitely preferable, but takes more design as @peff suggests.
I think any kind of link-checking will have to handle redirects at some point. It would be OK to just put a hard limit on looping (say, 10). I do wonder if there's a higher-level gem than net/http
we could use that would do this sort of thing for us (likewise the https
handling, which I notice you had to do manually).
I would like to tackle this as my "good first issue"
All cover images for Packt books are broken, as they lead to the images in (long expired) cache.
For example, the cover image for "Mastering Git" book is now
https://www.packtpub.com/media/catalog/product/cache/e4d64343b1bc593f1c5348fe05efa4a6/b/0/b03537_cover.png
instead of
https://static.packt-cdn.com/products/9781783553754/cover/smaller
added https://github.com/git/git-scm.com/pull/1566 as a quick fix. but maybe we should just keep the images in the repo as we have with a few others (https://github.com/git/git-scm.com/tree/master/public/images/books)
Thanks for jumping on this, @pedrorijo91. I'm not opposed to importing book images. They might get out of date with the source images, but if the publishers/authors care, they can open a PR. I'd rather have slightly old images than broken ones.
But I'm also OK to punt on that to see if this actually happens again. It looks like our packt links were probably not the right ones in the first place here.
As reported on #1390 , book images get broken from time to time.
I see 2 possible solutions:
- download book images to this repo (we may need to update them from time to time?)
- actively check for broken images (cron rake task to check links are not broken?)