git-scm.com icon indicating copy to clipboard operation
git-scm.com copied to clipboard

Find a way to prevent broken book images

Open pedrorijo91 opened this issue 5 years ago • 10 comments

As reported on #1390 , book images get broken from time to time.

I see 2 possible solutions:

  • download book images to this repo (we may need to update them from time to time?)
  • actively check for broken images (cron rake task to check links are not broken?)

pedrorijo91 avatar Oct 30 '19 10:10 pedrorijo91

Took a stab at this one in pull request #1392. A simple rake task that will parse each img element from _ext_books.erb and if the URL cannot be retrieved, a warning is printed to the console. A cron job could be setup to execute this rake task

smithmf avatar Oct 30 '19 18:10 smithmf

One problem with checking via rake task is that these are all run via Heroku's scheduler, which doesn't provide any mechanism for feedback (i.e., to tell us that some links are broken). After searching a bit, the usual proposed solution seems to be using the exception-notifier gem to send an email. But that implies configuring the site to send email, which is notoriously annoying.

I wonder if we could instead have it open an issue in this repository. That would require setting up a bot GitHub account, but once done, it would be pretty reliable.

Just downloading the images into the repo does side-step all of this, but it might be nice to get to the root of the problem. In particular:

  • we could be doing a full-on link check, not just checking for broken image links

  • we currently get no feedback from the existing cron jobs (e.g., to build the book or manpages)

peff avatar Oct 30 '19 18:10 peff

I was addressing the case of a successful request with a non-image body and ran into some URLs returning a 301. We could follow the eventual location from the response header, but then there is the potential risk of infinite redirects. Adding all that logic seems a bit overkill for this solution, seeing as like @peff mentioned, this isn't a likely long term solution.

It might make more sense here to punt and take the low-tech solution of downloading the images to the repo and updating the src attributes. A more automated is definitely preferable, but takes more design as @peff suggests.

smithmf avatar Oct 30 '19 20:10 smithmf

I think any kind of link-checking will have to handle redirects at some point. It would be OK to just put a hard limit on looping (say, 10). I do wonder if there's a higher-level gem than net/http we could use that would do this sort of thing for us (likewise the https handling, which I notice you had to do manually).

peff avatar Oct 30 '19 21:10 peff

I would like to tackle this as my "good first issue"

JeremiahKamama avatar Feb 16 '20 13:02 JeremiahKamama

All cover images for Packt books are broken, as they lead to the images in (long expired) cache.

For example, the cover image for "Mastering Git" book is now

https://www.packtpub.com/media/catalog/product/cache/e4d64343b1bc593f1c5348fe05efa4a6/b/0/b03537_cover.png

"Mastering Git" cover

instead of

https://static.packt-cdn.com/products/9781783553754/cover/smaller

"Mastering Git" cover

jnareb avatar Jan 09 '21 01:01 jnareb

added https://github.com/git/git-scm.com/pull/1566 as a quick fix. but maybe we should just keep the images in the repo as we have with a few others (https://github.com/git/git-scm.com/tree/master/public/images/books)

pedrorijo91 avatar Jan 09 '21 13:01 pedrorijo91

Thanks for jumping on this, @pedrorijo91. I'm not opposed to importing book images. They might get out of date with the source images, but if the publishers/authors care, they can open a PR. I'd rather have slightly old images than broken ones.

But I'm also OK to punt on that to see if this actually happens again. It looks like our packt links were probably not the right ones in the first place here.

peff avatar Jan 11 '21 06:01 peff

As reported on #1390 , book images get broken from time to time.

I see 2 possible solutions:

  • download book images to this repo (we may need to update them from time to time?)
  • actively check for broken images (cron rake task to check links are not broken?)

js1darren avatar Aug 25 '23 14:08 js1darren