
Option to disable 301 Moved Permanently

Open nbeaver opened this issue 6 years ago • 5 comments

Sometimes sites return spurious or misguided 301 redirects. Would it be possible to make following them configurable on a per-URL basis?

nbeaver, Jun 26 '18 01:06

Do you have an example for this? What exactly does the response look like (returns 301 I guess) and what do you expect urlwatch to do in these cases?

kbabioch, Jul 11 '18 14:07

Sure, here's an example:

$ wget --spider 'http://www.onlamp.com/pub/a/onlamp/2005/12/15/organizing_files.html'
Spider mode enabled. Check if remote file exists.
--2018-07-19 08:28:06--  http://www.onlamp.com/pub/a/onlamp/2005/12/15/organizing_files.html
Resolving www.onlamp.com (www.onlamp.com)... 207.229.143.153, 207.229.143.147
Connecting to www.onlamp.com (www.onlamp.com)|207.229.143.153|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.oreilly.com/ideas [following]
Spider mode enabled. Check if remote file exists.
--2018-07-19 08:28:06--  https://www.oreilly.com/ideas
Resolving www.oreilly.com (www.oreilly.com)... 23.63.208.77
Connecting to www.oreilly.com (www.oreilly.com)|23.63.208.77|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.

The URL re-directs here:

https://www.oreilly.com/ideas

The target page changes all the time, but the redirect itself has not changed. If I could disable following 301 redirects, urlwatch would only notify me if the original page actually comes back online.

In an ideal world, I would also be interested in knowing when the URL it actually resolves to changes (the URL, not the HTML), but that would probably be more involved and should probably go in a different feature request.

nbeaver, Jul 19 '18 13:07

But do you want to be notified when the Location changes, or do you only want to monitor the status code (e.g. 301)?

kbabioch, Jul 19 '18 13:07

I can think of three different possible behaviors for a URL returning 301:

  1. urlwatch follows the redirect and alerts when the HTML changes (current behavior)

  2. urlwatch follows the redirect but ignores the HTML and alerts when the final URL changes

  3. urlwatch ignores the redirect, but when the response code changes to something other than 301 it alerts if the HTML has changed

This is a feature request for an option to enable the third behavior on a per-URL basis, not a request for this to be the default behavior.

There are also circumstances where the second behavior would be preferable to the first or the third, and I would be happy to open a separate feature request for an option to select it. Still, there are cases where the third behavior is preferable to the second.
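
To make the third behavior concrete, here is a rough sketch in plain Python. It is not urlwatch code and does not reflect any existing urlwatch option; the names `fetch_once` and `check_ignoring_301` are made up for illustration. It assumes the previous state is stored as a SHA-256 hex digest of the body (or `None` if nothing has been stored yet) and that only status 301 is treated as "ignore".

```python
import hashlib
import http.client
from urllib.parse import urlsplit


def fetch_once(url, timeout=10.0):
    """Issue a single GET and return (status, location, body).

    http.client does not follow redirects by itself, so a 301 is
    returned to the caller instead of being resolved.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location"), resp.read()
    finally:
        conn.close()


def check_ignoring_301(status, body, previous_digest):
    """Hypothetical helper for behavior 3.

    Returns (alert, new_digest). A 301 never alerts and leaves the
    stored digest untouched; any other status alerts when the body
    digest differs from the stored one.
    """
    if status == 301:
        return False, previous_digest
    digest = hashlib.sha256(body).hexdigest()
    return digest != previous_digest, digest
```

A job runner would call `fetch_once(url)` on each run and pass the status and body to `check_ignoring_301` together with the digest saved from the previous run. With nothing stored yet, the first non-301 response alerts, which matches the "notify me when the page comes back online" case.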

nbeaver, Jul 20 '18 04:07

I'm running into a similar problem and was wondering whether there has been any progress on this feature request.

marcfon, May 12 '20 14:05