
302 redirect reported as 404 broken link

Open · kaiye opened this issue 9 years ago · 8 comments

I checked the URL http://store.meizu.com/ with HtmlUrlChecker, and it reported the link http://ordercenter.meizu.com/list/index.html as broken with brokenReason HTTP_404. But that link is a redirect, not a 404.

Here is the result argument passed to the HtmlUrlChecker link callback:

{ url:
   { original: 'http://ordercenter.meizu.com/list/index.html',
     resolved: 'http://ordercenter.meizu.com/list/index.html',
     redirected: 'https://login.flyme.cn/vCodeLogin?useruri=http%3A%2F%2Fstore.meizu.com%2Fmember%2Flogin.htm?useruri=http://ordercenter.meizu.com/list/index.html&sid=unionlogin&service=&autodirct=true' },
  base:
   { original: 'http://store.meizu.com/',
     resolved: 'http://store.meizu.com/' },
  html:
   { index: 11,
     offsetIndex: 9,
     location: { line: 51, col: 44, startOffset: 2842, endOffset: 2893 },
     selector: 'html > body > div:nth-child(1) > div:nth-child(1) > div:nth-child(2) > ul:nth-child(1) > li:nth-child(2) > a:nth-child(1)',
     tagName: 'a',
     attrName: 'href',
     attrs:
      { class: 'topbar-link',
        href: 'http://ordercenter.meizu.com/list/index.html',
        target: '_blank' },
     text: '我的订单',
     tag: '<a class="topbar-link" href="http://ordercenter.meizu.com/list/index.html" target="_blank">' },
  http:
   { cached: false,
     response:
      { headers: [Object],
        httpVersion: '1.1',
        statusCode: 404,
        statusMessage: 'Not Found',
        url: 'https://login.flyme.cn/vCodeLogin?useruri=http%3A%2F%2Fstore.meizu.com%2Fmember%2Flogin.htm?useruri=http://ordercenter.meizu.com/list/index.html&sid=unionlogin&service=&autodirct=true',
        redirects: [Object] } },
  broken: true,
  internal: false,
  samePage: false,
  excluded: false,
  brokenReason: 'HTTP_404',
  excludedReason: null }
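In this dump, http.response.url is identical to url.redirected, which suggests the 404 is the status of the last hop (the login.flyme.cn page) rather than of the original ordercenter.meizu.com link. Below is a minimal sketch of a helper that makes this visible inside a link callback. `summarize` is a hypothetical name, and the field names are copied from the dump above rather than taken from documented API.

```javascript
// Hypothetical helper: condense a `link` callback result into the fields that
// matter for this issue. Field names are copied from the dump above.
function summarize(result) {
  const response = result.http && result.http.response;
  // Assumption: url.redirected is null/undefined when no redirect happened.
  const redirected = Boolean(result.url.redirected);
  return {
    original: result.url.original,
    finalUrl: response ? response.url : null,
    finalStatus: response ? response.statusCode : null,
    // Which URL the reported status code actually describes.
    statusBelongsTo: redirected ? "redirect target" : "original URL",
    broken: result.broken,
    brokenReason: result.brokenReason,
  };
}
```

Calling console.log(summarize(result)) inside the link handler prints one flat object per link, so a 404 that belongs to a redirect target stands out from a 404 on the link itself.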

kaiye commented on Aug 29 '16

This is also happening to me. One example is a shortened goo.gl URL.

luisfbmelo commented on Oct 20 '16

Please post the contents of your link.http.response.redirects array.
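The first dump shows redirects: [Object] because console.log stops expanding nested objects after util.inspect's default depth of 2. A minimal way to print the whole array, assuming `result` is the object passed to the link callback, is Node's util.inspect with depth: null. `dumpRedirects` is a hypothetical helper name.

```javascript
const util = require("util");

// Hypothetical helper: render result.http.response.redirects in full.
// depth: null tells util.inspect to recurse without limit, so nested
// headers are printed instead of being collapsed to [Object].
function dumpRedirects(result) {
  const redirects =
    (result.http && result.http.response && result.http.response.redirects) || [];
  return util.inspect(redirects, { depth: null });
}
```

Inside the link handler, console.log(dumpRedirects(result)) prints every hop with its headers; JSON.stringify(redirects, null, 2) would work as well.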

stevenvachon commented on Oct 20 '16

The response url is the correct final destination of the shortened URL, but statusCode is 404. http.response.redirects contains the following:

[ 
 { 
  headers: 
  {
    'content-type': 'text/html;  charset=UTF-8', 
    'cache-control': 'no-cache, no-store, max-age=0, must-revalidate', 
    pragma: 'no-cache', 
    expires: 'Mon, 01 Jan 1990 00:00:00 GMT', 
    date: 'Fri, 21 Oct 2016 09:09:23 GMT', 
    location: 'http://www.pordata.pt/Portugal/Res%C3%ADduos+urbanos+de+recolha+indiferenciada+e+selectiva-1104', 
    'x-content-type-options': 'nosniff', 
    'x-frame-options': 'SAMEORIGIN', 
    'x-xss-protection': '1; mode=block', 
    server: 'GSE', 
    'accept-ranges': 'none', 
    vary: 'Accept-Encoding', 
    connection: 'close' 
  }, 
  httpVersion: '1.1', 
  statusCode: 301, 
  statusMessage: 'Moved Permanently', 
  url: 'http://goo.gl/q0bKDz' 
 }
]
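Read as a chain, this is a single 301 hop from goo.gl to pordata.pt, and the 404 belongs to the final response described above. A hypothetical helper that flattens such a response into one line per hop could look like this. It assumes the shape shown above: each redirect entry has statusCode, url and a lower-case location header, and the final hop is http.response itself.

```javascript
// Hypothetical helper: list each hop of a response as "status url -> next".
// The last line is the final response (http.response.statusCode / .url).
function describeChain(response) {
  const hops = (response.redirects || []).map(
    (r) => `${r.statusCode} ${r.url} -> ${r.headers.location}`
  );
  hops.push(`${response.statusCode} ${response.url}`);
  return hops;
}
```

Assuming the final URL is the one in the location header, the goo.gl link yields two lines: the 301 from goo.gl, then 404 for the pordata.pt URL.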

I also have a CSV file with the links reported as broken, but most of them work when I open them in a browser: https://drive.google.com/open?id=0Bw4-qyfztV_PNG9QSnBhRW1TRjQ
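Since the links open fine in a browser, one thing worth ruling out (a hypothesis, not something established in this thread) is that the servers answer the checker's request differently from a browser's, for example a HEAD request versus a GET. Below is a minimal sketch that requests a link both ways and follows redirects. It uses the global fetch built into Node 18+, which is much newer than the Node versions in this thread; `probe` and `compareMethods` are hypothetical names.

```javascript
// Hypothetical diagnostic: request a URL with a given method, following
// redirects like a browser, and report the final status and URL.
// fetchImpl defaults to the global fetch available in Node 18+.
async function probe(url, { method = "GET", fetchImpl = globalThis.fetch } = {}) {
  const res = await fetchImpl(url, { method, redirect: "follow" });
  return { method, status: res.status, finalUrl: res.url, redirected: res.redirected };
}

// Compare HEAD and GET for the same link; a mismatch (e.g. 404 vs 200)
// would point at the server treating the two methods differently.
async function compareMethods(url, fetchImpl = globalThis.fetch) {
  const head = await probe(url, { method: "HEAD", fetchImpl });
  const get = await probe(url, { method: "GET", fetchImpl });
  return { head, get, differs: head.status !== get.status };
}
```

For example, compareMethods("http://goo.gl/q0bKDz").then(console.log) shows both final statuses side by side. The fetchImpl parameter only exists so the function can be exercised without network access.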

luisfbmelo commented on Oct 21 '16

I have the same problem. A 302 is reported as a 404 regardless of whether the redirect succeeds.
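A local page with one redirecting link and one plain link would make this reproducible without depending on third-party sites. The following is a minimal sketch using Node's built-in http module. The routes, page content and port are arbitrary choices for illustration, not anything from the thread.

```javascript
const http = require("http");

// Hypothetical routes: "/" links to both targets, "/redirect" answers
// 302 -> "/ok", "/ok" answers 200, everything else answers 404.
function route(path) {
  if (path === "/") {
    return {
      status: 200,
      headers: { "Content-Type": "text/html" },
      body: '<a href="/redirect">redirect</a> <a href="/ok">ok</a>',
    };
  }
  if (path === "/redirect") {
    return { status: 302, headers: { Location: "/ok" }, body: "" };
  }
  if (path === "/ok") {
    return { status: 200, headers: { "Content-Type": "text/plain" }, body: "ok" };
  }
  return { status: 404, headers: { "Content-Type": "text/plain" }, body: "not found" };
}

// Build (but do not start) the server; Node omits the body for HEAD requests.
function createServer() {
  return http.createServer((req, res) => {
    const { status, headers, body } = route(req.url);
    res.writeHead(status, headers);
    res.end(body);
  });
}
```

Start it with createServer().listen(8080) and point the checker at http://localhost:8080/. If redirects were handled correctly, /redirect should be reported the same way as /ok.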

howar31 commented on Dec 05 '16

Try installing the v0.8.0 (pre-release) branch:

npm install https://github.com/stevenvachon/broken-link-checker#v0.8.0

A lot has changed, so it may already be fixed.

stevenvachon commented on Dec 06 '16

With v0.8.0 I get an error right after "Getting links from" :/

(node:18433) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): TypeError: Cannot read property 'length' of undefined
(node:18433) DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
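The warning does not say where the TypeError was thrown. When the checker is driven from a Node script rather than the blc command, a minimal sketch like the one below prints the full stack of any unhandled rejection. process.on("unhandledRejection") is Node's standard hook for this; `formatRejection` is a hypothetical helper.

```javascript
// Hypothetical helper: prefer the stack trace (which names the file and line
// that read .length of undefined) and fall back to the plain value.
function formatRejection(reason) {
  if (reason instanceof Error && reason.stack) {
    return reason.stack;
  }
  return String(reason);
}

// Log unhandled rejections with their stack and mark the process as failed.
process.on("unhandledRejection", (reason) => {
  console.error("Unhandled rejection:\n" + formatRejection(reason));
  process.exitCode = 1;
});
```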

This is my test page:

<!doctype html>

<html lang="en">
<head>
	<meta charset="utf-8">

	<title>Broken Link Test</title>
	<meta name="description" content="Broken Link Test">
	<meta name="author" content="Howar31">

	<link rel="stylesheet" href="index.css">

	<!--[if lt IE 9]>
	<script src="https://cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv.js"></script>
	<![endif]-->
</head>

<body>
	<h1>Broken Link Test</h1>
	<a href="http://google.com">Google</a>
	<a href="http://howar31.com">Howar31</a>
	<script src="index.js"></script>
</body>
</html>

BLC command:

blc http://mywebsite.url/brokenlink/ -ro --fetch-level 3 

Environment:

howar31@ubuntu:~$ node -v
v7.2.0
howar31@ubuntu:~$ npm -v
3.10.9
howar31@ubuntu:~$ blc -V
0.8.0

howar31 commented on Dec 08 '16

@howar31 sorry about that. I had pushed incomplete file:// changes that contained an error. It should work now.

If it doesn't work, you could try the previous commit without those changes:

npm install https://github.com/stevenvachon/broken-link-checker#5bc1ebc757035ffe7d470cd37b56edb0cb2a1f5b

If that doesn't work either, then using v0.8.0 will have to wait for now.

stevenvachon commented on Dec 17 '16

I'm getting this error too: redirects are reported as 404s.

zeke commented on Feb 02 '19