broken-link-checker

Add custom HTTP headers option

Open miguelcalderon opened this issue 7 years ago • 5 comments

I think it could be a helpful feature, especially for URLs that behave differently depending on the request headers (I'm thinking of LinkedIn at the moment, but not only).

If you think this feature would be interesting, I can create a pull request.

miguelcalderon avatar Apr 12 '18 13:04 miguelcalderon

I also have this requirement, because some servers require certain headers which are reliably sent by browsers (Accept in the case of the one I'm struggling with at the moment) and may respond with a 400 or 404 if those headers aren't present.

I also ran into the LinkedIn issue @miguelcalderon refers to, where it seems to respond to BLC with a non-standard HTTP 999 status.

triblondon avatar Jun 16 '20 15:06 triblondon

I guess that custom headers should be applied to all requests to URLs with the base url of siteURL?

stevenvachon avatar Jun 16 '20 15:06 stevenvachon

I would have said apply it to all requests regardless... but I am not totally sure I grok what siteUrl does. I use BLC via https://github.com/LukasHechenberger/broken-link-checker-local.

After some digging with the site that was failing for me (crates.io), I've realised that it actually returns a 200 OK for every request, regardless of path, and then 'boots' an Ember application which displays a 'not found' error if the path can't be routed. Wonderful 🤦 . But I think custom headers would address the restrictions imposed by LinkedIn, and probably others.

triblondon avatar Jun 16 '20 16:06 triblondon

I think that sending headers to all URLs will create unexpected issues.

"URLs with the base URL of siteURL" means "everything relative to the input URL". For example, blc https://google.com would only have custom headers for https://google.com*
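A minimal sketch of that scoping rule, assuming "relative to the input URL" means same origin plus a path at or below the site URL's path. The name `isUnderSiteUrl` is hypothetical and is not part of broken-link-checker; it only relies on the standard `URL` class.

```javascript
// Hypothetical helper: not part of broken-link-checker's API.
// Returns true when `linkUrl` is the site URL or sits beneath it.
function isUnderSiteUrl(linkUrl, siteUrl) {
  const link = new URL(linkUrl);
  const site = new URL(siteUrl);

  // Different scheme, host, or port: never treated as part of the site.
  if (link.origin !== site.origin) return false;

  // Compare on a directory boundary so "/docs" does not match "/docs-old".
  const basePath = site.pathname.endsWith('/')
    ? site.pathname
    : site.pathname + '/';
  return link.pathname === site.pathname || link.pathname.startsWith(basePath);
}
```

Under this assumption, custom headers would be attached only when `isUnderSiteUrl(link, siteUrl)` is true, so a link to a third-party host found on the page would be requested with the default headers.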

Perhaps a config file (#123) is necessary for complex options such as this:

{
  "headers": [{
    "urls": ["http?(s)://?(*.)google.com/**/*"], // globs
    "properties": {
      "key": "value"
    }
  }]
}
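One way such a rule list could be resolved for a given URL is sketched below. It is an illustration only: the names `headerRules` and `headersFor` are hypothetical, and regular expressions are swapped in for the glob strings so the example needs no glob library.

```javascript
// Hypothetical rule list mirroring the proposed config, with one swap:
// each entry in `urls` is a RegExp instead of a glob string.
const headerRules = [
  {
    urls: [/^https?:\/\/([^/]+\.)?google\.com(\/|$)/],
    properties: { key: 'value' },
  },
];

// Collect the headers of every rule with at least one pattern matching `url`.
// Later rules override earlier ones when they set the same header.
function headersFor(url, rules) {
  const headers = {};
  for (const rule of rules) {
    if (rule.urls.some((pattern) => pattern.test(url))) {
      Object.assign(headers, rule.properties);
    }
  }
  return headers;
}
```

A URL that matches no rule gets an empty object back, which keeps the default request headers untouched.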

stevenvachon avatar Jun 16 '20 17:06 stevenvachon

Yeah this would be a great feature.

A method in the config object might be better than a complex configuration file.

new SiteChecker(options, {
  headers (request) {
    // Hyphenated header names need bracket notation
    if (/foo/.test(request.url))
      request.headers['some-header'] = 'special value'
  }
})

Although it looks as though this syntax of passing handlers in as the second argument, new SiteChecker(options, handlers) (as shown on npmjs), has been deprecated? The README here on GitHub has event-based handlers, which couldn't be used to update the request like this.
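To make the idea concrete, here is a rough sketch of how a checker might call such a hook before sending each request. Everything in it is an assumption for illustration: `buildRequest` and the `headers` hook are hypothetical names, not functions that broken-link-checker exports.

```javascript
// Hypothetical sketch: `buildRequest` and the `headers` hook are illustrative
// names, not functions exported by broken-link-checker.
function buildRequest(url, defaultHeaders, hooks = {}) {
  // Copy the defaults so a hook cannot mutate state shared between requests.
  const request = { url, headers: { ...defaultHeaders } };

  // Let the caller adjust the headers just before the request would be sent.
  if (typeof hooks.headers === 'function') {
    hooks.headers(request);
  }
  return request;
}

const hooks = {
  headers(request) {
    if (/foo/.test(request.url)) {
      request.headers['some-header'] = 'special value';
    }
  },
};
```

Calling `buildRequest('https://example.com/foo', { accept: 'text/html' }, hooks)` would yield a request carrying both the default `accept` header and the extra `some-header`, while URLs that do not match keep only the defaults.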

leviwheatcroft avatar Apr 16 '21 12:04 leviwheatcroft