scrape-it
HTTP errors are not treated as errors
If a URL gives an error response, such as a 404 or a 502, the Promise returned by the scrapeIt function does not reject. Instead it resolves, runs its .then chain, and passes a basically empty object as the data parameter. For example, this code prints "success" even though the URL returns a 404:
const scrapeIt = require("scrape-it");

scrapeIt("http://google.com/404.html", {}).then(({ data, response }) => {
    // Runs even for the 404 response; `data` is essentially empty.
    console.log("success");
}).catch(() => {
    // Never reached for HTTP error responses.
    console.log("error");
});
When the URL returns an HTTP status code outside the 2xx range, I feel the promise should automatically reject so client .catch handlers run instead of the .then chain.
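In the meantime, a caller can approximate this by checking the status on the resolved response and rejecting manually. Below is a minimal sketch; the scrapeItStrict wrapper is made up for illustration, and it assumes the resolved response object exposes the HTTP status as statusCode (Node http response) or status (axios response):

const scrapeIt = require("scrape-it");

// Hypothetical wrapper (not part of scrape-it) that rejects on non-2xx
// responses. Assumes `response` exposes the HTTP status as `statusCode`
// (Node http) or `status` (axios); adjust for your version.
function scrapeItStrict(url, opts) {
    return scrapeIt(url, opts).then(({ data, response }) => {
        const status = response.statusCode || response.status;
        if (status < 200 || status >= 300) {
            throw new Error("HTTP " + status + " for " + url);
        }
        return { data, response };
    });
}

scrapeItStrict("http://google.com/404.html", {}).then(() => {
    console.log("success");
}).catch((err) => {
    console.log("error:", err.message); // now runs for the 404
});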
Yes, this approach has downsides. I remember choosing it for simplicity, and I can see how it can break things. However, people should still be able to scrape error pages too (maybe they really want to do that).
We can add an option that enables the behaviour you expect by default. 🚀 Contributions are welcome!
Is anybody else working on this? I would like to try my hand at this issue.
@cukejianya Doesn't seem like anyone is, so go for it!
In 6.x.x HTTP errors will eventually throw, as long as axios does that.
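Once that lands, the example from the issue should behave as expected. A rough sketch, assuming the axios-backed version rejects on non-2xx responses and attaches the response to the error the way axios normally does:

const scrapeIt = require("scrape-it");

// With an axios-backed scrape-it that rejects on HTTP errors, the .catch
// branch runs for the 404 instead of the .then chain. The err.response
// shape here assumes axios-style errors.
scrapeIt("http://google.com/404.html", {}).then(({ data }) => {
    console.log("success");
}).catch((err) => {
    console.log("error", err.response ? err.response.status : err.message);
});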