Allow setting referrer in download request
Thanks for the tool, it's pretty useful. A nice addition would be the ability to set the referrer (and perhaps other settings, like the user agent) in the HTTP request that's sent to download a particular site. Some sites don't function correctly without a correct referrer.
I'm pretty sure this just needs an additional line here that sets the referrer. I can try to do this and submit a pull request, but I'm pretty new to Scala and I might handle things the wrong way (i.e., I haven't used implicits much, and this project uses them pretty heavily, so I'm not that confident in my ability to do this right).
That would indeed be a good addition, and it adds much-needed configurability. It's been a while since I wrote this code, and after rereading it I think I overused implicits a bit too much and added unneeded complexity. So a solution with implicits is not necessarily the "right" solution.
Moving the jsoup configuration to an overridable method should be enough.
class WebsiteScraper extends Scraper {
  // Jsoup.connect returns org.jsoup.Connection, so the hook takes that interface.
  def download(jsoup: org.jsoup.Connection) = jsoup
    .userAgent("Mozilla")
    .followRedirects(true)
    .timeout(0) // 0 means no timeout
  def downloadPage(pageUrl: String) = Future {
    new WebPage(new URL(pageUrl)) {
      doc = download(Jsoup.connect(pageUrl)).get
    }
  }
}
which can then be overridden:
class CustomWebsiteScraper extends WebsiteScraper {
  // Same settings as the default, plus a referrer.
  override def download(jsoup: org.jsoup.Connection) = jsoup
    .userAgent("Mozilla")
    .followRedirects(true)
    .referrer("Referrer")
    .timeout(0)
}
and then used in a spider
new Spider {
  override implicit val scraper = new CustomWebsiteScraper
  onReceivedPage ::= { page: WebPage =>
    // Page received
  }
}.start()
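For a quick offline check that the referrer really ends up on the request, something along these lines might help. It is only a sketch and assumes jsoup is on the classpath; `ReferrerSketch`, `configure`, and the example.com URLs are hypothetical names used purely for illustration. Nothing here touches the network, because the request is only built and never executed.

```scala
import org.jsoup.{Connection, Jsoup}

object ReferrerSketch {
  // Hypothetical stand-in for an overridden `download` method:
  // applies the same settings plus a caller-supplied referrer.
  def configure(connection: Connection, referrer: String): Connection =
    connection
      .userAgent("Mozilla")
      .followRedirects(true)
      .referrer(referrer)
      .timeout(0) // jsoup treats 0 as "no timeout"

  def main(args: Array[String]): Unit = {
    // Jsoup.connect only builds the request; no HTTP call is made until get()/execute().
    val connection = configure(Jsoup.connect("http://example.com/page"), "http://example.com/")
    val request = connection.request()

    // Inspect what would be sent.
    println(s"Referer: ${request.header("Referer")}")
    println(s"User-Agent: ${request.header("User-Agent")}")
    println(s"Follow redirects: ${request.followRedirects()}")
  }
}
```

Taking the referrer as a parameter rather than hard-coding it is just one possible design; the override approach works equally well when a scraper only ever needs a single value.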
This is just a suggestion, and I would love to hear your ideas.