anemone
HTTP request header support
Hi,
I needed to add some additional HTTP request headers and didn't see any support for that currently. Happy to change anything / add more specs to cover the change.
Ash.
I agree that this should be in there... please accept this so everyone can benefit! :-)
:+1:
@ashmckenzie would you please submit a new PR in the new fork called Medusa?
Instead of using Net::HTTP, it uses OpenURI, which means that the headers should be passed in the options argument, as seen in http.rb. Just make sure that the keys are Strings. See the OpenURI option docs:
options must be a hash. Each option with a string key specifies an extra header field for HTTP. I.e., it is ignored for FTP without HTTP proxy. The hash may include other options, where keys are symbols
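A minimal sketch of what that means in practice, assuming plain OpenURI rather than Medusa's own wrapper. The helper name `build_open_uri_options` and the header values are hypothetical. It only shows the string-key versus symbol-key split described in the quoted docs:

```ruby
require 'open-uri'

# Hypothetical helper (not part of Anemone, Medusa, or OpenURI):
# turns every header name into a String key, then merges in
# OpenURI's own options, which keep their Symbol keys.
def build_open_uri_options(headers, extra_options = {})
  string_headers = headers.each_with_object({}) do |(name, value), result|
    result[name.to_s] = value.to_s
  end
  string_headers.merge(extra_options)
end

options = build_open_uri_options(
  { :"User-Agent" => "ExampleCrawler/1.0", "Accept-Language" => "en" },
  read_timeout: 10 # Symbol key: treated as an OpenURI option, not a header
)

# The network call is left commented out so the sketch runs offline:
# URI.open("http://example.com/", options) { |f| puts f.status.first }
```

A header passed with a Symbol key would be read by OpenURI as an option name and not sent as a header, which is why the helper converts the names first.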
:+1: Thank you! With this I was able to crawl a page that requires Basic authentication.
Sample:
require 'anemone'
require 'base64'
require 'kconv' # provides String#toutf8, used below

url = "http://exsample.com/test.htm"

# Base64.encode64 appends line feeds, so strip them before building the header
auth_base64 = Base64.encode64('USER:PASSWORD').gsub(/\n/, "")
headers = {"Authorization" => "Basic #{auth_base64}"}

Anemone.crawl(url, {:http_request_headers => headers}) do |anemone|
  anemone.on_every_page do |page|
    puts page.url
    puts page.body.toutf8
  end
end
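As a side note, the header can also be built without the gsub step. This is only a sketch: `basic_auth_header` is a hypothetical helper, not something Anemone provides, and it relies on `Array#pack` with the "m0" directive, which emits Base64 without line feeds:

```ruby
# Hypothetical helper, not part of Anemone or Medusa.
def basic_auth_header(user, password)
  # "m0" encodes as Base64 with no line feeds, so nothing has to be stripped
  token = ["#{user}:#{password}"].pack("m0")
  { "Authorization" => "Basic #{token}" }
end

headers = basic_auth_header("USER", "PASSWORD")
# headers can then be passed as :http_request_headers, as in the sample above
```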
@chriskite Can we merge this?
There is no need. It is just for information.
@atgs-ghayakawa @paresharma could u help adapt this PR to a new PR in the new fork called Medusa?
@brutuscat Hi, I forked your fork so that I could do a PR with these changes. But, I see that you have already added BAA support: https://github.com/paresharma/medusa/blob/master/lib/medusa/http.rb#L81-L83
I can just use Medusa in place of Anemone with BAA. :+1:
Medusa.crawl(url, { http_basic_authentication: [username, password] }) do |medusa|
  medusa.on_every_page do |page|
    puts page.code
  end
end
Now that I am at it, I'll do a general cleanup and make it compatible with Ruby 2.2 (which would mean dropping support for Kyoto and Tokyo, I guess). Will do a PR when it's done. :smile: