SSL session reuse may fail

Open nirvdrum opened this issue 12 years ago • 1 comments

I've just run into a situation where reusing an SSL session caused an exception, and Spidr subsequently skipped the page. Currently the exception is silently swallowed, so I modified the code to capture the following trace:

EOFError (end of file reached):
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/openssl/buffering.rb:174:in `sysread_nonblock'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/openssl/buffering.rb:174:in `read_nonblock'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/protocol.rb:141:in `rbuf_fill'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/protocol.rb:122:in `readuntil'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/protocol.rb:132:in `readline'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:2562:in `read_status_line'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:2551:in `read_new'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:1319:in `block in transport_request'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:1316:in `catch'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:1316:in `transport_request'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:1293:in `request'
  rest-client (1.6.7) lib/restclient/net_http_ext.rb:51:in `request'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:1026:in `get'
  spidr (0.4.1) lib/spidr/agent.rb:513:in `block in get_page'
  spidr (0.4.1) lib/spidr/agent.rb:684:in `prepare_request'
  spidr (0.4.1) lib/spidr/agent.rb:512:in `get_page'
  app/models/cookie_login_option.rb:150:in `fetch_remote_form'
  app/models/cookie_login_option.rb:158:in `block in fetch_remote_form'
  spidr (0.4.1) lib/spidr/agent.rb:518:in `block in get_page'
  spidr (0.4.1) lib/spidr/agent.rb:684:in `prepare_request'
  spidr (0.4.1) lib/spidr/agent.rb:512:in `get_page'
  app/models/cookie_login_option.rb:150:in `fetch_remote_form'
  app/models/cookie_login_option.rb:158:in `block in fetch_remote_form'
  spidr (0.4.1) lib/spidr/agent.rb:518:in `block in get_page'
  spidr (0.4.1) lib/spidr/agent.rb:684:in `prepare_request'
  spidr (0.4.1) lib/spidr/agent.rb:512:in `get_page'

If I modify the code to remove the session cache, I am able to fetch the page okay. It might be good to catch EOFError and retry with a new session in the event this happens. Catching the error all over the place could be messy though.
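
To make the retry idea concrete, here is a minimal sketch of how the rescue could live in one place instead of being scattered around. The helper name `with_fresh_session_on_eof`, the `open_session` callable, and the `max_retries` option are all hypothetical and are not part of Spidr's API; the sketch assumes the caller can build a connection either with or without the cached SSL session.

```ruby
# Hypothetical helper (not part of Spidr): run a request against a session,
# and on EOFError retry with a session built without the SSL session cache.
#
# open_session - callable taking one boolean: true to reuse the cached
#                session, false to build a fresh one (an assumption made
#                for this sketch).
# max_retries  - how many times to retry with a fresh session.
def with_fresh_session_on_eof(open_session, max_retries: 1)
  attempts = 0
  session = open_session.call(true) # first attempt reuses the cached session

  begin
    yield session
  rescue EOFError
    attempts += 1
    raise if attempts > max_retries # give up and surface the error

    session = open_session.call(false) # discard the cache, start fresh
    retry
  end
end
```

A caller would wrap the actual request in the block, for example `with_fresh_session_on_eof(opener) { |http| http.get(path) }`, where `opener` is whatever builds the connection. Once the retries are exhausted the EOFError is re-raised rather than swallowed, so a skipped page is at least visible.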

nirvdrum avatar Jan 19 '12 01:01 nirvdrum

Could this be a version issue? I had something like this happen to me with a simple spider that printed the URLs from a site. Using REE it would fail, while with 2.0.0 it would work fine.

a-yiorgos avatar Feb 23 '14 09:02 a-yiorgos