webmock icon indicating copy to clipboard operation
webmock copied to clipboard

webmock failures inside sidekiq workers not bubbled up to rspec

Open thoraxe opened this issue 4 years ago • 9 comments

Using webmock 3.8.3 with ruby 2.6.6 and rails 5.2.4

This is related to #897 now.

When I enable and then disable, I see the webmock failures in the logs, but they are not exposed up to rspec.

In other words, the request fails properly:

error: Real HTTP connections are disabled. Unregistered request: GET.......

But the test is considered passed because the failure inside the worker doesn't bubble up to rspec:

Finished in 0.55596 seconds (files took 4.35 seconds to load)
1 example, 0 failures

I'm not sure if this is expected or not.

I also added the above as the last comment. The rest of this post and my other comments is mostly background details that aren't totally relevant at this point. I also updated the title of the issue to reflect the "real" problem, as it was mostly user error the first threve-teen go throughs.

Test scenario (rspec):

    it "should fetch token vector for article content when content is not nil" do
      require 'webmock/rspec'

      # don't wait for async sidekiq job
      Sidekiq::Testing.inline!

      request_url = "#{ENV['TOKENIZER_URL']}/tokenize"

      # webmock the expected request and response
      stub = stub_request(:post, request_url)
             .with(body: 'content=y')
             .to_return(body: 'y')

      # create an attrs to hand to ingest
      attrs = {
        data_source: @data_source,
        title: Faker::Book.title,
        url: Faker::Internet.url,
        content: "y",
        abstract: Faker::Book.genre,
        published_on: DateTime.now,
        created_at: DateTime.now
      }

      # ingest an article from the attrs
      status = Article.ingest(attrs)

      # the ingest occurs roughly simultaneously to the submission to the
      # worker so we need to re-fetch the article by the id because at that
      # point it will have gotten the vector saved to the DB
      @token_vector_article = Article.find(status[1].id)

      # we should have sent a request with content=y
      expect(stub).to have_been_requested

      # we should've saved "y" as the token_vector
      expect(@token_vector_article.token_vector).not_to eq(nil)
      expect(@token_vector_article.token_vector).to eq("y")
    end

Result:

root@c18df30d6d22:/usr/src/app# bundle exec rspec spec/models/article_spec.rb:174
database: seraph_aa_test
Run options: include {:locations=>{"./spec/models/article_spec.rb"=>[174]}}
.2020-06-19T12:34:48.961Z 2693 TID-gso1icmp1 INFO: before connection
2020-06-19T12:34:49.068Z 2693 TID-gso1icmp1 INFO: after connection
F

Failures:                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                              
  1) Article tokenization and vectorization should fetch token vector for article content when content is not nil                                                                                                                             
     Failure/Error: expect(stub).to have_been_requested                                                                                                                                                                                       
                                                                                                                                                                                                                                              
       The request POST https://xxx/tokenize with body "content=y" was expected to execute 1 time but it executed 0 times                                                                                        
                                                                                                                                                                                                                                              
       The following requests were made:                                                       
                            
       No requests were made.                                                                                                                         
       ============================================================
     # ./spec/models/article_spec.rb:213:in `block (3 levels) in <top (required)>'
     # ./spec/rails_helper.rb:49:in `block (3 levels) in <top (required)>'
     # /usr/local/bundle/gems/database_cleaner-1.8.4/lib/database_cleaner/generic/base.rb:16:in `cleaning'
     # /usr/local/bundle/gems/database_cleaner-1.8.4/lib/database_cleaner/configuration.rb:87:in `block (2 levels) in cleaning'
     # /usr/local/bundle/gems/database_cleaner-1.8.4/lib/database_cleaner/configuration.rb:88:in `cleaning'
     # ./spec/rails_helper.rb:48:in `block (2 levels) in <top (required)>'
                                                                                                             
Finished in 32.2 seconds (files took 3.93 seconds to load)                 
2 examples, 1 failure           
                                            
Failed examples:                                                                               
                                          
rspec ./spec/models/article_spec.rb:180 # Article tokenization and vectorization should fetch token vector for article content when content is not nil

You can see that the Sidekiq worker is being executed because its log messages are showing (before connection / after connection)

Here's the code in the worker:

class TokenizeAndVectorizeWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'tokenizer_vectorizer', retry: true, backtrace: true

  def perform(article_id)
    article = Article.find(article_id)
    tokenizer_url = ENV['TOKENIZER_URL']

    if article.content.nil?
      send_content = article.abstract
    else
      send_content = article.content
    end

    # configure Faraday
    conn = Faraday.new(tokenizer_url) do |c|
      c.use Faraday::Response::RaiseError
      c.headers['Content-Type'] = 'application/x-www-form-urlencoded'
    end

    logger.info('before connection')
    # get the response from the tokenizer
    resp = conn.post '/tokenize', "content=#{URI.encode(send_content)}"
    logger.info('after connection')

    # the response's body contains the JSON for the tokenized and vectorized article content
    article.token_vector = resp.body

    article.save
  end
end

You can see where I am making the Faraday call and it is definitely reaching out to the remote system because I get back an expected output from that remote system:

Article Update (1.2ms)  UPDATE "articles" SET "token_vector" = $1, "updated_at" = $2, "data" = $3 WHERE "articles"."id" = $4  [["token_vector", "{\"py/object\": \"scipy.sparse.csr.csr_matrix\", \"_shape\": {\"py/tuple\": [1, 262144]}, \"maxprint\": 50, \"indices\": {\"py/reduce\": [{\"py/function\": \"numpy.core.multiarray._reconstruct\"}, {\"py/tuple\": [{\"py/type\": \"numpy.ndarray\"}, {\"py/tuple\": [0]}, {\"py/b64\": \"Yg==\"}]}, {\"py/tuple\": [1, {\"py/tuple\": [1]}, {\"py/reduce\": [{\"py/type\": \"numpy.dtype\"}, {\"py/tuple\": [\"i4\", 0, 1]}, {\"py/tuple\": [3, \"<\", null, null, null, -1, -1, 0]}]}, false, {\"py/b64\": \"FpIBAA==\"}]}]}, \"indptr\": {\"py/reduce\": [{\"py/function\": \"numpy.core.multiarray._reconstruct\"}, {\"py/tuple\": [{\"py/type\": \"numpy.ndarray\"}, {\"py/tuple\": [0]}, {\"py/b64\": \"Yg==\"}]}, {\"py/tuple\": [1, {\"py/tuple\": [2]}, {\"py/id\": 2}, false, {\"py/b64\": \"AAAAAAEAAAA=\"}]}]}, \"data\": {\"py/reduce\": [{\"py/function\": \"numpy.core.multiarray._reconstruct\"}, {\"py/tuple\": [{\"py/type\": \"numpy.ndarray\"}, {\"py/tuple\": [0]}, {\"py/b64\": \"Yg==\"}]}, {\"py/tuple\": [1, {\"py/tuple\": [1]}, {\"py/reduce\": [{\"py/type\": \"numpy.dtype\"}, {\"py/tuple\": [\"f8\", 0, 1]}, {\"py/tuple\": [3, \"<\", null, null, null, -1, -1, 0]}]}, false, {\"py/b64\": \"AAAAAAAA8D8=\"}]}]}, \"_has_canonical_format\": true, \"_has_sorted_indices\": true}"], ["updated_at", "2020-06-19 12:34:49.078913"], ["data", "{\"tags\":[],\"tag_ids\":[],\"added_tags\":[],\"removed_tags\":[],\"bookmarks\":[]}"], ["id", 11]]

If I put require 'webmock/rspec' at the top of this particular test file or even in my spec_helper, it causes other things in my test to blow up (namely the integration with Elasticsearch and setting up the tests):

# top of spec file
require 'spec_helper'
require 'rails_helper'
require 'sidekiq/testing'
require 'webmock/rspec'

Result:

root@c18df30d6d22:/usr/src/app# bundle exec rspec spec/models/article_spec.rb:174
database: seraph_aa_test
Run options: include {:locations=>{"./spec/models/article_spec.rb"=>[174]}}
There was an error creating the elasticsearch index for Article: #<NameError: uninitialized constant Faraday::Error::ConnectionFailed>
There was an error removing the elasticsearch index for Article: #<NameError: uninitialized constant Faraday::Error::ConnectionFailed>
F

Failures:

  1) Article extract image url article with improper image url
     Failure/Error: Article.import
     
     ArgumentError:
       seraph_aa_application_test does not exist to be imported into. Use create_index! or the :force option to create it.
     # /usr/local/bundle/gems/elasticsearch-model-5.1.0/lib/elasticsearch/model/importing.rb:118:in `import'
     # /usr/local/bundle/gems/elasticsearch-model-5.1.0/lib/elasticsearch/model.rb:118:in `import'
     # ./spec/models/article_spec.rb:18:in `block (2 levels) in <top (required)>'
     # ./spec/rails_helper.rb:49:in `block (3 levels) in <top (required)>'
     # /usr/local/bundle/gems/database_cleaner-1.8.4/lib/database_cleaner/generic/base.rb:16:in `cleaning'
     # /usr/local/bundle/gems/database_cleaner-1.8.4/lib/database_cleaner/configuration.rb:87:in `block (2 levels) in cleaning'
     # /usr/local/bundle/gems/database_cleaner-1.8.4/lib/database_cleaner/configuration.rb:88:in `cleaning'
     # ./spec/rails_helper.rb:48:in `block (2 levels) in <top (required)>'

Finished in 0.50151 seconds (files took 4.75 seconds to load)
1 example, 1 failure

Failed examples:

rspec ./spec/models/article_spec.rb:166 # Article extract image url article with improper image url

It seems to "work" with webmock/rspec loaded in the specific test in the sense that it's clearly showing what webmock expects. However it's not detecting any requests being made, let alone the wrong request.

thoraxe avatar Jun 19 '20 12:06 thoraxe

Related to: https://stackoverflow.com/questions/62461780/how-do-i-test-whether-a-sidekiq-worker-is-sending-the-right-data-to-an-external

thoraxe avatar Jun 19 '20 12:06 thoraxe

I tried adding a WebMock.ebable! to the test per #622 but that just ends up breaking ElasticSearch:

 1) Article tokenization and vectorization should fetch token vector for article content when content is not nil
     Failure/Error: record.__elasticsearch__.__send__ "#{operation}_document"

     NameError:
       uninitialized constant Faraday::Error::ConnectionFailed
     # /usr/local/bundle/gems/elasticsearch-transport-5.0.5/lib/elasticsearch/transport/transport/http/faraday.rb:47:in `host_unreachable_exceptions'                                                                                        
     # /usr/local/bundle/gems/elasticsearch-transport-5.0.5/lib/elasticsearch/transport/transport/base.rb:282:in `rescue in perform_request'                                                                                                 
     # /usr/local/bundle/gems/elasticsearch-transport-5.0.5/lib/elasticsearch/transport/transport/base.rb:252:in `perform_request'                                                                                                           
     # /usr/local/bundle/gems/elasticsearch-transport-5.0.5/lib/elasticsearch/transport/transport/http/faraday.rb:20:in `perform_request'                                                                                                    
     # /usr/local/bundle/gems/elasticsearch-transport-5.0.5/lib/elasticsearch/transport/client.rb:131:in `perform_request'                                                                                                                   
     # /usr/local/bundle/gems/elasticsearch-api-5.0.5/lib/elasticsearch/api/actions/index.rb:100:in `index'
     # /usr/local/bundle/gems/elasticsearch-model-5.1.0/lib/elasticsearch/model/indexing.rb:346:in `index_document'
     # ./app/workers/indexer.rb:27:in `perform'
     # /usr/local/bundle/gems/sidekiq-5.2.8/lib/sidekiq/testing.rb:301:in `execute_job'
     # /usr/local/bundle/gems/sidekiq-5.2.8/lib/sidekiq/testing.rb:296:in `block in process_job'
     # /usr/local/bundle/gems/sidekiq-5.2.8/lib/sidekiq/middleware/chain.rb:128:in `block in invoke'
     # /usr/local/bundle/gems/sidekiq-5.2.8/lib/sidekiq/middleware/chain.rb:133:in `invoke'
     # /usr/local/bundle/gems/sidekiq-5.2.8/lib/sidekiq/testing.rb:295:in `process_job'
     # /usr/local/bundle/gems/sidekiq-5.2.8/lib/sidekiq/testing.rb:89:in `block in raw_push'                                                                                                                                                 
     # /usr/local/bundle/gems/sidekiq-5.2.8/lib/sidekiq/testing.rb:85:in `each'
     # /usr/local/bundle/gems/sidekiq-5.2.8/lib/sidekiq/testing.rb:85:in `raw_push'
     # /usr/local/bundle/gems/sidekiq-5.2.8/lib/sidekiq/client.rb:74:in `push'
     # /usr/local/bundle/gems/sidekiq-5.2.8/lib/sidekiq/worker.rb:149:in `client_push'
     # /usr/local/bundle/gems/sidekiq-5.2.8/lib/sidekiq/worker.rb:93:in `perform_async'
     # ./app/models/concerns/searchable.rb:74:in `block (2 levels) in <module:Searchable>'
     # /usr/local/bundle/gems/airbrake-6.3.0/lib/airbrake/rails/active_record.rb:27:in `run_callbacks'
     # ./app/models/article.rb:306:in `ingest'
     # ./spec/models/article_spec.rb:205:in `block (3 levels) in <top (required)>'
     # ./spec/rails_helper.rb:49:in `block (3 levels) in <top (required)>'
     # /usr/local/bundle/gems/database_cleaner-1.8.4/lib/database_cleaner/generic/base.rb:16:in `cleaning'
     # /usr/local/bundle/gems/database_cleaner-1.8.4/lib/database_cleaner/configuration.rb:87:in `block (2 levels) in cleaning'
     # /usr/local/bundle/gems/database_cleaner-1.8.4/lib/database_cleaner/configuration.rb:88:in `cleaning'
     # ./spec/rails_helper.rb:48:in `block (2 levels) in <top (required)>'
     # ------------------
     # --- Caused by: ---
     # WebMock::NetConnectNotAllowedError:
     #   Real HTTP connections are disabled. Unregistered request: PUT http://localhost:9200/seraph_aa_application_test/article/11 with
...

So, I think I really only want to webmock specific requests -- this doesn't seem like the right path.

thoraxe avatar Jun 19 '20 13:06 thoraxe

Continuing my troubleshooting, I tried the following in the test:

      WebMock.disable_net_connect!(allow_localhost: true)

This allowed the ElasticSearch things to work, but I still had the problem of Sidekiq's call of Faraday not being intercepted:

  1) Article tokenization and vectorization should fetch token vector for article content when content is not nil
     Failure/Error: expect(stub).to have_been_requested
     
       The request POST https://xxx/tokenize with body "content=y" was expected to execute 1 time but it executed 0 times
     
       The following requests were made:
     
       No requests were made.
       ============================================================

So this appears to me to definitely be some kind of issue with Sidekiq's test environment not being affected by webmock.

thoraxe avatar Jun 19 '20 13:06 thoraxe

This might be related to #578 because I'm using a connection object. I'm currently trying to figure out how to NOT use a connection object (to validate the problem) but still make the same configuration options.

thoraxe avatar Jun 19 '20 13:06 thoraxe

I tried switching to Excon because of #578 and I am seeing the same behavior.

Worker code:

class TokenizeAndVectorizeWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'tokenizer_vectorizer', retry: true, backtrace: true

  def perform(article_id)
    article = Article.find(article_id)
    tokenizer_url = ENV['TOKENIZER_URL'] 

    if article.content.nil?
      send_content = article.abstract
    else
      send_content = article.content
    end

    response = Excon.post("#{tokenizer_url}/tokenize", 
      :body => URI.encode_www_form(:content => send_content), 
      :headers => { "Content-Type" => "application/x-www-form-urlencoded" }, 
      :expects => [200])

    # the response's body contains the JSON for the tokenized and vectorized article content
    article.token_vector = response.body

    article.save
  end
end

Test:

    it "should fetch token vector for article content when content is not nil" do
      # don't wait for async sidekiq job
      Sidekiq::Testing.inline!
      require 'webmock/rspec'

      request_url = "#{ENV['TOKENIZER_URL']'}tokenize"

      # webmock the expected request and response
      stub = stub_request(:post, request_url)
             .with(body: 'content=y')
             .to_return(body: 'y')

      # create an attrs to hand to ingest
      attrs = {
        data_source: @data_source,
        title: Faker::Book.title,
        url: Faker::Internet.url,
        content: "y",
        abstract: Faker::Book.genre,
        published_on: DateTime.now,
        created_at: DateTime.now
      }

      # ingest an article from the attrs
      status = Article.ingest(attrs)

      # the ingest occurs roughly simultaneously to the submission to the
      # worker so we need to re-fetch the article by the id because at that
      # point it will have gotten the vector saved to the DB
      @token_vector_article = Article.find(status[1].id)

      # we should have sent a request with content=y
      expect(stub).to have_been_requested

      # we should've saved "y" as the token_vector
      expect(@token_vector_article.token_vector).not_to eq(nil)
      expect(@token_vector_article.token_vector).to eq("y")
    end

Result:

root@c18df30d6d22:/usr/src/app# bundle exec rspec spec/models/article_spec.rb:174
database: seraph_aa_test
Run options: include {:locations=>{"./spec/models/article_spec.rb"=>[174]}}
.F

Failures:

  1) Article tokenization and vectorization should fetch token vector for article content when content is not nil
     Failure/Error: expect(stub).to have_been_requested
     
       The request POST https://xxx/tokenize with body "content=y" was expected to execute 1 time but it executed 0 times
     
       The following requests were made:
     
       No requests were made.
       ============================================================
     # ./spec/models/article_spec.rb:212:in `block (3 levels) in <top (required)>'
     # ./spec/rails_helper.rb:49:in `block (3 levels) in <top (required)>'
     # /usr/local/bundle/gems/database_cleaner-1.8.4/lib/database_cleaner/generic/base.rb:16:in `cleaning'
     # /usr/local/bundle/gems/database_cleaner-1.8.4/lib/database_cleaner/configuration.rb:87:in `block (2 levels) in cleaning'
     # /usr/local/bundle/gems/database_cleaner-1.8.4/lib/database_cleaner/configuration.rb:88:in `cleaning'
     # ./spec/rails_helper.rb:48:in `block (2 levels) in <top (required)>'

Finished in 3.03 seconds (files took 3.86 seconds to load)
2 examples, 1 failure

Failed examples:

rspec ./spec/models/article_spec.rb:180 # Article tokenization and vectorization should fetch token vector for article content when content is not nil

thoraxe avatar Jun 19 '20 14:06 thoraxe

How could I use binding.pry inside the worker to see if WebMock is "present" and listening for / intercepting HTTP requests?

thoraxe avatar Jun 19 '20 14:06 thoraxe

I switched to using excon and simplified things with a service class, so closing this.

thoraxe avatar Jun 19 '20 23:06 thoraxe

I'm reopening this because I ran into it again in a different place. I have several workers that interact with the internet. I don't think I should have to use service classes for webmock to intercept traffic happening inside a Sidekiq worker. Or there's something I don't understand about how to make it do so.

Simply put, if you have a model that does SomethingWorker.perform and that Sidekiq worker does Excon.get, that request is not seen by Webmock, even when that test is "inline". My guess is it's because the worker is not being directly launched via rspec, but rather rspec is doing something with a model and the model (which doesn't load webmock?) is doing something with a worker (which also doesn't load webmock?) and the worker is what's interacting with the internet... and technically that worker is outside of rspec?

thoraxe avatar Jul 28 '20 00:07 thoraxe

This is related to #897 now.

When I enable and then disable, I see the webmock failures in the logs, but they are not exposed up to rspec.

In other words, the request fails properly:

error: Real HTTP connections are disabled. Unregistered request: GET.......

But the test is considered passed because the failure inside the worker doesn't bubble up to rspec:

Finished in 0.55596 seconds (files took 4.35 seconds to load)
1 example, 0 failures

I'm not sure if this is expected or not.

thoraxe avatar Jul 28 '20 01:07 thoraxe