identity_cache icon indicating copy to clipboard operation
identity_cache copied to clipboard

Possible data corruption with memcached

Open raivil opened this issue 3 years ago • 4 comments

Hi! First of all thanks for the great work on this gem. Here's my situation/scenario: I'm deploying a Rails 7 app to AWS and using Elasticache Memcached as the store for Identity cache, rails cache and cache store.

The app runs as expected, users are able to access, the data is cached into memcached and I can see the usage metrics (bytes read/write, conn numbers, cpu, unused memory, etc) After some time, the app started raising a lot of exceptions and users were no longer able to access it. The app was barely used (testing infrastructure), and memcached metrics/health looks ok (lots of spare resources). Some examples of exceptions:

  • Dalli::UnmarshalError: Unable to unmarshal value: marshal data too short
  • Dalli::DalliError: Response error: "?x??UOk$E7*?
  • Dalli::DalliError: Response error: VA
  • Dalli::DalliError: Response error: NS
  • Dalli::DalliError: Response error: ".\u0000[:idc_cached_nilf1670773635.8781602

Cleaning the memcached data fixed the issue, but it happened again the next day.

What could be causing this? I've double checked configs everywhere (sample below) and no clue so far on what could be causing it. I'm moving the data from identity cache to an isolated memcached instance.

Any clues? Thoughts?

Thank you!

gems: dalli (3.2.3) identity_cache (1.2.0) rails (7.0.4)

memcached version 1.6.17 (docker version running on CI) and on AWS Elasticache (1.6.16)

app configs:

  config.session_store :cache_store,
                       key: "the_session_key",
                       expire_after: 24.hours

  config.cache_store = :mem_cache_store, "memcached.server.url",
                       { pool: { size: 30 }, protocol: :meta, expires_in: 24.hours,
                         failover: false } # avoids more cache consistency issues

  config.action_policy.cache_store = :mem_cache_store, "memcached.server.url",
                                     { pool: { size: 30 }, protocol: :meta, expires_in: 1.hour,
                                       failover: false } # avoids more cache consistency issues
  config.identity_cache_store = :mem_cache_store, "memcached.server.url",
                                { pool: { size: 30 }, protocol: :meta, expires_in: 6.hours,
                                  failover: false } # avoids more cache consistency issues

Update 1: Issues happen when updating the model classes and doing a new deploy.

Example stack trace

Error message:
no implicit conversion of Symbol into Integer
…r/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:71:in `[]'
…r/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:71:in `record_from_coder'
…r/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:21:in `block in decode'
…dle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/notifications.rb:208:in `instrument'
…r/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:20:in `decode'
…/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cached/primary_index.rb:74:in `cache_decode'
…ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cache_key_loader.rb:38:in `load'
…/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cached/primary_index.rb:17:in `fetch'
…by/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/with_primary_index.rb:127:in `fetch_by_id'
…by/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/with_primary_index.rb:67:in `fetch_by_slug'

raivil avatar Dec 13 '22 19:12 raivil

Here's a full stack trace. The problem just occurred without any app or server changes.

/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:71:in `[]'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:71:in `record_from_coder'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:21:in `block in decode'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/notifications.rb:208:in `instrument'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:20:in `decode'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cached/primary_index.rb:74:in `cache_decode'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cache_key_loader.rb:38:in `load'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cached/primary_index.rb:17:in `fetch'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/with_primary_index.rb:127:in `fetch_by_id'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/with_primary_index.rb:67:in `fetch_by_slug'
/opt/main-app/current/app/utilities/account_domain_utils.rb:10:in `current_account_by_request'
/opt/main-app/current/app/utilities/account_domain_utils.rb:5:in `matches?'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/routing/mapper.rb:40:in `block in matches?'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/routing/mapper.rb:39:in `all?'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/routing/mapper.rb:39:in `matches?'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/routing/mapper.rb:46:in `serve'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/journey/router.rb:50:in `block in serve'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/journey/router.rb:32:in `each'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/journey/router.rb:32:in `serve'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/routing/route_set.rb:852:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/rack-attack-6.6.1/lib/rack/attack.rb:127:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/rack/agent_hooks.rb:30:in `traced_call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
<truncated 45 additional frames>
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/rack-2.2.4/lib/rack/runtime.rb:22:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/rack-timeout-0.6.3/lib/rack/timeout/core.rb:148:in `block in call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/rack-timeout-0.6.3/lib/rack/timeout/support/timeout.rb:19:in `timeout'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/rack-timeout-0.6.3/lib/rack/timeout/core.rb:147:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/cache/strategy/local_cache_middleware.rb:29:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/middleware/executor.rb:14:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/rack-2.2.4/lib/rack/sendfile.rb:110:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/middleware/ssl.rb:77:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/middleware/host_authorization.rb:137:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/railties-7.0.4/lib/rails/engine.rb:530:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/configuration.rb:268:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/request.rb:89:in `block in handle_request'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/thread_pool.rb:343:in `with_force_shutdown'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/request.rb:88:in `handle_request'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/server.rb:430:in `process_client'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/server.rb:233:in `block in run'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/thread_pool.rb:150:in `block in spawn_thread'

The model that is failing to be cached/retrieved has some relations, but in short this is it:

class Account < ApplicationRecord
  extend FriendlyId
  include IdentityCache

  cache_index :slug, unique: true

  has_one :model1, dependent: :destroy, class_name: "model1"
  has_one :model2, dependent: :destroy, class_name: "model2"
  has_one :model3, dependent: :destroy, class_name: "model3"
  cache_has_one :model1, embed: true
  cache_has_one :model2, embed: true
  cache_has_one :model3, embed: true

raivil avatar Dec 15 '22 01:12 raivil

I've started seeing the same exception on CI while running specs. They fail randomly and fetch_by_slug calls (cache_index).

raivil avatar Dec 19 '22 22:12 raivil

In the second exception, was the message "no implicit conversion of Symbol into Integer" as well? Looks like for some reason we get Array instead of Hash from Cache, are there any other places where you utilise Rails cache? maybe there are some key collisions?

I would try to hook into AS Notifications and log stuff on cache_write.active_support & dehydration.identity_cache messages, and try to see whenever it's only IDC or also something else.

driv3r avatar Jan 17 '23 10:01 driv3r

Also if you had this inside tests as well, please keep the "seed", it will help reproduce the issue locally and potentially find the core of the problem

driv3r avatar Jan 17 '23 11:01 driv3r