Possible data corruption with memcached
Hi! First of all thanks for the great work on this gem. Here's my situation/scenario: I'm deploying a Rails 7 app to AWS and using Elasticache Memcached as the store for Identity cache, rails cache and cache store.
The app runs as expected, users are able to access, the data is cached into memcached and I can see the usage metrics (bytes read/write, conn numbers, cpu, unused memory, etc) After some time, the app started raising a lot of exceptions and users were no longer able to access it. The app was barely used (testing infrastructure), and memcached metrics/health looks ok (lots of spare resources). Some examples of exceptions:
Dalli::UnmarshalError: Unable to unmarshal value: marshal data too shortDalli::DalliError: Response error: "?x??UOk$E7*?Dalli::DalliError: Response error: VADalli::DalliError: Response error: NSDalli::DalliError: Response error: ".\u0000[:idc_cached_nilf1670773635.8781602
Cleaning the memcached data fixed the issue, but it happened again the next day.
What could be causing this? I've double checked configs everywhere (sample below) and no clue so far on what could be causing it. I'm moving the data from identity cache to an isolated memcached instance.
Any clues? Thoughts?
Thank you!
gems: dalli (3.2.3) identity_cache (1.2.0) rails (7.0.4)
memcached version 1.6.17 (docker version running on CI) and on AWS Elasticache (1.6.16)
app configs:
config.session_store :cache_store,
key: "the_session_key",
expire_after: 24.hours
config.cache_store = :mem_cache_store, "memcached.server.url",
{ pool: { size: 30 }, protocol: :meta, expires_in: 24.hours,
failover: false } # avoids more cache consistency issues
config.action_policy.cache_store = :mem_cache_store, "memcached.server.url",
{ pool: { size: 30 }, protocol: :meta, expires_in: 1.hour,
failover: false } # avoids more cache consistency issues
config.identity_cache_store = :mem_cache_store, "memcached.server.url",
{ pool: { size: 30 }, protocol: :meta, expires_in: 6.hours,
failover: false } # avoids more cache consistency issues
Update 1: Issues happen when updating the model classes and doing a new deploy.
Example stack trace
Error message:
no implicit conversion of Symbol into Integer
…r/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:71:in `[]'
…r/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:71:in `record_from_coder'
…r/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:21:in `block in decode'
…dle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/notifications.rb:208:in `instrument'
…r/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:20:in `decode'
…/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cached/primary_index.rb:74:in `cache_decode'
…ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cache_key_loader.rb:38:in `load'
…/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cached/primary_index.rb:17:in `fetch'
…by/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/with_primary_index.rb:127:in `fetch_by_id'
…by/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/with_primary_index.rb:67:in `fetch_by_slug'
Here's a full stack trace. The problem just occurred without any app or server changes.
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:71:in `[]'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:71:in `record_from_coder'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:21:in `block in decode'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/notifications.rb:208:in `instrument'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:20:in `decode'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cached/primary_index.rb:74:in `cache_decode'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cache_key_loader.rb:38:in `load'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cached/primary_index.rb:17:in `fetch'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/with_primary_index.rb:127:in `fetch_by_id'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/with_primary_index.rb:67:in `fetch_by_slug'
/opt/main-app/current/app/utilities/account_domain_utils.rb:10:in `current_account_by_request'
/opt/main-app/current/app/utilities/account_domain_utils.rb:5:in `matches?'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/routing/mapper.rb:40:in `block in matches?'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/routing/mapper.rb:39:in `all?'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/routing/mapper.rb:39:in `matches?'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/routing/mapper.rb:46:in `serve'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/journey/router.rb:50:in `block in serve'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/journey/router.rb:32:in `each'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/journey/router.rb:32:in `serve'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/routing/route_set.rb:852:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/rack-attack-6.6.1/lib/rack/attack.rb:127:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/rack/agent_hooks.rb:30:in `traced_call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
<truncated 45 additional frames>
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/rack-2.2.4/lib/rack/runtime.rb:22:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/rack-timeout-0.6.3/lib/rack/timeout/core.rb:148:in `block in call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/rack-timeout-0.6.3/lib/rack/timeout/support/timeout.rb:19:in `timeout'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/rack-timeout-0.6.3/lib/rack/timeout/core.rb:147:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/cache/strategy/local_cache_middleware.rb:29:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/middleware/executor.rb:14:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/rack-2.2.4/lib/rack/sendfile.rb:110:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/middleware/ssl.rb:77:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/middleware/host_authorization.rb:137:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/railties-7.0.4/lib/rails/engine.rb:530:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/configuration.rb:268:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/request.rb:89:in `block in handle_request'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/thread_pool.rb:343:in `with_force_shutdown'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/request.rb:88:in `handle_request'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/server.rb:430:in `process_client'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/server.rb:233:in `block in run'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/thread_pool.rb:150:in `block in spawn_thread'
The model that is failing to be cached/retrieved has some relations, but in short this is it:
class Account < ApplicationRecord
extend FriendlyId
include IdentityCache
cache_index :slug, unique: true
has_one :model1, dependent: :destroy, class_name: "model1"
has_one :model2, dependent: :destroy, class_name: "model2"
has_one :model3, dependent: :destroy, class_name: "model3"
cache_has_one :model1, embed: true
cache_has_one :model2, embed: true
cache_has_one :model3, embed: true
I've started seeing the same exception on CI while running specs. They fail randomly and fetch_by_slug calls (cache_index).
In the second exception, was the message "no implicit conversion of Symbol into Integer" as well? Looks like for some reason we get Array instead of Hash from Cache, are there any other places where you utilise Rails cache? maybe there are some key collisions?
I would try to hook into AS Notifications and log stuff on cache_write.active_support & dehydration.identity_cache messages, and try to see whenever it's only IDC or also something else.
Also if you had this inside tests as well, please keep the "seed", it will help reproduce the issue locally and potentially find the core of the problem