
OpenSSL::SSL::SSLError SSL_read: unexpected eof while reading

Open mruhlin opened this issue 4 months ago • 1 comments

We recently migrated from self-hosted memcached servers to AWS's serverless ElastiCache offering. As part of this, we now have to connect to memcached over SSL.

We're intermittently getting "unexpected EOF while reading" errors when the application tries to read from the Rails cache. We're uncertain of the root cause, but for some reason ElastiCache is closing the connection on its end.

In this context, the exception raised by the OpenSSL client is an OpenSSL::SSL::SSLError, which Dalli does not catch, so it bubbles up and causes random failures in any part of the app that hits the cache.
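To illustrate why the error slips through, here is a small check of the class hierarchy. It only uses SystemCallError and EOFError from the rescue clause quoted later in this thread; the TIMEOUT_ERRORS list is left out because its contents are not shown here, and `rescued_by?` is a hypothetical helper written for this example, not part of Dalli.

```ruby
require 'openssl'

# Classes taken from the rescue clause quoted in this thread.
# TIMEOUT_ERRORS is omitted because its contents are not shown here.
HANDLED = [SystemCallError, EOFError].freeze

# Hypothetical helper: true when a `rescue *handled` clause would match
# an exception of the given class (i.e. it is one of them or a subclass).
def rescued_by?(error_class, handled = HANDLED)
  handled.any? { |klass| error_class <= klass }
end

puts rescued_by?(OpenSSL::SSL::SSLError) # => false
puts rescued_by?(Errno::ECONNRESET)      # => true
```

OpenSSL::SSL::SSLError descends from OpenSSL::OpenSSLError and StandardError, not from IOError or SystemCallError, so a clause that lists only those families will not match it.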

I notice ConnectionManager handles a handful of connection errors, including EOFError, by automatically trying to reconnect and retrying the cache read. It should probably handle this case the same way. https://github.com/petergoldstein/dalli/blob/main/lib/dalli/protocol/connection_manager.rb#L153
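A minimal sketch of the reconnect-and-retry idea, assuming the behaviour described above. This is not Dalli's actual ConnectionManager code: `with_reconnect`, `RETRYABLE_ERRORS`, and the `reconnect:` hook are hypothetical names used only to show OpenSSL::SSL::SSLError being treated like the other connection errors.

```ruby
require 'openssl'

# Hypothetical list: the connection errors named in this thread, plus SSLError.
RETRYABLE_ERRORS = [SystemCallError, EOFError, OpenSSL::SSL::SSLError].freeze

# Hypothetical helper, not Dalli's implementation: run the block and, on a
# connection-level error, call the reconnect hook and try again, giving up
# (and re-raising) once max_attempts is reached.
def with_reconnect(reconnect:, max_attempts: 2)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue *RETRYABLE_ERRORS
    raise if attempts >= max_attempts
    reconnect.call
    retry
  end
end

# Usage: the first read fails the way the stack trace below does, the second succeeds.
reads = 0
reconnects = 0
result = with_reconnect(reconnect: -> { reconnects += 1 }) do
  reads += 1
  raise OpenSSL::SSL::SSLError, 'SSL_read: unexpected eof while reading' if reads == 1
  :cached_value
end
```

Capping the attempts keeps a persistently failing server from turning one cache read into an endless loop.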

Dalli / OpenSSL configuration in use:

    memcache_hosts = ApplicationConfig.config['app'][Rails.env].fetch('memcache_hosts', ['127.0.0.1'])
    memcache_is_elasticache = ApplicationConfig.config['app'][Rails.env].fetch('memcache_is_elasticache', false)

    memcache_options = {
        compress: true,
        compression_max_size: 10_000,
        expires_in: 1.day,
        # Allow custom cache prefix to isolate multiple applications using the same cache, e.g. in jenkins
        namespace: SchoolAdmin::Cache::Silo.namespace(Rails.env + ENV.fetch('RAILS_CACHE_PREFIX', '')),
        socket_timeout: 10,
        socket_failure_delay: 10
    }

    if memcache_is_elasticache
      ssl_context = OpenSSL::SSL::SSLContext.new

      ssl_context.ssl_version = :SSLv23
      ssl_context.min_version = :TLS1_2
      ssl_context.verify_hostname = false
      ssl_context.verify_mode = OpenSSL::SSL::VERIFY_NONE

      memcache_options[:ssl_context] = ssl_context
      memcache_options[:protocol] = :meta
    end

    config.cache_store = [
        :mem_cache_store,
        *memcache_hosts,
        memcache_options]

Code that reproduces this is straightforward, but the error is intermittent (and rare):

      MEMCACHE_TTL = 15.minutes
      # ...
      def fetch_all_records
        Rails.cache.fetch(rails_cache_key, expires_in: MEMCACHE_TTL) do
          cacheable_scope.all.to_a
        end
      end

The value being cached is about 1.5 MB uncompressed in this case (it compresses down to 150 KB), so size might matter to the underlying problem, but it works 99.9% of the time with the same-sized object.
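Until the error is handled inside the cache client, one possible caller-side workaround is to rescue it around the fetch and compute the value directly. The sketch below assumes only a cache object that responds to `fetch` with a block, as Rails.cache does; `fetch_with_ssl_fallback` and `FlakyCache` are hypothetical names made up for this example.

```ruby
require 'openssl'

# Hypothetical wrapper: if the cache read dies with an SSL error, skip the
# cache for this call and compute the value from the block instead.
def fetch_with_ssl_fallback(cache, key, **options, &block)
  cache.fetch(key, **options, &block)
rescue OpenSSL::SSL::SSLError
  block.call
end

# Stand-in for a cache whose connection was closed by the server.
class FlakyCache
  def fetch(_key, **_options)
    raise OpenSSL::SSL::SSLError, 'SSL_read: unexpected eof while reading'
  end
end

records = fetch_with_ssl_fallback(FlakyCache.new, 'records', expires_in: 900) do
  %w[alice bob]
end
```

In the application this could be called with Rails.cache, rails_cache_key, and MEMCACHE_TTL in place of the stand-ins. Note the trade-off: if the error were raised while writing the value back rather than while reading, the block would run a second time.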

Redacted stack trace:

OpenSSL::SSL::SSLError: SSL_read: unexpected eof while reading
sysread(/opt/ruby-3.3.6/lib/ruby/3.3.0/openssl/buffering.rb:80)
fill_rbuff(/opt/ruby-3.3.6/lib/ruby/3.3.0/openssl/buffering.rb:80)
gets(/opt/ruby-3.3.6/lib/ruby/3.3.0/openssl/buffering.rb:236)
read_line(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/protocol/connection_manager.rb:150)
read_line(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/protocol/meta/response_processor.rb:201)
next_line_to_tokens(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/protocol/meta/response_processor.rb:205)
error_on_unexpected!(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/protocol/meta/response_processor.rb:169)
meta_get_with_value(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/protocol/meta/response_processor.rb:31)
get(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/protocol/meta.rb:30)
request(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/protocol/base.rb:36)
block in request(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/options.rb:18)
synchronize(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/options.rb:17)
request(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/options.rb:17)
perform(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/client.rb:426)
get(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/client.rb:64)
block (2 levels) in read_entry(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache/mem_cache_store.rb:143)
with(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/client.rb:367)
block in read_entry(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache/mem_cache_store.rb:143)
rescue_error_with(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache/mem_cache_store.rb:206)
read_entry(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache/mem_cache_store.rb:143)
block in read_entry(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache/strategy/local_cache.rb:136)
block in fetch_entry(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache/strategy/local_cache.rb:78)
fetch(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache/strategy/local_cache.rb:78)
fetch_entry(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache/strategy/local_cache.rb:78)
read_entry(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache/strategy/local_cache.rb:134)
block in fetch(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache.rb:333)
block in instrument(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache.rb:726)
instrument(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/notifications.rb:205)
instrument(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache.rb:726)
fetch(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache.rb:332)
fetch(/...vendor/bundle/ruby/3.3.0/gems/ddtrace-1.14.0/lib/datadog/tracing/contrib/active_support/cache/instrumentation.rb:167)
fetch_all_records(/...lib/school_admin/table_cache.rb:216)
rebuild_customer_cache(/...lib/school_admin/table_cache.rb:199)
...

mruhlin, Aug 22 '25 18:08

Just looping in this issue, since both relate to ElastiCache and SSL errors: https://github.com/petergoldstein/dalli/issues/1031

I agree that OpenSSL::SSL::SSLError should probably be handled like the rest of these: rescue SystemCallError, *TIMEOUT_ERRORS, EOFError => e

danmayer, Aug 28 '25 04:08