OpenSSL::SSL::SSLError SSL_read: unexpected eof while reading
We recently migrated from self-hosted memcached servers to AWS's serverless ElastiCache offering. As part of this, we now have to connect to memcached over SSL/TLS.
We're intermittently getting "unexpected eof while reading" errors when the application reads from the Rails cache. We're not sure of the root cause, but for some reason ElastiCache appears to be closing the connection on its end.
In this context, the exception raised by the OpenSSL client is of type OpenSSL::SSL::SSLError. Dalli doesn't catch it, so it bubbles up and causes random failures in any part of the app that hits the cache.
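A quick way to see why the exception escapes: OpenSSL::SSL::SSLError inherits from OpenSSL::OpenSSLError and then StandardError, so it is not a subclass of EOFError or SystemCallError. A minimal check, assuming the rescue list quoted later in this thread (Dalli's TIMEOUT_ERRORS is left out because its exact contents aren't shown here):

```ruby
require "openssl"

# Error classes from Dalli's read_line rescue clause, as quoted in this issue.
# TIMEOUT_ERRORS is omitted from this sketch.
HANDLED = [SystemCallError, EOFError].freeze

# Module#<= returns nil for unrelated classes, so any? is false when
# SSLError descends from none of the handled classes.
caught = HANDLED.any? { |klass| OpenSSL::SSL::SSLError <= klass }

puts "caught by existing rescue? #{caught}"
puts OpenSSL::SSL::SSLError.ancestors.take(3).inspect
```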
I notice that ConnectionManager already handles a handful of connection errors, including EOFError, by automatically reconnecting and retrying the cache read. It should probably handle this case the same way: https://github.com/petergoldstein/dalli/blob/main/lib/dalli/protocol/connection_manager.rb#L153
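The proposed behavior can be sketched outside Dalli: add OpenSSL::SSL::SSLError to the retryable errors, reconnect, and retry a bounded number of times. FlakyConnection, RETRYABLE_ERRORS and read_with_retry are hypothetical names for illustration only, not Dalli APIs, and this is not how ConnectionManager is actually structured.

```ruby
require "openssl"

# Hypothetical stand-in for a TLS connection whose first read hits an
# unexpected EOF, the way the ElastiCache connection does intermittently.
class FlakyConnection
  attr_reader :reconnects

  def initialize
    @reconnects = 0
    @fail_next = true
  end

  def read_line
    if @fail_next
      @fail_next = false
      raise OpenSSL::SSL::SSLError, "SSL_read: unexpected eof while reading"
    end
    "VA 5\r\n" # made-up response line for the sketch
  end

  def reconnect!
    @reconnects += 1
  end
end

# Hypothetical list: the errors already rescued (minus TIMEOUT_ERRORS)
# plus OpenSSL::SSL::SSLError.
RETRYABLE_ERRORS = [SystemCallError, EOFError, OpenSSL::SSL::SSLError].freeze

def read_with_retry(conn, attempts: 2)
  tries = 0
  begin
    tries += 1
    conn.read_line
  rescue *RETRYABLE_ERRORS
    raise if tries >= attempts # give up and re-raise the original error
    conn.reconnect!            # the stream is in an unknown state, so start fresh
    retry
  end
end

conn = FlakyConnection.new
line = read_with_retry(conn)
puts line.inspect
puts "reconnects: #{conn.reconnects}"
```

Reconnecting before retrying matters because a read that failed mid-response leaves the stream at an unknown position. Note that SSLError also covers non-transient failures such as certificate problems, so a real fix may want to keep the retry count small or narrow which SSL errors are retried.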
Dalli / OpenSSL configuration in use:
memcache_hosts = ApplicationConfig.config['app'][Rails.env].fetch('memcache_hosts', ['127.0.0.1'])
memcache_is_elasticache = ApplicationConfig.config['app'][Rails.env].fetch('memcache_is_elasticache', false)
memcache_options = {
  compress: true,
  compression_max_size: 10_000,
  expires_in: 1.day,
  # Allow custom cache prefix to isolate multiple applications using the same cache, e.g. in jenkins
  namespace: SchoolAdmin::Cache::Silo.namespace(Rails.env + ENV.fetch('RAILS_CACHE_PREFIX', '')),
  socket_timeout: 10,
  socket_failure_delay: 10
}
if memcache_is_elasticache
  ssl_context = OpenSSL::SSL::SSLContext.new
  ssl_context.ssl_version = :SSLv23
  ssl_context.min_version = :TLS1_2
  ssl_context.verify_hostname = false
  ssl_context.verify_mode = OpenSSL::SSL::VERIFY_NONE
  memcache_options[:ssl_context] = ssl_context
  memcache_options[:protocol] = :meta
end
config.cache_store = [
  :mem_cache_store,
  *memcache_hosts,
  memcache_options
]
The code that reproduces this is straightforward, but the error is intermittent (and rare):
MEMCACHE_TTL = 15.minutes
# ...
def fetch_all_records
  Rails.cache.fetch(rails_cache_key, expires_in: MEMCACHE_TTL) do
    cacheable_scope.all.to_a
  end
end
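Until Dalli handles this, one application-level workaround is to retry the cache call once when the read fails with this specific error. A minimal sketch: with_ssl_eof_retry is a hypothetical helper, and a lambda stands in for Rails.cache.fetch so it runs without Rails. It assumes the retried call gets a usable connection; the on_retry hook is where an app might reset the cache client's connections, and whether that is actually needed depends on Dalli's handling of a connection that failed mid-read, which this sketch does not verify.

```ruby
require "openssl"

# Hypothetical helper: retry a block when the TLS read fails with
# "unexpected eof", at most `retries` extra times.
def with_ssl_eof_retry(retries: 1, on_retry: nil)
  attempt = 0
  begin
    attempt += 1
    yield
  rescue OpenSSL::SSL::SSLError => e
    # Only retry the "unexpected eof" failure; other SSL errors
    # (e.g. certificate problems) are re-raised immediately.
    raise unless e.message.include?("unexpected eof") && attempt <= retries
    on_retry&.call
    retry
  end
end

# Stand-in for Rails.cache.fetch(...) that fails on its first call only.
calls = 0
fake_cache_fetch = lambda do
  calls += 1
  raise OpenSSL::SSL::SSLError, "SSL_read: unexpected eof while reading" if calls == 1
  [:record_a, :record_b]
end

records = with_ssl_eof_retry { fake_cache_fetch.call }
puts records.inspect
```

In the app, the block would wrap the existing Rails.cache.fetch call inside fetch_all_records.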
(The value being cached in this case is about 1.5 MB uncompressed and compresses down to roughly 150 KB, so size might matter to the underlying problem, but reads of the same-sized object succeed 99.9% of the time.)
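To check how large a cached value is before and after compression, a rough estimate is to marshal it and deflate it. This is a sketch, assuming Marshal serialization followed by Zlib deflate approximates what Dalli does with compress: true; the rows below are synthetic stand-ins for cacheable_scope.all.to_a, not real data.

```ruby
require "zlib"

# Rough size estimate: Marshal serialization, then Zlib deflate
# (approximately Dalli's default serializer and compressor).
def payload_sizes(value)
  raw = Marshal.dump(value)
  deflated = Zlib::Deflate.deflate(raw)
  { raw_bytes: raw.bytesize, compressed_bytes: deflated.bytesize }
end

# Synthetic, repetitive rows standing in for the cached records.
rows = Array.new(10_000) { |i| { id: i, name: "school #{i % 50}", active: true } }

sizes = payload_sizes(rows)
ratio = sizes[:raw_bytes].to_f / sizes[:compressed_bytes]
puts sizes.inspect
puts format("compression ratio: %.1fx", ratio)
```

A 1.5 MB value compressing to about 150 KB corresponds to a ratio of roughly 10x.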
Redacted stack trace:
OpenSSL::SSL::SSLError: SSL_read: unexpected eof while reading
sysread(/opt/ruby-3.3.6/lib/ruby/3.3.0/openssl/buffering.rb:80)
fill_rbuff(/opt/ruby-3.3.6/lib/ruby/3.3.0/openssl/buffering.rb:80)
gets(/opt/ruby-3.3.6/lib/ruby/3.3.0/openssl/buffering.rb:236)
read_line(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/protocol/connection_manager.rb:150)
read_line(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/protocol/meta/response_processor.rb:201)
next_line_to_tokens(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/protocol/meta/response_processor.rb:205)
error_on_unexpected!(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/protocol/meta/response_processor.rb:169)
meta_get_with_value(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/protocol/meta/response_processor.rb:31)
get(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/protocol/meta.rb:30)
request(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/protocol/base.rb:36)
block in request(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/options.rb:18)
synchronize(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/options.rb:17)
request(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/options.rb:17)
perform(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/client.rb:426)
get(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/client.rb:64)
block (2 levels) in read_entry(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache/mem_cache_store.rb:143)
with(/...vendor/bundle/ruby/3.3.0/gems/dalli-3.2.8/lib/dalli/client.rb:367)
block in read_entry(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache/mem_cache_store.rb:143)
rescue_error_with(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache/mem_cache_store.rb:206)
read_entry(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache/mem_cache_store.rb:143)
block in read_entry(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache/strategy/local_cache.rb:136)
block in fetch_entry(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache/strategy/local_cache.rb:78)
fetch(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache/strategy/local_cache.rb:78)
fetch_entry(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache/strategy/local_cache.rb:78)
read_entry(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache/strategy/local_cache.rb:134)
block in fetch(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache.rb:333)
block in instrument(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache.rb:726)
instrument(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/notifications.rb:205)
instrument(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache.rb:726)
fetch(/...vendor/bundle/ruby/3.3.0/gems/activesupport-6.1.7.10/lib/active_support/cache.rb:332)
fetch(/...vendor/bundle/ruby/3.3.0/gems/ddtrace-1.14.0/lib/datadog/tracing/contrib/active_support/cache/instrumentation.rb:167)
fetch_all_records(/...lib/school_admin/table_cache.rb:216)
rebuild_customer_cache(/...lib/school_admin/table_cache.rb:199)
...
Linking this issue, since both relate to ElastiCache and SSL errors: https://github.com/petergoldstein/dalli/issues/1031
I agree that OpenSSL::SSL::SSLError should probably be handled the same way as the errors already rescued there: rescue SystemCallError, *TIMEOUT_ERRORS, EOFError => e