ngx_mruby icon indicating copy to clipboard operation
ngx_mruby copied to clipboard

Periodic redis errors

Open ajsharp opened this issue 6 years ago • 20 comments

I'm using ngx_mruby to do dynamic lets encrypt ssl resolution. Currently building from source against nginx 1.16.0 and ngx_mruby 2.1.5.

Here's the relevant parts of the configuration:

# /etc/nginx/nginx.conf
http {
  # ...
  include /etc/nginx/conf.d/*.conf;

  mruby_init_worker_code '
    userdata = Userdata.new
    redis_url = "redis://my.redis.url:6379"
    redis_host, redis_port = redis_url[/redis:\/\/(.+)/, 1].split(":")
    userdata.redis = Redis.new redis_host, redis_port.to_i
    userdata.redis.select 2
  ';
}

# /etc/nginx/conf.d/app.conf
# ...
server {
  listen 443 ssl;
  # ...
mruby_ssl_handshake_handler_code '
    ssl = Nginx::SSL.new
    domain = ssl.servername

    redis = Userdata.new.redis
    ssl_certificate = redis["#{domain}.crt"]
    ssl_key = redis["#{domain}.key"]

    if ssl_certificate && ssl_certificate != "" && ssl_key && ssl_key != ""
      ssl.certificate_data = ssl_certificate
      ssl.certificate_key_data = ssl_key
    end
  ';

  # ...
}

After roughly 45 minutes of running ngx_mruby, i start to see these redis connection failures:

2019/08/25 20:53:03 [error] 23164#0: *17960 ngx_mruby : mrb_run failed: return 500 HTTP status code to client: error: INLINE CODE:6: could not read reply (Redis::ConnectionError) while SSL handshaking, client: 122.36.17.229, server: 0.0.0.0:443

If I reload nginx, the errors stop. If I let the server run for about 45 minutes, they will inevitably return. Right now I'm avoiding these by reloading nginx every 15 minutes on a cron job, but it seems like there's some sort of issue with the embedded ruby code that causes the redis connection to stop working eventually. Maybe the embedded ruby code is leaving redis connections hanging or something.

The server this running on has unlimited ulimit and the number of open redis connections has never exceeded a couple hundred.

Has anyone else seen these types of errors?

ajsharp avatar Aug 25 '19 21:08 ajsharp

Thank you for you report! I'll investigate the issue.

matsumotory avatar Aug 26 '19 03:08 matsumotory

Something else I've noticed is that the existing nginx workers take an unusually long amount of time to shut down when I run a reload. Like a good 30 seconds, and the request load is very light.

ajsharp avatar Aug 26 '19 23:08 ajsharp

@matsumotory Any ideas on this?

ajsharp avatar Oct 14 '19 16:10 ajsharp

Sorry for my late response. I think Redis client timed out. So Could you implement reconnecting Redis when catching timeout error like Redis::ConnectionError?

matsumotory avatar Dec 13 '19 20:12 matsumotory

Yes, I can try. I can try wrapping the mruby_ssl_handshake_handler_code in a begin/rescue. I'm not sure how/why a reconnect would do anything there, but I'll try and report back.

ajsharp avatar Dec 13 '19 21:12 ajsharp

Yes, it's strange behavior. Maybe OS or middleware layer settings disconnects the TCP connection I guess.

matsumotory avatar Dec 13 '19 21:12 matsumotory

@matsumotory Is there a reconnect interface or method on the mruby-redis client?

ajsharp avatar Dec 27 '19 20:12 ajsharp

@ajsharp @matsumotory I have same issue, is this problem solve? thank you all

adz624 avatar Feb 12 '20 07:02 adz624

Yes, it's strange behavior. Maybe OS or middleware layer settings disconnects the TCP connection I guess.

@matsumotory We're using the same redis server (AWS elasticache) in our main app from the same host machine without connection issues, so the problem is almost definitely in this library in some way. It's unclear whether the problem is the redis client this project uses or some lower level issue with the nginx extension. However, it's not an OS/middleware issue b/c we're maintaining persistent connections to the same redis server from the same host machine without any problems.

@afunction The only "fix" we've found is to restart every 8 mins or so. It seems like there's some sort of redis connection issue.

When I tried to diagnose this before, I started looking through the redis client this project bundles, but I don't have the expertise or the time to get into the C++ code. It would be great if it was possible to add additional logging instrumentation around the connection logic.

ajsharp avatar Feb 14 '20 17:02 ajsharp

@ajsharp @afunction I'm wondering if you guys can move Redis.new into mruby_ssl_handshake_handler_code? I don't think redis[], mrb_redis_get() in C, supports reconnect.

It looks like _redisContextConnectTcp() in hredis supports it, but it is only called via redisConnectWithTimeout() from the following methods.

  • initialize(): mrb_redis_connect() in C
  • connect_set_raw(): mrb_redis_connect_set_raw() in C
  • connect_set_udptr(): mrb_redis_connect_set_raw() in C

Disclaimer: I have never used redis. I just looked at mruby-redis and hired code while waiting for windows update.

yyamano avatar Feb 17 '20 04:02 yyamano

@yyamano Thank you for your suggestion.

@ajsharp @afunction Could you try to use Redis#enable_keepalive in doc ?

matsumotory avatar Feb 17 '20 05:02 matsumotory

Or you can implement error handling like this example.

matsumotory avatar Feb 17 '20 05:02 matsumotory

@matsumotory I tried using #enable_keepalive and it didn't help -- still getting the errors. I will try the begin rescue you linked to and report back.

ajsharp avatar Mar 03 '20 03:03 ajsharp

I've put begin/rescue blocks in both the mruby_ssl_handshake_handler_code block and the mruby_init_worker_code block. Errors are only being generated from the handshake handler block. Here's my code:

  begin
    ssl = Nginx::SSL.new
    domain = ssl.servername

    redis = Userdata.new.redis
    ssl_certificate = redis["#{domain}.crt"]
    ssl_key = redis["#{domain}.key"]

    if ssl_certificate && ssl_certificate != "" && ssl_key && ssl_key != ""
      ssl.certificate_data = ssl_certificate
      ssl.certificate_key_data = ssl_key
    end
   rescue Redis::ReplyError => e
     STDERR.puts "HANDSHAKE REPLY ERROR: #{e}"
   rescue Redis::ConnectionError => e
     STDERR.puts "HANDSHAKE CONN ERROR: #{e}"
   end

I'm seeing lots of Redis::ConnectionError exceptions in the nginx error logs. Unfortunately, all the reply says is: "could not read reply"

ajsharp avatar Mar 03 '20 17:03 ajsharp

@matsumotory Is it possible to add more context here to the error message?

https://github.com/matsumotory/mruby-redis/blob/18b3d06cf238b26eb921e019fff1a9aa9ad22e95/src/mrb_redis.c#L1462

ajsharp avatar Mar 03 '20 17:03 ajsharp

@ajsharp Thank you for your details. I have implemented your suggestion on https://github.com/matsumotory/mruby-redis/pull/102 . Could you try this?

matsumotory avatar Mar 04 '20 06:03 matsumotory

@ajsharp Do you still have the problem with the newer version of mruby-redis ?

yyamano avatar Aug 12 '20 10:08 yyamano

@yyamano I will try an upgrade and let you know. Thanks for checking in on this.

ajsharp avatar Aug 13 '20 02:08 ajsharp

@yyamano Is the latest version of mruby-redis installed with the latest version of this library?

ajsharp avatar Aug 13 '20 02:08 ajsharp

@ajsharp If you rebuild ngx_mruby, the latest version of mruby-redis including the change is installed. You can check build_config.rb.lock to make sure if the latest version is installed. Mine is:

    https://github.com/matsumotory/mruby-redis.git:
      url: https://github.com/matsumotory/mruby-redis.git
      branch: HEAD
      commit: f1f98d9450783b8c281b5064554b12bfb9f0a65a
      version: 0.0.1

yyamano avatar Aug 13 '20 02:08 yyamano