lua-resty-moongoo
Sporadic 5xx errors from failed asserts on the moongoo socket connection (send)
Hi Isage!
I was testing it under stress and found sporadic 500 errors coming from "lua entry thread aborted: runtime error"
example:
2020/02/18 21:22:09 [error] 7#7: *1242 lua entry thread aborted: runtime error: /usr/local/openresty/lualib/resty/moongoo/connection.lua:157: bad request
stack traceback:
coroutine 0:
[C]: in function 'send'
/usr/local/openresty/lualib/resty/moongoo/connection.lua:157: in function '_query'
/usr/local/openresty/lualib/resty/moongoo/cursor.lua:155: in function 'find_one'
that would be:
https://github.com/isage/lua-resty-moongoo/blob/master/lib/resty/moongoo/connection.lua#L157
connection.lua:157: assert(self:send(data))
and also
2020/02/18 21:19:37 [error] 7#7: *283 lua entry thread aborted: runtime error: /usr/local/openresty/lualib/resty/moongoo/connection.lua:88: assertion failed!
stack traceback:
coroutine 0:
[C]: in function 'assert'
/usr/local/openresty/lualib/resty/moongoo/connection.lua:88: in function '_query'
/usr/local/openresty/lualib/resty/moongoo/database.lua:43: in function '_cmd'
/usr/local/openresty/lualib/resty/moongoo.lua:75: in function 'connect'
/usr/local/openresty/lualib/resty/moongoo/cursor.lua:139: in function 'find_one'
that would be:
https://github.com/isage/lua-resty-moongoo/blob/master/lib/resty/moongoo/connection.lua#L88
connection.lua:88: assert ( r_to == cbson.uint(self._id) )
I managed to lower the error rate even further by setting socketTimeoutMS=30000 (the default is supposedly 60s, so this sets it to 30s, as in mongo).
I also made a minor lint change at:
https://github.com/isage/lua-resty-moongoo/blob/master/lib/resty/moongoo.lua#L30
from:
local stimeout = conninfo.query.socketTimeoutMS and conninfo.query.socketTimeoutMS or nil
to:
local stimeout = conninfo.query and conninfo.query.socketTimeoutMS or nil
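To illustrate the difference between the two expressions, here is a minimal standalone sketch in plain Lua. The `conninfo` tables are hand-built stand-ins, not the output of moongoo's actual URI parser, and the `get_stimeout_*` helper names are hypothetical. Whether `conninfo.query` can really be nil in the library is an assumption here.

```lua
-- Hypothetical stand-ins for the parsed connection info; the real table is
-- built by moongoo's URI parser, which is not reproduced here.
local with_query = { query = { socketTimeoutMS = "30000" } }
local without_query = {}  -- assumed case: no query part, so query is nil

-- Original expression: indexes conninfo.query unconditionally.
local function get_stimeout_unguarded(conninfo)
  return conninfo.query.socketTimeoutMS and conninfo.query.socketTimeoutMS or nil
end

-- Proposed expression: checks that conninfo.query exists first.
local function get_stimeout_guarded(conninfo)
  return conninfo.query and conninfo.query.socketTimeoutMS or nil
end

print(get_stimeout_guarded(with_query))     -- 30000
print(get_stimeout_guarded(without_query))  -- nil
-- The unguarded version raises an "attempt to index" error on a nil query.
print(pcall(get_stimeout_unguarded, without_query))
```

Both versions return the same value when `query` is present; only the guarded one survives a missing `query` table.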
I'm considering replacing those asserts with a pcall wrapper and letting the query fail quietly, without returning any data, to avoid the sporadic 5xx errors.
Any other suggestions or fixes?
Second fix is fine. As for assert vs pcall, well... It looks like an out-of-order error, and silently failing (while the request may in fact have succeeded) is not good. We should return some descriptive error anyway. That said, I'm more interested in proper out-of-order handling. Any ideas?
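One possible shape for out-of-order handling, sketched in plain Lua under stated assumptions: the mismatch is assumed to come from stale replies left on a reused socket (not confirmed in this thread), and `read_matching_reply`, `make_reader`, and the `response_to` field are hypothetical names, not part of moongoo. The idea is to discard replies addressed to an earlier request and return `nil, err` with a descriptive message instead of asserting.

```lua
-- read_reply is a hypothetical function returning a table
-- { response_to = <number>, body = <any> } on success, or nil, err.
local function read_matching_reply(read_reply, request_id, max_skips)
  max_skips = max_skips or 3
  -- Read at most max_skips + 1 replies before giving up.
  for _ = 0, max_skips do
    local reply, err = read_reply()
    if not reply then
      return nil, "receive failed: " .. tostring(err)
    end
    if reply.response_to == request_id then
      return reply
    end
    -- Reply belongs to some earlier request: drop it and read the next one.
  end
  return nil, "no reply for request " .. tostring(request_id)
    .. " after skipping " .. tostring(max_skips + 1) .. " out-of-order replies"
end

-- Fake reader that replays a fixed list of replies, then reports "closed".
local function make_reader(replies)
  local i = 0
  return function()
    i = i + 1
    local r = replies[i]
    if r == nil then return nil, "closed" end
    return r
  end
end

local reader = make_reader({
  { response_to = 41, body = "stale" },
  { response_to = 42, body = "wanted" },
})
local reply, err = read_matching_reply(reader, 42)
print(reply and reply.body, err)  -- wanted  nil
```

Capping the number of skipped replies keeps a desynchronized connection from looping forever; in that case the caller would probably want to close the socket rather than return it to the pool.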
I added some extra verbosity:
-- assert(self:send(data))
local send_status, send_error = self:send(data)
if not send_status then
  ngx.log(ngx.STDERR, "Moongoo failed to send data over socket connection with data: ", tostring(data), " with error: ", send_error)
end
return self:_handle_reply()
and then:
-- local header = assert ( self.sock:receive ( 16 ) )
local header, rcv_err = self.sock:receive ( 16 )
if rcv_err then
  ngx.log(ngx.STDERR, "Moongoo failed to receive data over socket connection with error: ", rcv_err)
end
So, with the extra verbosity I found sporadic closed and timeout errors at _handle_reply and some bad request errors at self:send(data). I ended up cleaning up some of my code to create connections at the last minute, settled on a conservative socketTimeoutMS=10000, and wrapped the query in a pcall with a fallback to a custom retry query function, which runs whenever an aborted thread is caught (that retry fallback saves the day). But perhaps moongoo could catch those closed sockets in the driver itself. Do you agree?
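The pcall-plus-retry fallback described above might look roughly like this. It is a minimal sketch in plain Lua, not the actual code used: `retry_query` and `flaky_find_one` are hypothetical names, and the flaky function only simulates a driver call that raises on its first attempt.

```lua
-- Hypothetical helper: run query_fn up to max_attempts times.
-- A failure is either a raised error (caught by pcall) or a nil, err return.
local function retry_query(query_fn, max_attempts)
  max_attempts = max_attempts or 2
  local last_err
  for attempt = 1, max_attempts do
    local ok, result, err = pcall(query_fn, attempt)
    if ok and err == nil then
      return result
    end
    -- When pcall fails, its second return value is the error message.
    if ok then last_err = err else last_err = result end
  end
  return nil, "query failed after " .. max_attempts .. " attempts: " .. tostring(last_err)
end

-- Simulated query: raises on the first call, succeeds afterwards.
local calls = 0
local function flaky_find_one()
  calls = calls + 1
  if calls == 1 then
    error("bad request")  -- stands in for the assert failing inside the driver
  end
  return { _id = 1 }
end

local doc, err = retry_query(flaky_find_one, 3)
print(doc and doc._id, err, calls)  -- 1  nil  2
```

Retrying like this is only safe for idempotent reads such as find_one. For writes, the first attempt may have succeeded on the server even though the reply was lost. A real version would probably also open a fresh connection before retrying.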