gRPC-haskell

Timeouts Far Earlier Than Expected

Open · isheff opened this issue 6 years ago • 1 comment

I wrote a quick-and-dirty script to test the gRPC and Thrift Haskell libraries when moving large, structured data over a channel. As can be seen by running the main application, even though all client timeouts are set to 10000000 seconds, I consistently get a timeout after only a second or so:

    ClientIOError GRPCIOTimeout
    CallStack (from HasCallStack):
      error, called at src/GRPCvsThrift/GRPCClient.hs:43:43 in gRPC-vs-Thrift-0.1.0.0-94CrnjUzsegHPgjx1PI6me:GRPCvsThrift.GRPCClient
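
For reference, the client call in the script is shaped roughly like the sketch below. The service and message names (Echo, echoClient, echoDoEcho, EchoRequest) are placeholders standing in for my generated proto code, not the actual module; the timeout is the second argument to ClientNormalRequest:

    {-# LANGUAGE OverloadedStrings #-}
    {-# LANGUAGE RecordWildCards   #-}

    import qualified Data.Text.Lazy as TL
    import Network.GRPC.HighLevel.Generated

    clientConfig :: ClientConfig
    clientConfig = ClientConfig
      { clientServerHost = "localhost"
      , clientServerPort = 50051
      , clientArgs       = []
      , clientSSLConfig  = Nothing
      }

    main :: IO ()
    main = withGRPCClient clientConfig $ \client -> do
      Echo{..} <- echoClient client
      -- The second field of ClientNormalRequest is the per-call timeout, in seconds.
      res <- echoDoEcho
               (ClientNormalRequest (EchoRequest (TL.replicate 10000000 "x"))
                                    10000000
                                    mempty)
      case res of
        ClientNormalResponse reply _ _ _ _ -> print reply
        ClientErrorResponse err            -> error (show err)  -- GRPCIOTimeout surfaces here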

I cannot figure out why this is happening. Is it possible that a max message size violation is being interpreted as a timeout? Can the max size limit be changed?

This only occurs when the data structure being sent is large enough. To give an idea of the approximate size: when show is called on a "too large" data structure, the result is about 10,000,000 characters long, while a known "not too large" data structure shows as about 6,000,000 characters.

Help?

P.S. gRPC is way, way faster than Thrift when sending big messages (~ 20-30x). Thanks so much!

isheff · Mar 21 '18, 22:03

Hi @isheff,

We've run into a small handful of scenarios where certain error conditions (e.g. an unreachable host under some network conditions) seem to be reported as a timeout by the C core, regardless of the actual timeout value supplied when making the client call. So it wouldn't surprise me if something similar is happening w.r.t. a max size violation.

You might try playing with the GRPC_TRACE and GRPC_VERBOSITY environment variables and see if there's evidence in the debug spew of a channel size violation.
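
For example (just a sketch, and the exact set of trace tags worth enabling may vary), something like the following before any gRPC initialization, or the equivalent exports in the shell when launching the client:

    import System.Environment (setEnv)

    -- Sketch: turn on verbose C-core logging. This has to happen before the
    -- first gRPC initialization (i.e. before withGRPC / withGRPCClient runs);
    -- exporting GRPC_VERBOSITY/GRPC_TRACE in the shell works just as well.
    enableGrpcTracing :: IO ()
    enableGrpcTracing = do
      setEnv "GRPC_VERBOSITY" "DEBUG"
      setEnv "GRPC_TRACE"     "api,channel,http"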

Max size limits on channels can be modified via channel args, cf. this test and https://github.com/awakesecurity/gRPC-haskell/pull/35.
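
Concretely, raising the receive-side limit on the client would look roughly like the sketch below -- the Arg type lives in Network.GRPC.Unsafe.ChannelArgs, the ClientConfig field names may differ slightly across versions of the library, and the 32 MiB figure is just an example:

    {-# LANGUAGE OverloadedStrings #-}

    import Network.GRPC.HighLevel.Generated
    import Network.GRPC.Unsafe.ChannelArgs (Arg (MaxReceiveMessageLength))

    -- Sketch: bump the receive-side message size limit to ~32 MiB via a
    -- channel arg. Note that there is currently no corresponding constructor
    -- for the send-side limit.
    clientConfig :: ClientConfig
    clientConfig = ClientConfig
      { clientServerHost = "localhost"
      , clientServerPort = 50051
      , clientArgs       = [MaxReceiveMessageLength (32 * 1024 * 1024)]
      , clientSSLConfig  = Nothing
      }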

However, it looks like I only bound MaxReceiveMessageLength and not the corresponding send limit, which seems to be what you want to tweak. Also, note that the header indicates that a size limit of -1 connotes "no limit", but we've used a Natural -- IIRC that was intentional and the comment in the header was deemed stale, but apparently I didn't feel that was important enough to mention on #35. :tada:

So it looks like there's certainly some attention needed there, specifically for the max send size, but the test I linked above seems to indicate a different status code for the failure case (StatusInvalidArgument), at least when it's the receive limit that's been exceeded.

The likely fix is to add support for the max send size channel arg and set it appropriately (a rough sketch of the shape of that change is below); it would also be useful to determine the source of the error-reporting bug. PRs welcome =). BTW, I'm more than happy to dig into this, but I won't have cycles in the super short term to do so.
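
For illustration only, here's a hypothetical sketch of the gist of that change (not the library's actual code; the real patch would go through the channel-args marshalling in Network.GRPC.Unsafe.ChannelArgs):

    import Numeric.Natural (Natural)

    -- Hypothetical sketch: add a send-side constructor next to the existing
    -- receive-side one, and marshal it to the C core's
    -- "grpc.max_send_message_length" key (GRPC_ARG_MAX_SEND_MESSAGE_LENGTH).
    data Arg
      = MaxReceiveMessageLength Natural   -- already bound in #35
      | MaxSendMessageLength    Natural   -- the missing piece
      deriving (Eq, Show)

    -- The C core channel-arg keys these correspond to:
    argKey :: Arg -> String
    argKey (MaxReceiveMessageLength _) = "grpc.max_receive_message_length"
    argKey (MaxSendMessageLength    _) = "grpc.max_send_message_length"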

Thanks for the feedback!

intractable · Mar 22 '18, 14:03