Increase grpc max message size
- Currently the limit in Python is 4 MB which fails for retrieving large application logs
@jcrist Any news on that ? What about making it configurable ?
grpc_max_receive_message_length = os.environ.get("SKEIN_MAX_MSG_SIZE", 50 * 1024 * 1024)
..
('grpc.max_receive_message_length', grpc_max_receive_message_length )
..
Apologies, things have been a bit chaotic. Thinking about this a bit more, I think the method returning logs should be a streaming output method, so the msg size matters less. We might stream one response per container, which would hopefully stay below this limit. Thoughts?
Why not but I think one response by container could still easily go above 4 MB. It turns out even 50 MB seems not enough in some cases. We have jobs that produce logs for 24 hours and each spawned container produces the same amount of logs.
In that case we would even need to stream the response for each container in chunks. But that also means we have to merge it back on client side to have a cleaner API right ? Not sure a list of strings would be a nice API to use.
Hmmm, yeah. In that case we may want to stream in fixed bytesize chunks (line-by-line seems too much to me). To the end user I think this should keep the same python api, so we'd have to reassemble in the client before returning.
I'm fine merging this as a stop-gap for now, but I think we should redesign the logs api to avoid this issue entirely.
OK. I agree it would be nice not having to specify this setting.
Having a quick look we could change the proto like this :
rpc getLogs (LogsRequest) returns (stream LogsResponse);
message LogsResponse {
string key = 1; //the container_id
string data = 2; // one chunk of the log
}
Then send n LogResponse messages and aggregate this back to the client. Seems easy on Python side to just iterate over the messages and merge them. The only complication is to propagate this all the way down to the yarn api on Java side.
👍 That rpc design makes sense to me.
OK. I also had a look at LogClient and it could also just produce a stream with Java8 stream API. I will work on it and propose something in another PR.
In the meantime, if you have time, please have a look at: https://github.com/jcrist/skein/pull/213 As this also touches the log api and probably could produce conflicts.
Hi folks,
I've been hitting this issue consistently, with my yarn jobs generating around 20MBs of logs.
It looks like there was agreement on merging the stop gap measure proposed in this PR. If that is still the case, can we please merge?