
Long-running code results in `requests` `ChunkedEncodingError` exception (broken connection)

Open • noahgolmant opened this issue 1 year ago • 1 comment

I have a script that ingests ~200 GB of Landsat imagery using the current multi-threaded implementation (no Dataflow). Eventually, it always fails with an exception like:

requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(9186238 bytes read, 1299762 more expected)', IncompleteRead(9186238 bytes read, 1299762 more expected))

The exception is raised inside the `common.robust_getitem` call.

Lowering the chunk size has reduced how often this happens: for example, I can get to ~150 GB instead of failing after ~90 GB. Because the failure is non-deterministic, though, it's hard to say whether that improvement is reliable.

I'm not sure of the root cause. It could be a multithreading/locking issue, or the server could be closing the connection prematurely. Either way, the current code only applies its retry/backoff logic to `EEException`s. Retrying on any `Exception` instead of just `EEException` has worked for me, but that is not an ideal solution.
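
A middle ground between retrying only on `EEException` and retrying on any `Exception` is to retry on a short tuple of known-transient error types and let everything else propagate. The sketch below illustrates that idea in isolation; it is not Xee's code. `retry_with_backoff` and `TransientFetchError` are hypothetical names, and the schedule (500 ms initial delay, doubling with random jitter, up to 6 retries) is an assumed default. In practice the tuple might be something like `(ee.EEException, requests.exceptions.ChunkedEncodingError)`, passed wherever the call site currently specifies `EEException` (xarray's `robust_getitem`, for instance, takes a `catch` argument).

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientFetchError(Exception):
    """Hypothetical stand-in for a transient transport error such as
    requests.exceptions.ChunkedEncodingError."""


def retry_with_backoff(
    fn: Callable[[], T],
    catch: Tuple[Type[BaseException], ...],
    max_retries: int = 6,
    initial_delay_ms: float = 500.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying only on the exception types in `catch`.

    Delays grow exponentially with random jitter. Exceptions not listed in
    `catch` propagate immediately, so real bugs are not hidden by retries.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except catch:
            if attempt == max_retries:
                raise  # Out of retries: surface the last transient error.
            base_delay_ms = initial_delay_ms * 2 ** attempt
            # Jitter puts the delay between base and 2*base, which spreads
            # out retries coming from many threads at once.
            delay_ms = base_delay_ms + random.uniform(0, base_delay_ms)
            sleep(delay_ms / 1000.0)
    raise AssertionError("unreachable")


# Simulate a fetch that fails twice with a broken connection, then succeeds.
calls = {"n": 0}


def flaky_fetch() -> bytes:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientFetchError("Connection broken: IncompleteRead")
    return b"pixels"


delays = []  # Record sleep durations instead of actually sleeping.
result = retry_with_backoff(flaky_fetch, catch=(TransientFetchError,), sleep=delays.append)
print(result, calls["n"], len(delays))  # b'pixels' 3 2
```

The design choice is that the set of retryable errors stays explicit: a broken HTTP stream is retried, while something like a `ValueError` from bad input fails on the first attempt instead of being retried six times.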

My guess is that we don't see this on Dataflow because Dataflow has its own worker retry logic. Is that right?

noahgolmant, Jan 09 '24 19:01