dask-examples
Occasional failure in HTTP bytes
When running CI in this project I sometimes run across the following error:
~/miniconda/envs/test/lib/python3.7/site-packages/dask/bag/core.py in reify()
1603 def reify(seq):
1604 if isinstance(seq, Iterator):
-> 1605 seq = list(seq)
1606 if seq and isinstance(seq[0], Iterator):
1607 seq = list(map(list, seq))
~/miniconda/envs/test/lib/python3.7/site-packages/dask/bag/core.py in map_chunk()
1769 yield f(**k)
1770 else:
-> 1771 for a in zip(*args):
1772 yield f(*a)
1773
~/miniconda/envs/test/lib/python3.7/site-packages/dask/bag/text.py in file_to_blocks()
103 def file_to_blocks(lazy_file):
104 with lazy_file as f:
--> 105 for line in f:
106 yield line
107
~/miniconda/envs/test/lib/python3.7/site-packages/dask/bytes/http.py in read()
247 # EOF (python files don't error, just return no data)
248 return b''
--> 249 self._fetch(self.loc, end)
250 data = self.cache[self.loc - self.start:end - self.start]
251 self.loc = end
~/miniconda/envs/test/lib/python3.7/site-packages/dask/bytes/http.py in _fetch()
258 self.start = start
259 self.end = end + self.blocksize
--> 260 self.cache = self._fetch_range(start, self.end)
261 elif start < self.start:
262 if self.end - end > self.blocksize:
~/miniconda/envs/test/lib/python3.7/site-packages/dask/bytes/http.py in _fetch_range()
320 if cl <= end - start:
321 # data size OK
--> 322 return r.content
323 else:
324 raise ValueError('Got more bytes (%i) than requested (%i)' % (
~/miniconda/envs/test/lib/python3.7/site-packages/requests/models.py in content()
826 self._content = None
827 else:
--> 828 self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
829
830 self._content_consumed = True
~/miniconda/envs/test/lib/python3.7/site-packages/requests/models.py in generate()
751 yield chunk
752 except ProtocolError as e:
--> 753 raise ChunkedEncodingError(e)
754 except DecodeError as e:
755 raise ContentDecodingError(e)
ChunkedEncodingError: ('Connection broken: OSError("(104, \'ECONNRESET\')")', OSError("(104, 'ECONNRESET')"))
ChunkedEncodingError: ('Connection broken: OSError("(104, \'ECONNRESET\')")', OSError("(104, 'ECONNRESET')"))
You can ignore this error by setting the following in conf.py:
nbsphinx_allow_errors = True
Notebook error:
CellExecutionError in applications/json-data-on-the-web.ipynb:
------------------
df.spec.value_counts().nlargest(20).to_frame().compute()
------------------
@martindurant, this seems to be in your general domain. Do you have any suggestions on what might be happening here?
I'm not sure there's much we can do about broken connections, and I can't see how it could be any fault of ours. Retries could be built into the HTTPFileSystem, but perhaps it's better to retry the whole task in such cases.
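For illustration, retrying the whole task could look roughly like the sketch below. `retry_call` and its parameters are hypothetical names, not part of dask. With the distributed scheduler, the `retries=` keyword on `Client.compute` and `Client.submit` covers the same ground without a wrapper.

```python
import time


def retry_call(func, *args, retries=3, delay=0.0,
               exceptions=(ConnectionError,), **kwargs):
    """Call func(*args, **kwargs), re-running the whole call on failure.

    Hypothetical helper: `retries` is the number of extra attempts after
    the first one, and only the listed exception types trigger a retry.
    """
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except exceptions:
            if attempt == retries:
                # Out of attempts: let the last error propagate.
                raise
            time.sleep(delay)
```

The default only catches the built-in `ConnectionError` family. To cover the failure in the traceback above, one would pass `exceptions=(requests.exceptions.ChunkedEncodingError,)`, for example around the `.compute()` call in the notebook cell.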
Is there a good reason to avoid retries in HTTPFileSystem?
No, but a few things make it tricky:
- it is hard to decide which set of errors should lead to a retry. Perhaps we would have to retry on everything
- some things, like establishing the initial connection, are already retried by requests/urllib
- if it's a timeout, then a set of retries might take a very long time to fail
- in the fsspec implementation, there is a non-seekable fallback mode for when the file size is unavailable, which gives you a requests file-like object rather than an HTTPFile. I don't think we can easily intercept its read methods for the purposes of catching errors.
This SO answer might be the best way to do it globally: https://stackoverflow.com/a/15431343/3821154. It lets you be explicit about retries following a connection error, and the policy applies to all connections within a session.
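A minimal sketch of that session-level approach, assuming `requests` with its bundled `urllib3`. The function name and the default values are illustrative only, not something dask or fsspec ships.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=3, backoff_factor=0.5):
    """Return a requests.Session that retries failed connections.

    Hypothetical helper; the numbers are placeholders to tune.
    """
    retry = Retry(
        total=total,            # overall cap on retries per request
        connect=total,          # errors while establishing the connection
        read=total,             # errors after the request was sent
        backoff_factor=backoff_factor,  # growing sleep between attempts
        status_forcelist=(500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # The policy applies to every URL with these prefixes.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

One caveat, as far as I understand it: these retries happen inside urllib3 while the request is sent and the response is first received, so a reset that arrives in the middle of reading the body, like the ChunkedEncodingError in the traceback above, may not be covered by this alone.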
There has been quite some refactoring of fsspec's HTTP implementation lately.
Are dask tests still flaky?
AFAICS, fsspec now returns an HTTPFile even if range requests are not possible. Does that mean a retry policy in fsspec makes more sense now, @martindurant?
HTTPFileSystem might now return an HTTPStreamFile where previously it returned a raw file-like requests response object. I don't think this changes anything from dask's point of view, except that we don't even try the "let's see if this is smaller than a block" approach. A retry would have to be for the whole of the request, not each call to read. However, a retry on establishing the connection (here) would make sense.
(feel free to implement that in a PR, in case you have the time)
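To make "retry the whole request, not each read" concrete, here is a rough sketch for a single byte-range fetch. `fetch_range`, its arguments and the choice of exceptions are assumptions for illustration, not fsspec's actual implementation.

```python
import requests


def fetch_range(url, start, end, session=None, attempts=3):
    """Fetch bytes [start, end) of url, redoing the full request on failure.

    Hypothetical helper: a broken connection discards any partial data
    and the complete ranged GET is issued again.
    """
    session = session or requests.Session()
    # HTTP byte ranges are inclusive, hence end - 1.
    headers = {"Range": "bytes=%i-%i" % (start, end - 1)}
    last_error = None
    for _ in range(attempts):
        try:
            r = session.get(url, headers=headers)
            r.raise_for_status()
            return r.content
        except (requests.exceptions.ChunkedEncodingError,
                requests.exceptions.ConnectionError) as e:
            last_error = e  # remember the error and start over
    raise last_error
```

Because each attempt repeats the full request, a large range that fails near the end costs a full re-download. That is one reason retrying at this level is a trade-off rather than an obvious win.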