
interrupted download reports as hash failure

Open RonnyPfannschmidt opened this issue 3 years ago • 8 comments

Description

follow-up to #4930

When a large package download is interrupted on a bad link, pip reports a bad hash instead of the interrupted download, which leads to misidentifying the problem at first.

Collecting $REDACTED
  Downloading $REDACTED
     ━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.8/52.4 MB 51.3 kB/s eta 0:12:13

ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    $REDACTED from $REDACTED#md5=04b4d65eda8bf72ae203d40031aa76a3:
        Expected md5 04b4d65eda8bf72ae203d40031aa76a3
             Got        c66b2d113159da2c6911c475ec00b26f

Expected behavior

pip should report the download as interrupted, to indicate the actual problem. The hash will of course differ if you hash a subset instead of all the data; however, the error happens while obtaining the data, so failing at the hash check is misleading.

I was earnestly trying to figure out where my data had gotten corrupted until I realized that the download had not actually finished.

pip version

22.1.1

Python version

3.8

OS

Fedora

How to Reproduce

Unfortunately, I cannot provide a broken-network reproducer quickly.

Output

No response


RonnyPfannschmidt, May 31 '22 08:05

I wonder why pip treats the download as successfully completed in the first place. Is this a limitation in requests or even urllib3?

uranusjr, Jun 01 '22 05:06

pip reads directly from Response.raw.stream and it seems that urllib3 does not raise an error if the connection gets closed while reading chunks. I don't know enough about urllib3 to tell whether it should raise an error or not. However, what pip can do is keep count of the downloaded bytes, compare them to the response's Content-Length header before checking hashes, and let the user know that the download was not successful. That seems like a fairly small change and would prevent confusion for the user. I can open an initial PR, unless you think this should be handled by urllib3.
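A minimal sketch of the byte-counting idea, not pip's actual code. The names `IncompleteDownloadError` and `checked_chunks` are hypothetical, and the sketch assumes the chunks are the raw, undecoded body bytes, since Content-Length describes the encoded body.

```python
from typing import Iterable, Iterator, Optional


class IncompleteDownloadError(Exception):
    """Hypothetical error: fewer bytes arrived than Content-Length promised."""

    def __init__(self, received: int, expected: int) -> None:
        super().__init__(
            f"Download interrupted: received {received} of {expected} bytes"
        )
        self.received = received
        self.expected = expected


def checked_chunks(
    chunks: Iterable[bytes], content_length: Optional[int]
) -> Iterator[bytes]:
    """Yield chunks unchanged, then compare the total to Content-Length."""
    received = 0
    for chunk in chunks:
        received += len(chunk)
        yield chunk
    # Only check when the server declared a length at all.
    if content_length is not None and received != content_length:
        raise IncompleteDownloadError(received, content_length)
```

Wrapping the chunk iterator this way would surface the interruption while the file is still being written, before any hash comparison runs.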

Mr-Pepe, May 31 '23 03:05

We're consistently seeing this when downloading whls/artifacts that are ~20MB+. We can look into what's causing the networking flakes but this has been a confusing error that we're regularly seeing. Would be very supportive of this change.

rahul-theorem, Jul 21 '23 21:07

I’m marking this as help wanted since it requires someone that can reliably reproduce this to look into how urllib3 marks the download as complete, and how to perform further sniffing in pip’s networking code to work around this. I would strongly suggest anyone reaching here to attempt to dig deeper into urllib3 to figure out what exactly went wrong and work on a pull request.

uranusjr, Jul 23 '23 21:07

https://github.com/psf/requests/issues/4956 perhaps

dimbleby, Jul 24 '23 22:07

I've also been seeing this issue occasionally in CI builds and have started to investigate it. I've set up an intentionally broken local Flask server that proxies PyPI but randomly truncates the response, and with it I can reproduce this error.
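For anyone who wants to see the failure mode without a bad network, here is a rough stand-in for that setup. It is only a sketch: it uses the standard library's `http.server` instead of Flask, does not proxy PyPI, and always truncates rather than doing so randomly. `BODY`, `TruncatingHandler`, and `fetch_truncated` are illustrative names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"x" * 1000  # stand-in for a package file


class TruncatingHandler(BaseHTTPRequestHandler):
    """Advertise the full Content-Length but send only the first 10 bytes."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY[:10])
        self.close_connection = True  # drop the connection mid-body

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_truncated() -> str:
    """Start the broken server on a free port and try to read one response."""
    server = HTTPServer(("127.0.0.1", 0), TruncatingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_port, timeout=5
        )
        try:
            conn.request("GET", "/")
            resp = conn.getresponse()
            try:
                resp.read()
            except http.client.IncompleteRead as exc:
                return (
                    f"{len(exc.partial)} bytes read, "
                    f"{exc.expected} more expected"
                )
            return "complete"
        finally:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
```

Here the standard library's `http.client` notices the short body on its own and raises `IncompleteRead`; the point of the sketch is only to make a truncated response easy to produce on demand.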

It's true that this is related to the linked requests/urllib3 enforce_content_length issue, which is resolved as of urllib3 v2.0. Unfortunately, upgrading urllib3 alone is not sufficient to resolve this issue (although upgrading urllib3 does give a better error message). The problem is that, due to the way pip/requests streams the response from urllib3, the urllib3 retry logic which pip depends on is bypassed. This can actually happen in two places:

Tracebacks

Response truncated downloading package

  Downloading http://127.0.0.1:5000/files/packages/fa/1a/f191d32818e5cd985bdd3f47a6e4f525e2db1ce5e8150045ca0c31813686/Flask-2.3.2-py3-none-any.whl (96 kB)
ERROR: Exception:
Traceback (most recent call last):
  File "./pip/_vendor/urllib3/response.py", line 704, in _error_catcher
    yield
  File "./pip/_vendor/urllib3/response.py", line 829, in _raw_read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
pip._vendor.urllib3.exceptions.IncompleteRead: IncompleteRead(10 bytes read, 96857 more expected)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./pip/_internal/cli/base_command.py", line 180, in exc_logging_wrapper
    status = run_func(*args)
             ^^^^^^^^^^^^^^^
  File "./pip/_internal/cli/req_command.py", line 248, in wrapper
    return func(self, options, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/commands/install.py", line 377, in run
    requirement_set = resolver.resolve(
                      ^^^^^^^^^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/resolver.py", line 92, in resolve
    result = self._result = resolver.resolve(
                            ^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/resolvelib/resolvers.py", line 546, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/resolvelib/resolvers.py", line 397, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "./pip/_vendor/resolvelib/resolvers.py", line 173, in _add_to_criteria
    if not criterion.candidates:
  File "./pip/_vendor/resolvelib/structs.py", line 156, in __bool__
    return bool(self._sequence)
           ^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
    return any(self)
           ^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
    return (c for c in iterator if id(c) not in self._incompatible_ids)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
    candidate = func()
                ^^^^^^
  File "./pip/_internal/resolution/resolvelib/factory.py", line 206, in _make_candidate_from_link
    self._link_candidate_cache[link] = LinkCandidate(
                                       ^^^^^^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/candidates.py", line 293, in __init__
    super().__init__(
  File "./pip/_internal/resolution/resolvelib/candidates.py", line 156, in __init__
    self.dist = self._prepare()
                ^^^^^^^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/candidates.py", line 225, in _prepare
    dist = self._prepare_distribution()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/candidates.py", line 304, in _prepare_distribution
    return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/operations/prepare.py", line 540, in prepare_linked_requirement
    return self._prepare_linked_requirement(req, parallel_builds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/operations/prepare.py", line 611, in _prepare_linked_requirement
    local_file = unpack_url(
                 ^^^^^^^^^^^
  File "./pip/_internal/operations/prepare.py", line 168, in unpack_url
    file = get_http_url(
           ^^^^^^^^^^^^^
  File "./pip/_internal/operations/prepare.py", line 109, in get_http_url
    from_path, content_type = download(link, temp_dir.path)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/network/download.py", line 147, in __call__
    for chunk in chunks:
  File "./pip/_internal/cli/progress_bars.py", line 53, in _rich_progress_bar
    for chunk in iterable:
  File "./pip/_internal/network/utils.py", line 63, in response_chunks
    for chunk in response.raw.stream(
  File "./pip/_vendor/urllib3/response.py", line 934, in stream
    data = self.read(amt=amt, decode_content=decode_content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/urllib3/response.py", line 873, in read
    data = self._raw_read(amt)
           ^^^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/urllib3/response.py", line 807, in _raw_read
    with self._error_catcher():
  File "/usr/lib64/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "./pip/_vendor/urllib3/response.py", line 721, in _error_catcher
    raise ProtocolError(f"Connection broken: {e!r}", e) from e
pip._vendor.urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(10 bytes read, 96857 more expected)', IncompleteRead(10 bytes read, 96857 more expected))

Response truncated getting package metadata


http://127.0.0.1:5000 "GET /pypi/simple/flask/ HTTP/1.1" 200 39262
ERROR: Could not install packages due to an OSError.
Traceback (most recent call last):
  File "./pip/_vendor/urllib3/response.py", line 704, in _error_catcher
    yield
  File "./pip/_vendor/urllib3/response.py", line 829, in _raw_read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
pip._vendor.urllib3.exceptions.IncompleteRead: IncompleteRead(10 bytes read, 39252 more expected)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./pip/_vendor/requests/models.py", line 816, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "./pip/_vendor/urllib3/response.py", line 934, in stream
    data = self.read(amt=amt, decode_content=decode_content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/urllib3/response.py", line 905, in read
    data = self._raw_read(amt)
           ^^^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/urllib3/response.py", line 807, in _raw_read
    with self._error_catcher():
  File "/usr/lib64/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "./pip/_vendor/urllib3/response.py", line 721, in _error_catcher
    raise ProtocolError(f"Connection broken: {e!r}", e) from e
pip._vendor.urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(10 bytes read, 39252 more expected)', IncompleteRead(10 bytes read, 39252 more expected))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./pip/_internal/commands/install.py", line 377, in run
    requirement_set = resolver.resolve(
                      ^^^^^^^^^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/resolver.py", line 92, in resolve
    result = self._result = resolver.resolve(
                            ^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/resolvelib/resolvers.py", line 546, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/resolvelib/resolvers.py", line 397, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "./pip/_vendor/resolvelib/resolvers.py", line 173, in _add_to_criteria
    if not criterion.candidates:
  File "./pip/_vendor/resolvelib/structs.py", line 156, in __bool__
    return bool(self._sequence)
           ^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
    return any(self)
           ^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
    return (c for c in iterator if id(c) not in self._incompatible_ids)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/found_candidates.py", line 44, in _iter_built
    for version, func in infos:
  File "./pip/_internal/resolution/resolvelib/factory.py", line 279, in iter_index_candidate_infos
    result = self._finder.find_best_candidate(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/index/package_finder.py", line 890, in find_best_candidate
    candidates = self.find_all_candidates(project_name)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/index/package_finder.py", line 831, in find_all_candidates
    page_candidates = list(page_candidates_it)
                      ^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/index/sources.py", line 134, in page_candidates
    yield from self._candidates_from_page(self._link)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/index/package_finder.py", line 791, in process_project_url
    index_response = self._link_collector.fetch_response(project_url)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/index/collector.py", line 461, in fetch_response
    return _get_index_content(location, session=self.session)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/index/collector.py", line 364, in _get_index_content
    resp = _get_simple_response(url, session=session)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/index/collector.py", line 135, in _get_simple_response
    resp = session.get(
           ^^^^^^^^^^^^
  File "./pip/_vendor/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/network/session.py", line 519, in request
    return super().request(method, url, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/requests/sessions.py", line 747, in send
    r.content
  File "./pip/_vendor/requests/models.py", line 899, in content
    self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/requests/models.py", line 818, in generate
    raise ChunkedEncodingError(e)
pip._vendor.requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(10 bytes read, 39252 more expected)', IncompleteRead(10 bytes read, 39252 more expected))

(This second error would be reported as JSONDecodeError in the current version of pip.)

There's some related retry discussion here: https://github.com/urllib3/urllib3/issues/542

In essence, the issue is that (in this specific scenario), pip/requests/urllib3 don't cooperate very well to retry failed requests. I suspect that fixing this issue will require some other changes external to pip.

zweger, Aug 21 '23 16:08

There's a bunch of moving pieces here, so I'll just outline the steps which I believe are required to resolve these issues.

  1. https://github.com/pypa/pip/issues/12857, which checks that Content-Length matches the body length.
  2. https://github.com/pypa/pip/issues/4796 / https://github.com/pypa/pip/pull/11180, which adds some retry functionality into pip for downloading packages. (For downloading files, pip uses urllib3 directly. For other things, pip uses requests.)
  3. https://github.com/psf/requests/issues/6512, which would allow pip to retry other failed requests.
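
To illustrate the general shape of the retry in step 2 (this is not the code from the linked PR, and not something pip exposes), a retry loop with exponential backoff around a download callable might look roughly like this. `retry_download` and all of its parameters are hypothetical names.

```python
import time
from typing import Callable, Tuple, Type


def retry_download(
    download: Callable[[], bytes],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    backoff: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call download() until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return download()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the real error
            # Wait 0.5s, 1s, 2s, ... before the next try.
            sleep(backoff * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A caller would pass whichever exception types signal a truncated body in `retry_on`, so that other failures, such as a genuine hash mismatch on a complete file, are not retried.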

zweger, Aug 23 '23 19:08

Are there any workarounds?

tooptoop4, Sep 13 '24 06:09