box-python-sdk

Chunked Uploader uses incorrect whole file digest upon resume

Open ashok-ka opened this issue 5 years ago • 1 comments

  • [x] I have checked that the SDK documentation and API documentation doesn't solve my issue

Description of the Issue

When using the Chunked Uploader, everything works as expected if the entire file finishes uploading within the call to the "start" method. If the "resume" method is needed, the upload almost always fails at the "commit" step because the whole-file SHA1 digest does not match. I have found the cause and suggest a fix below.

Versions Used

Box Python SDK: 2.7.1 (from PyPI)
Python: 3.6.5

Steps to Reproduce

Upload a file using the Python Box SDK chunked uploader. If the upload is interrupted (for example, by a network error) and I then call the "resume" method, the overall upload fails at the "commit" step with a whole-file digest mismatch.
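The start-then-resume pattern from these steps can be sketched as a small helper. This is an illustrative sketch, not SDK code: `upload_with_resume` and `max_resumes` are hypothetical names, and the only assumption about `uploader` is that it exposes `start()` and `resume()` as described in this issue.

```python
import logging

logger = logging.getLogger(__name__)


def upload_with_resume(uploader, max_resumes=3):
    """Call uploader.start(), then uploader.resume() after each failure.

    `uploader` is any object with start() and resume() methods.
    `max_resumes` is a hypothetical limit added for this sketch.
    """
    try:
        return uploader.start()
    except Exception as exc:  # broad on purpose: network errors vary
        last_error = exc
        logger.warning("start() failed: %s", exc)

    for attempt in range(1, max_resumes + 1):
        try:
            return uploader.resume()
        except Exception as exc:
            last_error = exc
            logger.warning("resume() attempt %d failed: %s", attempt, exc)

    # Every resume attempt failed; surface the most recent error.
    raise last_error
```

With the SDK, the uploader would presumably come from something like `client.folder(folder_id).get_chunked_uploader(file_path)`. With the behaviour reported here, the part uploads eventually succeed through `resume()` but the final commit still raises `digest_mismatch`.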

Root Cause and Suggested solution

File: boxsdk/util/chunked_uploader.py
Line: 105
Class: ChunkedUploader
Method: _upload

The line "self._sha1.update(next_part.chunk)" runs before the part is uploaded. If the part upload fails for any reason and the ChunkedUploader retries it, the SHA1 is updated with the same chunk a second time, which makes the overall file digest incorrect.

This line therefore needs to move lower in the same method, to a point after the part upload has completed successfully. Six lines down, right after the "# Record that the part has been uploaded." comment, would be a good place for the "self._sha1.update(next_part.chunk)" line.
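The effect of feeding one chunk to the running hash twice can be reproduced with `hashlib` alone. The chunk contents below are made up for illustration; only the hashing behaviour matters.

```python
import hashlib

# Made-up stand-ins for three file parts.
chunks = [b"part-0", b"part-1", b"part-2"]

# Reference digest: every chunk hashed exactly once, in order.
expected = hashlib.sha1(b"".join(chunks)).hexdigest()

# Retried upload: chunk 1 is fed to the running hash twice.
running = hashlib.sha1()
for chunk in [chunks[0], chunks[1], chunks[1], chunks[2]]:
    running.update(chunk)
retried = running.hexdigest()

digests_match = expected == retried
print(digests_match)  # prints False
```

A running hash cannot be rewound once `update()` has been called, so the duplicate chunk stays in the digest for the rest of the upload.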

I've pasted the full code of the fixed method below ...

def _upload(self):
    """
    Utility function for looping through all parts of the upload session and uploading them.
    """
    while len(self._part_array) < self._upload_session.total_parts:
        # Retrieve the part inflight if it exists, if it does not exist then get the next part from the stream.
        next_part = self._inflight_part or self._get_next_part()
        # Set the retrieved part to the current part inflight.
        self._inflight_part = next_part
        # Retrieve the uploaded part if the part has already been uploaded. If not upload the current part.
        uploaded_part = self._part_definitions.get(next_part.offset) or next_part.upload()
        self._inflight_part = None
        # Record that the part has been uploaded.
        self._sha1.update(next_part.chunk)
        self._part_array.append(uploaded_part)
        self._part_definitions[next_part.offset] = uploaded_part
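The difference between the two orderings can be checked offline with a simplified model of the loop. Everything here is hypothetical scaffolding (`FlakyPart`, `run`, `make_parts`, `hash_after_upload`) and does not use the SDK; it only assumes that a failed part is retried as the same inflight part, as described above.

```python
import hashlib


class FlakyPart:
    """Hypothetical stand-in for an upload part that can fail a few times."""

    def __init__(self, chunk, failures=0):
        self.chunk = chunk
        self._failures = failures

    def upload(self):
        if self._failures > 0:
            self._failures -= 1
            raise ConnectionError("simulated network error")
        return {"size": len(self.chunk)}


def run(parts, hash_after_upload):
    """Upload all parts, retrying failures; return the final hex digest."""
    sha1 = hashlib.sha1()
    pending = list(parts)
    inflight = None
    while pending or inflight is not None:
        part = inflight or pending.pop(0)
        inflight = part
        try:
            if not hash_after_upload:
                sha1.update(part.chunk)  # original order: hash first
            part.upload()
            if hash_after_upload:
                sha1.update(part.chunk)  # proposed order: hash on success
        except ConnectionError:
            continue  # models a later resume(): the same part is retried
        inflight = None
    return sha1.hexdigest()


def make_parts():
    # The second part fails once before succeeding.
    return [FlakyPart(b"aaaa"), FlakyPart(b"bbbb", failures=1), FlakyPart(b"cccc")]


whole_file = hashlib.sha1(b"aaaabbbbcccc").hexdigest()
buggy_ok = run(make_parts(), hash_after_upload=False) == whole_file
fixed_ok = run(make_parts(), hash_after_upload=True) == whole_file
print(buggy_ok, fixed_ok)  # prints False True
```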

Error Message, Including Traceback

<class 'boxsdk.exception.BoxAPIException'>. Message: File digest was incorrect. Actual: V1UVe3W571bzWYXxlmrfDvNmexk= Expected: iHU8e+hNlA0T/ElfbfcFtf+0VdM=
Status: 400
Code: digest_mismatch
Request ID: 9ce5dd36ea3ed10821e637bb362c40df
Headers: {'Date': 'Fri, 03 Apr 2020 09:33:40 GMT', 'Content-Type': 'application/json; charset=UTF-8', 'Content-Length': '189', 'Connection': 'keep-alive', 'Strict-Transport-Security': 'max-age=31536000'}
URL: https://upload.box.com/api/2.0/files/upload_sessions/0536358B86DBFB0A2080FF2840E0038B/commit
Method: POST
Context Info: None.
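The "Actual" and "Expected" values are 28 characters long and end in "=", which is consistent with base64-encoded 20-byte SHA-1 digests. Assuming that encoding, a helper like the one below (the name `base64_sha1` and the `block_size` parameter are hypothetical) computes the same kind of value for a local file, so it can be compared against both fields to see which one corresponds to the file on disk.

```python
import base64
import hashlib


def base64_sha1(path, block_size=1024 * 1024):
    """Return the base64-encoded SHA-1 digest of the file at `path`."""
    sha1 = hashlib.sha1()
    with open(path, "rb") as stream:
        # Read in fixed-size blocks so large files do not need to fit in memory.
        for block in iter(lambda: stream.read(block_size), b""):
            sha1.update(block)
    return base64.b64encode(sha1.digest()).decode("ascii")
```

For example, `base64_sha1("/path/to/local/file")` returns a 28-character string in the same format as the values in the error message.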

ashok-ka, Apr 04 '20 20:04

@ashok-ka Will take a look at this and get back to you.

sujaygarlanka, Apr 13 '20 16:04

This issue has been automatically marked as stale because it has not been updated in the last 30 days. It will be closed if no further activity occurs within the next 7 days. Feel free to reach out or mention Box SDK team member for further help and resources if they are needed.

stale[bot], Dec 19 '22 20:12

This issue has been automatically closed due to maximum period of being stale. Thank you for your contribution to Box Python SDK and feel free to open another PR/issue at any time.

stale[bot], Dec 27 '22 06:12

This is still an issue. Has it been fixed?

hoggatt, Feb 17 '23 21:02

Here's my error (with the hashes changed slightly):

boxsdk.exception.BoxAPIException: Message: File digest was incorrect. Actual: brOFCgH2JADfILrCKvBKRKreLM4= Expected: v79jF+hO8voK4LLBCYlaNfR7CGs=
Status: 400
Code: digest_mismatch
Request ID: 4840fb99c3889b08b4c9da118d8c6b83
Headers: {'Server': 'nginx', 'Date': 'Fri, 17 Feb 2023 21:06:25 GMT', 'Content-Type': 'application/json; charset=UTF-8', 'Content-Length': '189', 'X-Box-Original-Ingress-ADC-Host': 'prod-a-traffic-manager-gt93', 'Strict-Transport-Security': 'max-age=31536000', 'Via': '1.1 google', 'Alt-Svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"'}
URL: https://upload.box.com/api/2.0/files/upload_sessions/F9F291D04ECA8C3690872D0DF79470FF/commit
Method: POST
Context Info: None

Note: the resume method works within the first hour and a half after starting a chunked upload. After that, the connection has an issue, and this error then repeats indefinitely each time a resume is tried.

hoggatt, Feb 17 '23 22:02