azure-sdk-for-python HttpResponseError: InvalidHeaderValue When appending to file in ADLSv2 storage account container


Package Name: azure-storage-file-datalake
Package Version: 12.9.1
Operating System: Ubuntu 20.04.4 LTS
Python Version: 3.8.10

Describe the bug

I am running a batch job that creates a file in an ADLSv2 storage account, retrieves data from a remote system in pages of 2000 records, processes them lightly, and appends the records to the file in ADLS. Intermittently, when appending data to the file, I receive azure.core.exceptions.HttpResponseError: (InvalidHeaderValue) The value for one of the HTTP headers is not in the correct format.

  File "./example.py", line 586, in datalake_process
    file_client.append_data(data=bio.getvalue(), offset=bio_size, length=bio.getbuffer().nbytes, flush=True)
  File "/home/azureuser/.local/lib/python3.8/site-packages/azure/storage/filedatalake/_data_lake_file_client.py", line 529, in append_data
    process_storage_error(error)
  File "/home/azureuser/.local/lib/python3.8/site-packages/azure/storage/filedatalake/_deserialize.py", line 215, in process_storage_error
    exec("raise error from None")   # pylint: disable=exec-used # nosec
  File "<string>", line 1, in <module>
azure.core.exceptions.HttpResponseError: (InvalidHeaderValue) The value for one of the HTTP headers is not in the correct format.
RequestId:0193acfe-b017-4579-b11e-e091399edfc9
Time:2022-11-06T05:02:06.9071578Z
Code: InvalidHeaderValue
Message: The value for one of the HTTP headers is not in the correct format.
RequestId:0193acfe-b017-4579-b11e-e091399edfc9
Time:2022-11-06T05:02:06.9071578Z

To Reproduce

Steps to reproduce the behavior:

1. Create a file in an ADLS v2 storage account container.
2. Append data to the file.
3. Keep appending data until the error occurs.

Expected behavior

All appends succeed.

Additional context

The bug occurs consistently in my batch job, but not at any particular job or position in the file. For reference, the example above was the 2855th append to the file; the previous 2854 appends had succeeded.
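For readers trying to match this workload, the page-by-page append pattern described above can be sketched as follows. This is a minimal sketch, not the reporter's code: fetch_pages and RecordingFileClient are hypothetical stand-ins (the latter mimics only the append_data(data, offset, length, flush) call seen in the traceback) so the loop runs without a storage account, and records are assumed to be written as newline-terminated lines.

```python
import io
from typing import Iterator, List, Tuple


class RecordingFileClient:
    """Hypothetical stand-in for DataLakeFileClient that records each append."""

    def __init__(self) -> None:
        self.calls: List[Tuple[int, int]] = []  # (offset, length) per append
        self.content = bytearray()

    def append_data(self, data: bytes, offset: int, length: int, flush: bool = False) -> None:
        # This stand-in rejects non-contiguous offsets so bookkeeping mistakes surface immediately.
        if offset != len(self.content):
            raise ValueError(f"non-contiguous append at {offset}, file size is {len(self.content)}")
        self.content.extend(data[:length])
        self.calls.append((offset, length))


def fetch_pages(total: int, page_size: int = 2000) -> Iterator[List[str]]:
    """Hypothetical remote source that yields records in pages of page_size."""
    for start in range(0, total, page_size):
        yield [f"record-{n}" for n in range(start, min(start + page_size, total))]


def run_batch(file_client: RecordingFileClient, total: int) -> int:
    offset = 0  # running size of the file, used as the next append offset
    for page in fetch_pages(total):
        bio = io.BytesIO()
        for record in page:
            bio.write((record + "\n").encode("utf-8"))
        nbytes = bio.getbuffer().nbytes
        file_client.append_data(data=bio.getvalue(), offset=offset, length=nbytes, flush=True)
        offset += nbytes  # advance by the bytes actually sent
    return offset
```

Tracking the offset as a running total of the bytes actually sent keeps each append contiguous with the previous one.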

Nov 06 '22 14:11 michaelmingram

Thanks for the feedback, we’ll investigate asap.

Nov 07 '22 17:11 xiangyan99

Hi @michaelmingram Michael, thanks for your inquiry. Unfortunately, I was unable to reproduce your findings locally. Below is the general code flow I used:

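import os

from azure.storage.filedatalake import DataLakeServiceClient

# Assumption: client setup added for completeness; the environment variable
# and file system name are placeholders, and dc is a file system client.
service = DataLakeServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])
dc = service.get_file_system_client("my-filesystem")

data = b'hello world'
file_name = "appendtest"

dfc = dc.get_file_client(file_name)
dfc.create_file()

i = 0
while True:
    print("Appending #:", i)
    # Each append starts where the previous one ended.
    dfc.append_data(data=data, offset=len(data) * i, length=len(data), flush=True)
    i += 1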

I've had this run for roughly 10,000 appends without encountering the same error. Is there something I am missing that would more closely match your workflow? I also tried multi-threaded appends and hit a different error, which is expected behavior: azure.core.exceptions.HttpResponseError: (InvalidFlushPosition) The uploaded data is not contiguous or the position query parameter value is not equal to the length of the file after appending the uploaded data.
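The InvalidFlushPosition message above restates the rule that sequential appends rely on: each append starts where the previous one ended, and the flush position must equal the total length written. A minimal sketch of that bookkeeping, using a hypothetical plan_appends helper, shows one plausible reason concurrent appends with flush=True on every call can fail: a call that flushes at a later position may run before the data it depends on has landed.

```python
from typing import List, Optional, Sequence, Tuple


def plan_appends(chunks: Sequence[bytes]) -> Tuple[List[Tuple[int, int]], int]:
    """Return (offset, length) for each chunk and the final flush position.

    Hypothetical helper: offsets are the running byte total, so every append
    begins exactly where the previous one ended.
    """
    plan: List[Tuple[int, int]] = []
    offset = 0
    for chunk in chunks:
        plan.append((offset, len(chunk)))
        offset += len(chunk)
    return plan, offset  # the flush position equals the total length


plan, flush_position = plan_appends([b"hello world"] * 3)
print(plan, flush_position)  # [(0, 11), (11, 11), (22, 11)] 33
```

When chunks must be uploaded in parallel, a commonly used alternative is to append each chunk at its planned offset without flushing and then call flush_data(flush_position) once after every append has completed; this sketch does not exercise that against the service.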

With that said, the following information would greatly help us identify the root cause:

- Please reproduce the error on your side and provide the RequestId, as in your original post (these expire on a rolling basis, so the more recent the better).
- Is your batch operation doing appends concurrently?
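Because request IDs expire on a rolling basis, it helps to log them at the moment of failure. A minimal sketch, assuming only that the error text contains the RequestId: and Time: lines shown in the original report; extract_request_info is a hypothetical helper, and in real code the string would come from str(error) on the caught HttpResponseError.

```python
import re
from typing import Dict, Optional

_REQUEST_ID = re.compile(r"RequestId:([0-9a-fA-F-]+)")
_TIME = re.compile(r"Time:(\S+)")


def extract_request_info(message: str) -> Dict[str, Optional[str]]:
    """Pull the first RequestId and Time values out of a storage error message."""
    request_id = _REQUEST_ID.search(message)
    timestamp = _TIME.search(message)
    return {
        "request_id": request_id.group(1) if request_id else None,
        "time": timestamp.group(1) if timestamp else None,
    }


sample = (
    "(InvalidHeaderValue) The value for one of the HTTP headers is not in the correct format.\n"
    "RequestId:0193acfe-b017-4579-b11e-e091399edfc9\n"
    "Time:2022-11-06T05:02:06.9071578Z\n"
    "Code: InvalidHeaderValue"
)
print(extract_request_info(sample))
```

In a batch job this would typically sit in an except HttpResponseError as error: block that logs extract_request_info(str(error)) before re-raising.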

Thanks!

Nov 09 '22 02:11 vincenttran-msft

Hi @vincenttran-msft:

- RequestId:a45df50e-e01f-009d-2ed3-f01814000000, Time:2022-11-05T05:00:41.0157658Z
- I'm appending to the file sequentially. The batch job runs in parallel with multiprocessing, but each process works on a different file.

Over the weekend I updated my code to use api_version="2019-12-12", as described here: https://github.com/Azure/azure-sdk-for-python/issues/16193. I have not seen the issue since then, but that is a workaround, not a fix. If the request ID I posted above has rolled out of history, I can update my job to use the current API version and reproduce the error within a day or two.

Regards, Mike

Nov 09 '22 19:11 michaelmingram

Hi @michaelmingram Mike, thanks for the follow-up. After looking into the RequestId provided above, I believe I have identified the cause. The azure.core.exceptions.HttpResponseError: (InvalidHeaderValue) exception you are receiving in this case is due to a header failing a null-or-empty check. I was able to reproduce this error locally by making an append call with empty data (i.e. data=b'').
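To connect this with the call in the original traceback (data=bio.getvalue(), length=bio.getbuffer().nbytes): if a page yields no records after processing, for example because every record was filtered out, the buffer stays empty and both expressions produce an empty payload. The records list below is hypothetical; the sketch only shows what those expressions evaluate to for an empty buffer.

```python
import io

records = []  # hypothetical: a page whose records were all filtered out during processing

bio = io.BytesIO()
for record in records:
    bio.write(record)

data = bio.getvalue()
length = bio.getbuffer().nbytes
print(repr(data), length)  # b'' 0
```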

With that said, I would recommend not using the api_version="2019-12-12" "workaround": it is generally advisable to use the latest API version when available, and I believe this workaround may not actually have solved anything here. To avoid this exception, I would recommend checking that your data is not empty or None before passing it to the append call.
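The suggested empty check can be folded into the append step. A minimal sketch, assuming an object with the append_data(data, offset, length, flush) call used in the traceback; append_chunk is a hypothetical helper, and FakeFileClient is a hypothetical stand-in that imitates the observed rejection of empty data with a ValueError so the guard can be tested without a storage account.

```python
from typing import List, Optional


class FakeFileClient:
    """Hypothetical stand-in that rejects empty appends, as the service did here."""

    def __init__(self) -> None:
        self.appended: List[bytes] = []

    def append_data(self, data: bytes, offset: int, length: int, flush: bool = False) -> None:
        if not data:
            raise ValueError("empty append rejected (stands in for InvalidHeaderValue)")
        self.appended.append(data)


def append_chunk(file_client, data: Optional[bytes], offset: int) -> int:
    """Append data at offset unless it is empty or None; return the next offset."""
    if not data:  # covers both None and b''
        return offset  # nothing to send, so the file size is unchanged
    file_client.append_data(data=data, offset=offset, length=len(data), flush=True)
    return offset + len(data)


client = FakeFileClient()
offset = 0
for chunk in [b"hello ", b"", b"world"]:
    offset = append_chunk(client, chunk, offset)
print(offset, client.appended)  # 11 [b'hello ', b'world']
```

Skipping the call, rather than sending an empty body, leaves the running offset untouched, so the next non-empty page still lands contiguously.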

Hopefully this resolves your issue.

Thanks!

Nov 10 '22 01:11 vincenttran-msft

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

Nov 29 '22 02:11 ghost
