AzureStorageExplorer
decompression worker exited early. When Flushing file
I am downloading files in bulk from a single azure blob storage container using Azure Storage Explorer.
I have connected to the container using a SAS URL.
However, for some files (100 out of 4000) I see the following error in the logs:
2022/07/04 11:49:28 ERR: [P#0-T#4131] DOWNLOADFAILED: <file_URL_redacted>: decompression worker exited early. When Flushing file. X-Ms-Request-Id:
I am able to download the files directly from the Azure Portal. Many of the files have Content-Encoding: deflate, and the "Decompress files" setting in Azure Storage Explorer is enabled.
This seems to stop the bulk download after ~4000-5000 files. If I start again from where it stopped, I can download another 4000-5000 files before the issue occurs again.
I also see errors like "Error closing file: timed out closing decompression worker". Is this because too many downloads are happening concurrently?
Any recommendations please?
Head of log file produced from Azure Storage Explorer:
2022/07/04 10:34:22 AzcopyVersion 10.14.1
2022/07/04 10:34:22 OS-Environment windows
2022/07/04 10:34:22 OS-Architecture amd64
...
2022/07/04 10:34:22 Number of CPUs: 4
2022/07/04 10:34:22 Max file buffer RAM 2.000 GB
2022/07/04 10:34:22 Max file buffer RAM 2.000 GB
2022/07/04 10:34:22 Max concurrent network operations: 32 (Based on number of CPUs. Set AZCOPY_CONCURRENCY_VALUE environment variable to override)
2022/07/04 10:34:22 Check CPU usage when dynamically tuning concurrency: true (Based on hard-coded default. Set AZCOPY_TUNE_TO_CPU environment variable to true or false override)
2022/07/04 10:34:22 Max concurrent transfer initiation routines: 64 (Based on hard-coded default. Set AZCOPY_CONCURRENT_FILES environment variable to override)
2022/07/04 10:34:22 Max enumeration routines: 16 (Based on hard-coded default. Set AZCOPY_CONCURRENT_SCAN environment variable to override)
2022/07/04 10:34:22 Parallelize getting file properties (file.Stat): false (Based on AZCOPY_PARALLEL_STAT_FILES environment variable)
2022/07/04 10:34:22 Max open files when downloading: 2147483224 (auto-computed)
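For reference, the log above names the environment variables that control these limits, so they can be lowered from the shell before re-running. The values below are illustrative only, not recommendations:

```shell
# Illustrative values only: lower azcopy's tuning knobs before re-running.
export AZCOPY_CONCURRENCY_VALUE=4   # max concurrent network operations (default here: 32)
export AZCOPY_CONCURRENT_FILES=8    # max concurrent transfer initiation routines (default: 64)
export AZCOPY_CONCURRENT_SCAN=4     # max enumeration routines (default: 16)
```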
Thanks @jis260 for reaching out.
Are you able to repro the problem with a single file?
I cannot replicate the problem with a single file. Files that produced the error in the logs can be downloaded individually without any issue, so it is not a problem with decompressing those particular files.
I have tried reducing the concurrency settings for network operations and open files. This has helped, but the issue still occurs, and it now causes the transfer to fail after several hours; previously it was failing after less than an hour.
Restarting is difficult because performing a select-all requires the process to check which files have already been downloaded and then wait for the overwrite confirmation.
I have not tested this without the decompression option enabled.
We have replicated the issue using an azcopy command with the --decompress flag, this time on Linux, whereas the above was on Windows.
Out of 16,451 files, 221 failed. There are 124,313 files to download in total, but we have stopped the process for now. The errors look like this:
- 2022/07/06 12:16:59 ERR: [P#0-T#109] DOWNLOADFAILED: <url_removed> : 000 : decompression worker exited early. When Flushing file. X-Ms-Request-Id:
Again, picking a file at random and downloading it via the Azure web console works fine, so there is no issue with the deflate encoding itself.
The defaults were used without setting any environment variables, and the azcopy command was:
./azcopy copy "<sas_url>" "<output_path>" --overwrite=false --check-md5 FailIfDifferent --from-to=BlobLocal --decompress --recursive --log-level=INFO;
Tomorrow we will try without the --decompress flag to see if the error rate improves.
Without the --decompress flag, over 40,000 files have been downloaded without any errors. Obviously this is not ideal, because the files then need to be decompressed as a separate step, but it strongly suggests the issue lies in azcopy's deflate routine and the workers used for decompression.
Update: all files transferred successfully without the --decompress flag
Thank you @jis260 for the detailed description.
It sounds like this problem only occurs when downloading with lots of concurrency. For such performance related problems, I'll rely on @nakulkar-msft to take a look.
Thank you for your reply, @zezha-msft.
Just to add that we did try setting both concurrency settings in Azure Storage Explorer to 4 and the issue still occurred when using --decompress flag.
There is currently no way to complete the transfer using Azure Storage Explorer or azcopy with the --decompress flag. Failures occur on thousands of files at random, and restarting the job from Azure Storage Explorer is very difficult: performing a Select All of the folders is very slow and probably generates a large number of API calls to Azure Blob Storage.
At this time, the only reliable way to transfer the data is to run without the --decompress flag; I have created a script that performs the decompression afterwards. However, this is not very convenient for our end users.
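For anyone hitting the same wall, here is a minimal sketch of such a post-download decompression step. It is illustrative, not the actual script: the zlib-wrapped vs. raw-deflate fallback is an assumption about what the service may have stored, and the in-place rewrite of every file under a directory tree is just one possible design.

```python
import pathlib
import zlib


def inflate(data: bytes) -> bytes:
    """Decompress a deflate-encoded payload.

    Tries the zlib-wrapped format first (RFC 1950, which is what HTTP
    "deflate" formally means), then falls back to raw deflate
    (RFC 1951), which some producers emit instead.
    """
    try:
        return zlib.decompress(data)
    except zlib.error:
        return zlib.decompress(data, wbits=-zlib.MAX_WBITS)


def decompress_tree(root: str) -> None:
    """Decompress every downloaded file under `root`, rewriting in place."""
    for path in pathlib.Path(root).rglob("*"):
        if path.is_file():
            path.write_bytes(inflate(path.read_bytes()))
```

Running this over the download directory after an azcopy run without --decompress gives the same end result as a successful --decompress transfer, at the cost of a second pass over the data.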
If there are settings we can use to ensure a reliable transfer at the expense of speed, I would prefer that. If lowering the concurrency further is the solution, please advise.
@jis260 Can you forward the logs to my email? Though the most relevant logs are here, I'd like to take a look at what led to those events.
Hopefully, my email containing the logs was received? I used the email address published on your profile.
@jis260 did you ever get in contact with @nakulkar-msft over email?
I sent an email to the email address on @nakulkar-msft GitHub profile, but I never received a reply. That's why I asked for confirmation above on 15 July.
Hello @MRayermannMSFT, has there been any investigation into this issue? It means that Azure Storage Explorer is unusable for downloading files that have a Content-Encoding header set. The decompression needs to be performed as a separate step, which is inconvenient. It would be really helpful to know that this is being looked into.
@jis260 I've just pinged the AzCopy team to figure out what the heck is going on here. Thanks for the patience.
@jis260 did they ever work things through with you?
Thank you, @MRayermannMSFT. Unfortunately, I have not had any follow-up from anyone at Microsoft apart from your comments on here. All rather disappointing really.