adlfs
Issue with parallel uploads to the same blob
There seems to be an issue when 2 instances of this file system write to the same blob from 2 different processes in parallel, where one of the uploads fails with:
Azure error
File "/code/.venv/lib/python3.10/site-packages/our_package/connector/storage/blob.py", line 117, in _save
with self._fs.open(
File "/code/.venv/lib/python3.10/site-packages/fsspec/spec.py", line 1963, in __exit__
self.close()
File "/code/.venv/lib/python3.10/site-packages/adlfs/spec.py", line 1908, in close
super().close()
File "/code/.venv/lib/python3.10/site-packages/fsspec/spec.py", line 1930, in close
self.flush(force=True)
File "/code/.venv/lib/python3.10/site-packages/fsspec/spec.py", line 1801, in flush
if self._upload_chunk(final=force) is not False:
File "/code/.venv/lib/python3.10/site-packages/fsspec/asyn.py", line 118, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/code/.venv/lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
raise return_result
File "/code/.venv/lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
result[0] = await coro
File "/code/.venv/lib/python3.10/site-packages/adlfs/spec.py", line 2068, in _async_upload_chunk
await bc.commit_block_list(
File "/code/.venv/lib/python3.10/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
return await func(*args, **kwargs)
File "/code/.venv/lib/python3.10/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 1861, in commit_block_list
process_storage_error(error)
File "/code/.venv/lib/python3.10/site-packages/azure/storage/blob/_shared/response_handlers.py", line 184, in process_storage_error
exec("raise error from None") # pylint: disable=exec-used # nosec
File "<string>", line 1, in <module>
azure.core.exceptions.HttpResponseError: The specified block list is invalid.
RequestId:<request_id>
Time:2024-02-13T12:15:05.1957595Z
ErrorCode:InvalidBlockList
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidBlockList</Code><Message>The specified block list is invalid.
From our limited investigation, this is likely caused by the way AzureBlobFile calculates the IDs of the uploaded blocks:
https://github.com/fsspec/adlfs/blob/576fb7a6a53a55375b4458c09e5bb571d945d410/adlfs/spec.py#L2102-L2103
Could this be changed to a hash of the content or something similar, which would correspond to the actual contents of the uploaded block?
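To make the question concrete, here is a minimal sketch of a content-derived block ID. The helper name `content_block_id` and the choice of SHA-256 truncated to 32 hex characters are assumptions for illustration only, not what adlfs does. The digest is cut to a fixed size because Azure expects all block IDs within a blob to have the same length.

```python
import hashlib


def content_block_id(data: bytes, length: int = 32) -> str:
    """Hypothetical helper: derive a fixed-length block ID from the block's bytes.

    Identical content always maps to the same ID, regardless of which
    process uploads it or how many blocks that process uploaded before.
    """
    return hashlib.sha256(data).hexdigest()[:length]


chunk_a = b"first chunk"
chunk_b = b"second chunk"

id_a = content_block_id(chunk_a)
id_b = content_block_id(chunk_b)

print(id_a, id_b)
```

With this scheme, two writers that upload the same bytes stage the same block ID, while different bytes get different IDs. Whether that is enough to make concurrent commits safe would still need to be checked against the service.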
Hi, it seems to me that we can do this:
- Add from hashlib import shake_128, and in class AzureBlobFile, create:

      def _block_id(self, block_list: list[str] | None = None):
          if block_list is None:
              block_list = self._block_list
          return shake_128(str(block_list).encode()).hexdigest(4)[:-1]

- In https://github.com/fsspec/adlfs/blob/576fb7a6a53a55375b4458c09e5bb571d945d410/adlfs/spec.py#L2102-L2103

      block_id = self._block_id()

- In https://github.com/fsspec/adlfs/blob/576fb7a6a53a55375b4458c09e5bb571d945d410/adlfs/spec.py#L2116-L2117

      block_id = self._block_id()

- In https://github.com/fsspec/adlfs/blob/576fb7a6a53a55375b4458c09e5bb571d945d410/adlfs/spec.py#L2132

      if block_id == self._block_id([]) and length == 0 and final:
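As a standalone illustration of what the proposed helper produces, the sketch below lifts the same expression out of the class. The free function `block_id` and the `simulate_ids` loop are hypothetical stand-ins for `AzureBlobFile._block_id` and its `_block_list`; they are not adlfs code.

```python
from hashlib import shake_128


def block_id(block_list: list[str]) -> str:
    # Same expression as the proposed _block_id: hash the string form of the
    # IDs collected so far, take 4 bytes (8 hex chars), drop the last char.
    return shake_128(str(block_list).encode()).hexdigest(4)[:-1]


def simulate_ids(n_blocks: int) -> list[str]:
    """Hypothetical stand-in for a writer appending to its block list."""
    ids: list[str] = []
    for _ in range(n_blocks):
        ids.append(block_id(ids))
    return ids


ids = simulate_ids(3)
print(ids)
```

Each ID is 7 hex characters long and depends only on the IDs that precede it, so the sequence is fully determined by the starting list. Two writers that both start from an empty list therefore derive the same sequence of IDs.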
Now that this issue (#462) has been fixed, we would like to ask for a new release of adlfs so that the fix becomes available. Maybe @TomAugspurger, since you have published the recent versions? Thanks.
@TomAugspurger, you can add me to the PyPI project, if you want me to do it.
It should be automated through GitHub Actions on tags.
On Nov 14, 2024, at 8:28 AM, Martin Durant wrote:
@TomAugspurger, you can add me to the PyPI project, if you want me to do it.
OK, then the docs at https://github.com/fsspec/adlfs/blob/main/CONTRIBUTING.md#release are outdated :)
Hi, may I ask for an update on the version bump?
Planning on it next week.
Sorry for asking again, but may I ask for an update on the version bump? Thanks.