azure-sdk-for-cpp

Provide a BlobClient::Download function which reads and returns the entire blob rather than taking a buffer and copying it there

Open muhammadhamzasajjad opened this issue 11 months ago • 3 comments

Is your feature request related to a problem? Please describe. Currently the standard way of reading a blob from Azure is as follows:

// set Azure::Storage::Blobs::BlobContainerClient container_client;
auto blob_client = container_client.GetBlockBlobClient(blob_name);
auto properties = blob_client.GetProperties(Azure::Storage::Blobs::GetBlobPropertiesOptions{}, get_context(request_timeout)).Value;
std::shared_ptr<Buffer> buffer = std::make_shared<Buffer>(properties.BlobSize);
blob_client.DownloadTo(buffer->preamble(), buffer->available(), download_option, get_context(request_timeout));

The problem is that, under concurrent reads and writes, someone else can update the blob after we have called BlobClient::GetProperties and sized the buffer, so the blob size changes before we call DownloadTo. This is a race condition. In this case DownloadTo throws an Azure::Core::RequestFailedException with the error message "Buffer is not big enough, blob range size is [new size]" and status code 0. This could be fixed if Azure provided a function that just returns an object containing the entire response/blob, without having to call GetProperties().
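The race described above can be reproduced without any network access. The following is a minimal sketch, assuming a hypothetical in-memory `FakeBlob` class that stands in for the blob client; it is not an Azure SDK type, and its `Size`, `DownloadTo` and `Overwrite` members only imitate the behaviour reported in this issue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for a blob; NOT part of the Azure SDK.
class FakeBlob {
public:
  explicit FakeBlob(std::vector<std::uint8_t> data) : data_(std::move(data)) {}

  // Plays the role of GetProperties().Value.BlobSize.
  std::size_t Size() const { return data_.size(); }

  // Plays the role of the buffer-based DownloadTo: throws if the caller's
  // buffer is smaller than the current blob content.
  void DownloadTo(std::uint8_t* buffer, std::size_t buffer_size) const {
    assert(buffer != nullptr || buffer_size == 0);
    if (buffer_size < data_.size()) {
      throw std::runtime_error(
          "Buffer is not big enough, blob range size is " +
          std::to_string(data_.size()));
    }
    std::copy(data_.begin(), data_.end(), buffer);
  }

  // Models another client replacing the blob content.
  void Overwrite(std::vector<std::uint8_t> data) { data_ = std::move(data); }

private:
  std::vector<std::uint8_t> data_;
};

// Runs the two-step read with a write landing between the steps.
// Returns true if the download failed because the blob grew.
bool TwoStepReadFailsWhenBlobGrows() {
  FakeBlob blob(std::vector<std::uint8_t>{1, 2, 3});
  std::vector<std::uint8_t> buffer(blob.Size());  // step 1: size the buffer
  blob.Overwrite({1, 2, 3, 4, 5});                // update by someone else
  try {
    blob.DownloadTo(buffer.data(), buffer.size());  // step 2: download
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}
```

The point of the sketch is only that the size and the content are fetched in two separate steps, so any write between them invalidates the buffer size.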

Describe the solution you'd like Provide a function DownloadBlob (or some other appropriate name) that simply returns the whole blob, rather than copying it into a caller-supplied buffer, which necessitates calling GetProperties first. This is what it would look like:

// set Azure::Storage::Blobs::BlobContainerClient container_client;
auto blob_client = container_client.GetBlockBlobClient(blob_name);
[Result] blob = blob_client.DownloadTo(download_option, get_context(request_timeout));
// Result is just a placeholder for whatever type will be returned.

Describe alternatives you've considered I have considered the alternative DownloadTo that writes the whole blob to a file. But this is unwanted in my case, as file reads and writes are slow.


Information Checklist Kindly make sure that you have added all the following information above and check off the required fields; otherwise we will treat the issue as an incomplete report

  • [x] Description Added
  • [x] Expected solution specified

muhammadhamzasajjad • Mar 11 '24 11:03

I see that there is already a BlobClient::Download function. Could we just use this? Is it equivalent to BlobClient::DownloadTo, with the difference that it returns the blob data?

muhammadhamzasajjad • Mar 11 '24 13:03

> I see that there is already a BlobClient::Download function. Could we just use this? Is it equivalent to BlobClient::DownloadTo, with the difference that it returns the blob data?

Download downloads the blob with exactly one HTTP request. DownloadTo starts multiple threads to download concurrently.
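Because a single-request download delivers the body as one stream, the caller does not need to know the size in advance: it can read until the stream ends and let the buffer grow. The following is a minimal sketch of that pattern, assuming a plain `std::istream` as a stand-in for the response body; `ReadToEnd` here is a hypothetical helper written for illustration, not the SDK's own stream type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads a stream of unknown length to the end, growing the output as needed.
// std::istream is only a stand-in for an HTTP response body.
std::vector<std::uint8_t> ReadToEnd(std::istream& body) {
  std::vector<std::uint8_t> out;
  std::array<char, 4096> chunk;
  while (body) {
    body.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
    const std::streamsize got = body.gcount();  // bytes actually read
    assert(got >= 0 && static_cast<std::size_t>(got) <= chunk.size());
    out.insert(out.end(), chunk.data(), chunk.data() + got);
  }
  return out;
}
```

With this shape there is no separate "get the size" step, so the buffer-sizing race from the original report does not arise, although the transfer itself can still fail if the blob changes mid-download, as noted in the reply.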

There is no single API that can guarantee the download won't be interrupted if the blob is modified while it is being downloaded. You're likely to see exceptions during data transfer.

There are a few options:

  1. Use a mutex to protect access to the blob.
  2. Use a lease.
  3. Create an immutable snapshot of the blob before downloading, and download the snapshot.
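The snapshot idea in option 3 (with a mutex guarding the hand-over, as in option 1) can be illustrated in memory. The following is a minimal sketch, assuming a hypothetical `SnapshotBlob` class that is not an Azure SDK type: writers publish a whole new immutable version, and a reader pins one version before sizing its buffer, so both steps see the same data even if a write lands in between.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical in-memory model of a blob with snapshots; NOT an SDK type.
class SnapshotBlob {
public:
  explicit SnapshotBlob(Bytes initial)
      : current_(std::make_shared<const Bytes>(std::move(initial))) {}

  // Writers publish a new version; existing snapshots are never mutated.
  void Overwrite(Bytes next) {
    auto fresh = std::make_shared<const Bytes>(std::move(next));
    std::lock_guard<std::mutex> lock(mutex_);
    current_ = std::move(fresh);
  }

  // Readers pin the current version; it stays valid while they hold it.
  std::shared_ptr<const Bytes> Snapshot() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return current_;
  }

private:
  mutable std::mutex mutex_;
  std::shared_ptr<const Bytes> current_;
};

// Size-then-copy against a pinned snapshot, with a write landing mid-read.
Bytes ReadWhileOverwritten(SnapshotBlob& blob, Bytes replacement) {
  std::shared_ptr<const Bytes> snap = blob.Snapshot();  // pin one version
  Bytes buffer(snap->size());                           // step 1: size buffer
  blob.Overwrite(std::move(replacement));               // write in between
  assert(buffer.size() == snap->size());                // snapshot unchanged
  std::copy(snap->begin(), snap->end(), buffer.begin());  // step 2: copy
  return buffer;
}
```

The reader gets a consistent, possibly slightly stale, copy. How this maps onto the storage service (creating a blob snapshot and downloading from it) is not shown here.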

Jinming-Hu • Mar 14 '24 06:03

Hi @muhammadhamzasajjad. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

github-actions[bot] • Apr 03 '24 04:04

Hi @muhammadhamzasajjad, since you haven’t asked that we /unresolve the issue, we’ll close this out. If you believe further discussion is needed, please add a comment /unresolve to reopen the issue.

github-actions[bot] • Apr 10 '24 10:04