
[QUERY] How to handle BlobModifiedWhileReading errors

Open lymedo opened this issue 1 year ago • 4 comments

Library name and version

Azure.Storage.Blobs 12.14.1

Query/Question

I have an Azure Function app consuming messages from an Event Hub and appending the data to append blobs.

We are observing a large volume of BlobModifiedWhileReading errors, which I am told by Microsoft are down to a conflict in read/write modes:

  • AppendBlock calls require the blob to be in write-optimized mode
  • Calls like GetBlobProperties and ListBlobs require it to be in read-optimized mode

The conflict occurs when the get/put operations are made simultaneously on the same blob - at scale in my case.

The code I am using checks if the blob exists on each request, creates it if required and then appends the blob.

if (!await appendBlobClient.ExistsAsync()) {
    await appendBlobClient.CreateIfNotExistsAsync();
}
    
using var ms = new MemoryStream(blockContent);
await appendBlobClient.AppendBlockAsync(ms);

I'm looking for some advice on a pattern to mitigate these errors. Any help would be appreciated.

Thanks

Environment

Azure Function Runtime 4 / .NET 6.0

lymedo avatar Feb 16 '23 13:02 lymedo

Thank you for your feedback. This has been routed to the support team for assistance.

ghost avatar Feb 16 '23 14:02 ghost

@lymedo Thanks for reaching out to us and reporting this issue. BlobModifiedWhileReading is a scenario where a blob is being read by one process while it's being modified by another process. This can result in unexpected behavior, such as reading outdated data or encountering errors. To avoid this scenario, you can use a concurrency-control mechanism, such as a lease, to ensure that only one request can modify the blob at a time.

You can implement either optimistic or pessimistic concurrency in your application.

Optimistic concurrency checks the ETag value for a blob and compares it to the ETag provided. If the ETag values match, the operation is allowed to proceed. If the ETag values don't match, the operation fails and the error is returned to the caller.

Pessimistic concurrency uses an exclusive lease to lock the blob to other writers.

Refer to the sample code for optimistic concurrency and pessimistic concurrency for blobs.

navba-MSFT avatar Feb 17 '23 04:02 navba-MSFT

Thanks for the information. Having read through the docs, it feels like I need Optimistic concurrency with conditional headers in the request.

If I supplied If-Matched || If-Not-Matched, I assume that this is telling the client to ignore modifications and the request should no longer return the BlobModifiedWhileReading error.

Please can you confirm if my understanding is correct?

lymedo avatar Feb 17 '23 07:02 lymedo

@lymedo Thanks for getting back. Actually, using the If-Match or If-None-Match headers in a request to Azure Storage does not tell the client to ignore modifications to the blob, but rather specifies a precondition that must be met for the request to be processed. These headers allow the client to check whether the blob has been modified since the last time it was accessed, and to specify the desired behavior in case the precondition is not met.

If the If-Match header is included in a request to update a blob, the server will check whether the current ETag value of the blob matches the value provided in the header. If the values do not match, the server will return a 412 (Precondition Failed) error, indicating that the blob has been modified since the client last accessed it, and the update request will not be processed.

Similarly, if the If-None-Match header is included in a request to retrieve a blob, the server will check whether the current ETag value of the blob matches the value provided in the header. If the values match, the server will return a 304 (Not Modified) response, indicating that the blob has not been modified since the client last accessed it, and the client can use its cached copy of the blob.

In either case, the use of these headers helps to ensure that the client is working with the most recent version of the blob, and prevents the client from accidentally overwriting changes made by another client. So, using these headers can help to avoid the BlobModifiedWhileReading error, but they do not tell the client to ignore modifications to the blob.
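As a rough illustration of the If-Match precondition described above, the sketch below appends a block only if the blob's ETag still equals one the caller saw earlier, and treats a 412 as "another writer got there first". The names OptimisticAppendSketch, OnlyIfUnchanged and TryAppendIfUnchangedAsync are made up for this example, and it assumes lastSeen came from an earlier response (for example a previous append). The five-argument AppendBlockAsync call is used on the assumption that it compiles against 12.14.1 and later releases; please verify that against your version.

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Azure;
using Azure.Storage.Blobs.Models;
using Azure.Storage.Blobs.Specialized;

// Hypothetical helper, not part of the SDK.
public static class OptimisticAppendSketch
{
    // Builds request conditions that make the service send If-Match: <lastSeen>.
    public static AppendBlobRequestConditions OnlyIfUnchanged(ETag lastSeen) =>
        new AppendBlobRequestConditions { IfMatch = lastSeen };

    // Returns true if the block was appended, false if the precondition failed (412).
    public static async Task<bool> TryAppendIfUnchangedAsync(
        AppendBlobClient blob, ETag lastSeen, byte[] blockContent)
    {
        try
        {
            using var ms = new MemoryStream(blockContent);
            await blob.AppendBlockAsync(
                ms, null, OnlyIfUnchanged(lastSeen), null, CancellationToken.None);
            return true;
        }
        catch (RequestFailedException ex) when (ex.Status == 412)
        {
            // The blob changed since lastSeen; the caller decides whether to re-read and retry.
            return false;
        }
    }
}
```

With many parallel writers this precondition is expected to fail frequently, because every successful append changes the ETag. It is mainly useful when writers must not interleave.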

navba-MSFT avatar Feb 17 '23 09:02 navba-MSFT

With an append block blob, I'm not concerned about reading or writing to the latest version of the blob. With my consumer function being a real-time ingestion app, I just need to append the event data to the blob.

I'm not sure if the suggested concurrency approaches help me - feels like it will be just raising a different error and I'll potentially lose event data.

I would like to find a way to suppress the BlobModifiedWhileReading error.

lymedo avatar Feb 17 '23 23:02 lymedo

@amnguye / @seanmcc-msft Any pointers on suppressing the BlobModifiedWhileReading error?

navba-MSFT avatar Feb 20 '23 05:02 navba-MSFT

To elaborate...

AppendBlockAsync() raises an error if the blob is not found. This requires checking if the blob exists and creating the blob if required. The downside to this is that on every attempt to append to a blob, a get blob request is made. Due to the nature of my app, conflicts occur (BlobModifiedWhileReading) due to the volume of appends with an associated get blob.

All I need to know is that the blob exists before my client appends to it. The blob version is not important as I'm not modifying existing data...I'm just adding a new block.

lymedo avatar Feb 20 '23 07:02 lymedo

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.

Issue Details

Author: lymedo
Assignees: amnguye
Labels:

Storage, Service Attention, Client, customer-reported, question, needs-team-attention

Milestone: -

ghost avatar Feb 21 '23 19:02 ghost

This seems like a design question about how to properly design a "real-time ingestion app". This may not be exactly the right place for that kind of question, but we can try to help you avoid these errors if you don't care about the current state of the append blob at the time of reading or writing.

I might miss a few things since this thread is long, so feel free to re-ask any questions or repeat any statements.

AppendBlockAsync() raises an error if the blob is not found. This requires checking if the blob exists and creating the blob if required. The downside to this is that on every attempt to append to a blob, a get blob request is made. Due to the nature of my app, conflicts occur (BlobModifiedWhileReading) due to the volume of appends with an associated get blob.

I don't think you actually need a GetBlob request here. I think you should use the CreateIfNotExists method and then allow that convenience method to catch the error if the blob already exists and then continue on from there.

The workflow should look like this.

AppendBlobClient appendBlob = new AppendBlobClient(...);
appendBlob.CreateIfNotExists();
appendBlob.AppendBlock(...);

This should avoid the unnecessary GetBlob call and having to manually catch the BlobModifiedWhileReading error.

As for this error occurring when reading or downloading the blob while it is being written to, I'm afraid there might not be much we can do about that, since it follows from how the storage team designed Append Blobs to work. The current recommendation is to wrap the call in a try-catch, catch the expected BlobModifiedWhileReading error, and retry the request.

try
{
    appendBlob.AppendBlock(...);
}
catch (RequestFailedException exception)
    when (exception.ErrorCode == "BlobModifiedWhileReading")
{
    // retry the request
}
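One way the "retry the request" step might be filled in is a bounded retry with exponential backoff, sketched below. AppendRetrySketch, RetryOnErrorCodeAsync and the default values for maxAttempts and baseDelayMs are illustrative choices rather than anything from the SDK or this thread. The operation is passed as a delegate so the caller can create a fresh stream for each attempt.

```csharp
using System;
using System.Threading.Tasks;
using Azure;

// Hypothetical helper, not part of the SDK.
public static class AppendRetrySketch
{
    // Runs the operation, retrying only when it fails with the given storage error code.
    // After maxAttempts failures the last exception propagates to the caller.
    public static async Task RetryOnErrorCodeAsync(
        Func<Task> operation, string errorCode, int maxAttempts = 5, int baseDelayMs = 100)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                await operation();
                return;
            }
            catch (RequestFailedException ex)
                when (ex.ErrorCode == errorCode && attempt < maxAttempts)
            {
                // Exponential backoff: baseDelayMs, then 2x, 4x, ...
                await Task.Delay(baseDelayMs * (1 << (attempt - 1)));
            }
        }
    }
}
```

A caller would pass a lambda that creates a new MemoryStream over the block content and calls AppendBlockAsync, with "BlobModifiedWhileReading" as the error code. Any other error, or the final failed attempt, is rethrown unchanged.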

Sorry, this is probably not exactly the solution you were looking for, but unfortunately we have to adhere to the storage service design.

amnguye avatar Feb 21 '23 19:02 amnguye

Hi, thanks for the response.

The problem with just using CreateIfNotExistsAsync() is that it logs BlobAlreadyExists in the storage diagnostics logs for every request, which again creates a whole lot of noise - this is the reason why ExistsAsync() was implemented.

I am currently testing out a similar try/catch logic:


try
{
    ms.Seek(0, SeekOrigin.Begin);
    await appendBlobClient.AppendBlockAsync(ms);
}
catch (Azure.RequestFailedException ex)
{
    if (!string.IsNullOrEmpty(ex.ErrorCode) && ex.ErrorCode == "BlobNotFound")
    {
        await appendBlobClient.CreateIfNotExistsAsync();
        ms.Seek(0, SeekOrigin.Begin);
        await appendBlobClient.AppendBlockAsync(ms);
    }
    else
    {
        throw;
    }
}
finally
{
    ms.Dispose();
}

Initial results show an increase in BlobNotFound errors and, depending on the degree of parallelism on the object, an increase in BlobAlreadyExists. However, it does seem to mitigate the BlobModifiedWhileReading error.
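For reference, the same append-first, create-on-BlobNotFound pattern can be written a little more compactly with an exception filter. This is only a sketch: AppendOrCreateSketch and its method names are hypothetical, and the delegate-based overload exists purely so the control flow can be exercised without a storage account.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Azure;
using Azure.Storage.Blobs.Specialized;

// Hypothetical helper, not part of the SDK.
public static class AppendOrCreateSketch
{
    // Core flow: try to append; on BlobNotFound, create the blob and append once more.
    public static async Task AppendWithCreateFallbackAsync(Func<Task> append, Func<Task> create)
    {
        try
        {
            await append();
        }
        catch (RequestFailedException ex) when (ex.ErrorCode == "BlobNotFound")
        {
            // CreateIfNotExists tolerates another caller having created the blob first.
            await create();
            await append();
        }
    }

    // Binds the flow to an AppendBlobClient, using a fresh stream for every attempt.
    public static Task AppendOrCreateAsync(AppendBlobClient blob, byte[] blockContent) =>
        AppendWithCreateFallbackAsync(
            append: async () =>
            {
                using var ms = new MemoryStream(blockContent);
                await blob.AppendBlockAsync(ms);
            },
            create: () => blob.CreateIfNotExistsAsync());
}
```

Creating a new MemoryStream per attempt avoids having to seek back to the start before the second append. Errors other than BlobNotFound pass through untouched.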

I'll report back with my findings.

Any feedback on this approach would be appreciated.

lymedo avatar Feb 22 '23 09:02 lymedo

Ah, I did see the separate GitHub issue you opened about that, and thanks for calling it out. (I will address it in that thread so we don't mix the two discussions here.) We will attempt to resolve that issue so that you can freely use CreateIfNotExists again.

amnguye avatar Feb 22 '23 21:02 amnguye

I have an almost identical issue and the same use case with append block. We noticed a significant increase in throughput when not having to call Exists or CreateIfNotExists; however, we do need the ability to create the file in the event it isn't present, similar to @lymedo.

Is there any update on this being resolved @amnguye ?

Also, @lymedo , with the try catch approach you have above, are you finding that your blob stays in “write-optimized” mode?

kylewarren avatar Apr 11 '23 03:04 kylewarren

@kylewarren I've observed a significant drop-off in conflicts, so the try/catch method has definitely improved things in my scenario. It's now only logging a failure when the blob is first created, that is, when the append finds that it doesn't exist.

lymedo avatar Apr 11 '23 04:04 lymedo

Thank you for the quick reply. I noticed the same behavior after removing the check to see if the blob exists first; it shaved off significant processing time. I see the same behavior when creating as well: in a high-concurrency scenario, there are a couple of attempts to create the blob on multiple threads initially, but they go away as we begin to append in other threads.

kylewarren avatar Apr 12 '23 00:04 kylewarren