
Support Batching in the Azure Blob Storage output

Status: Open · leosunmo opened this issue 4 years ago · 3 comments

There's (probably) no real reason why it couldn't support batching just as well as the S3 output does.
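As a rough illustration of what batching could mean here, the following is a minimal sketch that mirrors the newline-delimited approach commonly paired with the S3 output (for example an `archive` processor with the `lines` format). Every message in a batch is joined into one payload, which would then be written as a single blob. The helper name `joinBatch` is hypothetical and is not part of Benthos.

```go
package main

import (
	"bytes"
	"fmt"
)

// joinBatch concatenates the messages of a batch into a single
// newline-delimited payload, so one blob is written per batch instead
// of one blob per message. Hypothetical helper, not a Benthos API.
func joinBatch(batch [][]byte) []byte {
	return bytes.Join(batch, []byte("\n"))
}

func main() {
	batch := [][]byte{[]byte(`{"id":1}`), []byte(`{"id":2}`), []byte(`{"id":3}`)}
	payload := joinBatch(batch)
	fmt.Printf("%d messages -> 1 blob of %d bytes\n", len(batch), len(payload))
}
```

With per-batch blobs, the blob name would need to be derived once per batch (e.g. from the first message or a timestamp) rather than once per message.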

Godocs for uploading a Block Blob: https://pkg.go.dev/github.com/Azure/azure-storage-blob-go/azblob#BlockBlobURL.Upload

It mentions that Upload can't partially update an existing Block Blob (it overwrites the whole blob), and that you can instead stage individual blocks and then commit them to form the final Block Blob.

To perform a partial update of a block blob, use StageBlock and CommitBlockList

This could probably be used to upload large files more efficiently? I'm not sure it's worth it, though, since Benthos is more likely to pass many small-to-medium-sized payloads than a few large ones.

Microsoft docs for Put Blob: https://docs.microsoft.com/en-gb/rest/api/storageservices/put-blob

(leosunmo, Sep 23 '21 11:09)

@leosunmo Thank you for opening this issue! The initial implementation of this output used https://github.com/Azure/azure-sdk-for-go, which is labeled as deprecated, but because azure-storage-blob-go lacked connection string support, we decided to use the deprecated library anyway. However, according to an update here, https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob is apparently what we should be using. I asked them for details, but it's been radio silence, so I guess I'll have to dig into it myself and check whether it supports everything we need. I'll take a look soon and try to update Benthos to use the new library.

Note to self: #615 should be addressed as part of this.

(mihaitodor, Sep 23 '21 15:09)

Wow, I didn't realise there were multiple versions of the Blob SDK. That looks like a bit of a mess :grimacing: I especially love how even the deprecated package is still in preview.

(leosunmo, Sep 25 '21 01:09)

Small update: it looks like the azblob package was migrated over to the ~legacy~ current SDK and now resides here: https://github.com/Azure/azure-sdk-for-go/tree/main/sdk/storage/azblob#readme We'll have to revisit the Azure Storage input and output and switch them to the new API, which should allow us to leverage the batching functionality.

(mihaitodor, Apr 16 '22 14:04)