Content-MD5 should always be returned when listing blobs

Open JasonYeMSFT opened this issue 2 years ago • 5 comments

Which service(blob, file, queue, table) does this issue concern?

blob

Which version of the Azurite was used?

3.14.0

Where do you get Azurite? (npm, DockerHub, NuGet, Visual Studio Code Extension)

npm

What's the Node.js version?

16.13.1

What problem was encountered?

When listing blobs, Azurite occasionally returns blobs without a Content-MD5 property in the XML response if the MD5 is empty. An empty MD5 value may not be the exact cause, but that is what I have observed so far.

Steps to reproduce the issue?

Upload an append blob to an Azurite instance with an empty MD5 value, then list the blob container, which should return that blob.

In the response, the Content-MD5 property is missing.

Actual response:

<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ServiceEndpoint="" ContainerName="">
    <MaxResults>1000</MaxResults>
    <Delimiter>/</Delimiter>
    <Blobs>
        <Blob>
            <Name>blob</Name>
            <Properties>
                <Creation-Time>Thu, 11 Mar 2021 23:09:53 GMT</Creation-Time>
                <Last-Modified>Sat, 07 Aug 2021 00:19:39 GMT</Last…>
                <Content-CRC64 />
                <Cache-Control />
                <Content-Disposition />
                <BlobType>BlockBlob</BlobType>
                <AccessTier>Hot</AccessTier>
                <AccessTierInferred>true</AccessTierInferred>
                <LeaseStatus>unlocked</LeaseStatus>
                <LeaseState>available</LeaseState>
                <ServerEncrypted>true</ServerEncrypted>
            </Properties>
            <Metadata />
            <OrMetadata />
        </Blob>
    </Blobs>
    <NextMarker />
</EnumerationResults>

Expected response:

<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ServiceEndpoint="" ContainerName="">
    <MaxResults>1000</MaxResults>
    <Delimiter>/</Delimiter>
    <Blobs>
        <Blob>
            <Name>blob</Name>
            <Properties>
                <Creation-Time>Thu, 11 Mar 2021 23:09:53 GMT</Creation-Time>
                <Last-Modified>Sat, 07 Aug 2021 00:19:39 GMT</Last…>
                <Content-CRC64 />
                <Content-MD5/>   ------------------> This is missing
                <Cache-Control />
                <Content-Disposition />
                <BlobType>BlockBlob</BlobType>
                <AccessTier>Hot</AccessTier>
                <AccessTierInferred>true</AccessTierInferred>
                <LeaseStatus>unlocked</LeaseStatus>
                <LeaseState>available</LeaseState>
                <ServerEncrypted>true</ServerEncrypted>
            </Properties>
            <Metadata />
            <OrMetadata />
        </Blob>
    </Blobs>
    <NextMarker />
</EnumerationResults>

The public Azure blob service always returns the empty Content-MD5 in its list blob responses.
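The difference between the two responses above comes down to three possible states for the element: absent, present but empty, or present with a value. A minimal sketch of a classifier for a `<Properties>` fragment follows. `Md5State` and `contentMd5State` are hypothetical names used only for illustration, and the regex matching is a simplification; a real XML parser would be more robust.

```typescript
// Hypothetical helper: classify how Content-MD5 appears in a <Properties> XML fragment.
type Md5State =
  | { kind: "absent" }                // element missing (the "actual response" above)
  | { kind: "empty" }                 // <Content-MD5/> or <Content-MD5></Content-MD5>
  | { kind: "set"; base64: string };  // element carries a base64 digest

export function contentMd5State(propertiesXml: string): Md5State {
  // Self-closing form, with or without a space before the slash.
  if (/<Content-MD5\s*\/>/.test(propertiesXml)) {
    return { kind: "empty" };
  }
  const match = /<Content-MD5>([^<]*)<\/Content-MD5>/.exec(propertiesXml);
  if (match === null) {
    return { kind: "absent" };
  }
  const text = (match[1] ?? "").trim();
  return text === "" ? { kind: "empty" } : { kind: "set", base64: text };
}
```

A client that treats `absent` and `empty` the same way behaves identically against both responses.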

Have you found a mitigation/solution?

People can manually parse the response and handle the missing property. However, the @azure/storage-blob SDK currently cannot handle this situation because it uses the Node.js standard library to process that property value, which throws if the value is undefined. As a result, a single such blob in the response can prevent the SDK from listing any blobs at all.
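A minimal sketch of the manual handling described above, assuming the caller has already extracted the raw Content-MD5 text from the list response (or `undefined` when the element is absent). `decodeContentMD5` is a hypothetical helper, not part of `@azure/storage-blob`.

```typescript
// Hypothetical helper: decode an optional base64 Content-MD5 value without throwing.
export function decodeContentMD5(raw: string | undefined | null): Buffer | undefined {
  // A missing element (undefined/null) and an empty element ("") both mean
  // that no MD5 is recorded for the blob.
  if (raw === undefined || raw === null || raw.trim() === "") {
    return undefined;
  }
  return Buffer.from(raw.trim(), "base64");
}
```

Returning `undefined` instead of throwing lets the rest of the listing proceed even when one blob has no MD5.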

JasonYeMSFT avatar Jul 08 '22 17:07 JasonYeMSFT

@JasonYeMSFT

Per the Blob API swagger, Content-MD5 is not a required item in blob properties, so it is expected that the server does not return this property when it has no value for it.

Would you please share which error you hit and how to reproduce it? If the SDK assumes contentMD5 will always be returned from the server, this might be an SDK issue, since that does not align with the swagger.

blueww avatar Jul 11 '22 09:07 blueww

OK, thanks for sharing that. The error is about Node's crypto API expecting a string-like argument but receiving undefined: The first argument must be of type string or an instance of Buffer, ArrayBuffer, or Array or an Array-like Object. Received undefined.
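The quoted message matches the `TypeError` that Node's `Buffer.from` raises when it is given `undefined`. The sketch below reproduces that error shape. That the failing call inside the SDK was `Buffer.from` is an assumption, not something confirmed in this thread, and `tryDecode` is a hypothetical name.

```typescript
// Reproduce the error shape: Buffer.from rejects undefined with ERR_INVALID_ARG_TYPE.
export function tryDecode(value: unknown): string {
  try {
    // The cast mimics untyped data arriving from a parsed response.
    Buffer.from(value as string, "base64");
    return "ok";
  } catch (err) {
    // Node attaches a machine-readable code to its argument-type errors.
    return (err as { code?: string }).code ?? "unknown";
  }
}
```

Calling `tryDecode(undefined)` takes the error path, while `tryDecode("")` succeeds, which is why an empty element is harmless but an absent one is not.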

We encountered this issue while using AzCopy to upload a blob to Azurite. AzCopy sends an initial Put Blob request to create an empty blob at the destination, with the initial content-md5 header set to an empty string. It then iteratively calls the Put Block List API to append the actual contents and finally sets content-md5 to the expected value. If we list blobs in Azurite while content-md5 is still empty, the list response will contain undefined for content-md5, and the SDK fails when it tries to parse it.

I'll open an issue on the SDK repo and link to this issue. Since the public Azure service is not skipping the Content-Md5, could you help double check if the API spec needs an update?

JasonYeMSFT avatar Jul 11 '22 17:07 JasonYeMSFT

could you help double check if the API spec needs an update?

[Wei] The Blob REST API doc says:

The Content-MD5 element appears in the response body only if it has been set on the blob using version 2009-09-19 or later.

So I believe this is expected: contentMD5 is not required.

blueww avatar Jul 12 '22 06:07 blueww

@JasonYeMSFT

After discussing with the JS SDK owner, it looks like this is a JS SDK issue that has already been fixed. Would you please check that you are using the latest JS SDK?

blueww avatar Jul 15 '22 08:07 blueww

We are 1 version behind but I don't recall seeing anything related to that from the latest change log. I'll double check.

JasonYeMSFT avatar Jul 15 '22 18:07 JasonYeMSFT

Hi guys, is there any progress on this?

DavidNorena avatar Sep 22 '22 06:09 DavidNorena

@DavidNorena This is a JS SDK issue, since the SDK was not aligned with the REST API doc. It should already be fixed in the JS SDK.

blueww avatar Sep 22 '22 09:09 blueww

Closing this issue, as the JS SDK has already been fixed.

blueww avatar Sep 30 '22 07:09 blueww

Still having this issue with azurite v3.21.0, in VS Code only. Azure Storage Explorer can display blob containers without problems.

The first argument must be of type string or an instance of Buffer, ArrayBuffer, or Array or an Array-like Object. Received undefined

acmoune avatar Jan 22 '23 10:01 acmoune

I also still have this issue in 3.21.0.

DylanBruzenak avatar Feb 02 '23 17:02 DylanBruzenak

@acmoune , @DylanBruzenak

Please note that the fix is in the JS SDK, not in Azurite.

Please confirm whether you are using the latest version of the Azure Storage JS SDK: https://www.npmjs.com/package/@azure/storage-blob

blueww avatar Feb 03 '23 01:02 blueww

@blueww I am using azure-storage-blob 12.12.0. Just created the project last week. Manually calculating the hashes does fix it:

    const md5 = createHash('md5').update(blobContents).digest('hex')
    await blockClient.uploadStream(Readable.from(blobContents), blobContents.length, 5, {
      blobHTTPHeaders: { blobContentMD5: Buffer.from(md5, 'hex') },
    })

But I'd prefer the library handled this, as I'm not sure if there are unintended consequences working with this in production.
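The hex round trip in the snippet above is not strictly needed, since `digest()` with no encoding argument already returns a `Buffer`. A minimal sketch of the same computation follows; `md5Digest` is a hypothetical helper name, and passing its result as `blobContentMD5` mirrors the workaround above rather than any documented recommendation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: compute the raw 16-byte MD5 digest of the blob contents.
export function md5Digest(data: Buffer | string): Buffer {
  // digest() without an encoding returns a Buffer, so no hex conversion is needed.
  return createHash("md5").update(data).digest();
}
```

The result could then be supplied as `blobHTTPHeaders: { blobContentMD5: md5Digest(blobContents) }` in the upload options shown above.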

DylanBruzenak avatar Feb 08 '23 17:02 DylanBruzenak

@DylanBruzenak

Would you please confirm which of these issues you hit with JS SDK 12.12.0 + Azurite 3.21.0?

  1. The JS SDK fails as follows when listing blobs, after uploading a blob without setting ContentMD5: The first argument must be of type string or an instance of Buffer, ArrayBuffer, or Array or an Array-like Object. Received undefined.
  2. The JS SDK does not fail when listing blobs, but you expect the listed blob to include a ContentMD5 value (or an empty Content-MD5 header?) even though you did not set ContentMD5 when uploading the blob.

For 1, per what I heard from the JS SDK owner, this should not happen on 12.12.0. If it still happens, would you please share your JS SDK code to reproduce the issue (hide credentials, if any) and the Azure debug log? For 2, if the user has not set the ContentMD5 property on a blob, not returning contentMD5 is aligned with the Blob REST API doc, which says:

The Content-MD5 element appears in the response body, only if it has been set on the blob by using version 2009-09-19 or later.

blueww avatar Feb 09 '23 02:02 blueww

I'm hitting #1. I cannot browse azurite after uploading a blob using the latest version of the vs code plugin (Azurite v3.21.0), js SDK 12.12.0, and mcr.microsoft.com/azure-storage/azurite:latest for the docker image. Manually calculating the md5 and uploading it fixes it.

This is the code (modified slightly to remove some intermediate layers and client code):

app.ts:

import { ContainerClient } from '@azure/storage-blob'
import express from 'express'
import multer from 'multer'
import { Readable } from 'stream'

const CONNECTION_STRING = 'connection_string_goes_here'

const uploadBlob = async (blobName: string, blobContents: Buffer) => {
  const client = new ContainerClient(CONNECTION_STRING, 'files')
  const blockClient = client.getBlockBlobClient(blobName)

  await blockClient.uploadStream(Readable.from(blobContents), blobContents.length)
}

const v1Router = express.Router()

const uploadStrategy = multer({ storage: multer.memoryStorage() }).single('file')

v1Router.post('/file', uploadStrategy, async (req, res) => {
  const fileName = req.file?.originalname
  const fileBuffer = req.file?.buffer

  if (!fileName || !fileBuffer) {
    res.status(400).send('Missing fileName or fileBuffer')
    return
  }

  await uploadBlob(fileName, fileBuffer)

  res.end()
})

const app = express()

app.use('/v1', v1Router)

app.listen(3000, () => {
  console.log(`Listening on 3000`)
})

package.json:

{
  ...
  "dependencies": {
    "@types/express": "^4.17.16",
    "@types/multer": "^1.4.7",
    "@types/node": "18.11.10",
    "express": "^4.18.2",
    "multer": "^1.4.5-lts.1",
    "ts-node": "10.9.1",
    "typescript": "4.9.5",
    "@azure/identity": "^3.1.2",
    "@azure/storage-blob": "^12.12.0"
  }
}

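You can exercise this with curl (the @ prefix makes curl upload the file's contents rather than sending the path as a text field): curl -F 'file=@ABSOLUTE_FILE_PATH' "localhost:3000/v1/file"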

After that browsing to this file in vscode displays the error string.

Changing uploadBlob to the example from my last post fixes it.

DylanBruzenak avatar Feb 10 '23 20:02 DylanBruzenak

@DylanBruzenak

How are you "browsing to this file in vscode"? As far as I know, the fix in the JS SDK is that the SDK no longer fails when listing blobs if a blob has no ContentMD5.

So it looks like the problem lies in whichever app/SDK you use for "browsing to this file in vscode". Are you using the latest version of that app/SDK?

blueww avatar Feb 14 '23 06:02 blueww

@blueww I am browsing to the file in the Azurite v3.21.0 VS Code plugin on the left bar. I am writing the file with the latest version of the SDK.

DylanBruzenak avatar Feb 14 '23 18:02 DylanBruzenak

@DylanBruzenak

I don't quite understand what you mean by "browsing to the file in the Azurite v3.21.0 VS Code plugin on the left bar". Azurite doesn't provide a browser for the blobs it stores.

I think you might be using another tool in VS Code. Could you give the detailed steps you use to browse the file (hide credentials, if any)?

blueww avatar Feb 15 '23 02:02 blueww

Microsoft provides a few Visual Studio Code plugins for Azurite: https://marketplace.visualstudio.com/items?itemName=Azurite.azurite and https://marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-azurestorage. I'm assuming this is a bug in vscode-azurestorage and I should open it over there? I hadn't realized that was a different project. It's odd that the way you write the blob with the SDK fixes or breaks this.

DylanBruzenak avatar Feb 15 '23 03:02 DylanBruzenak

@DylanBruzenak

Azurite is an emulator of the Azure Storage server side. It doesn't provide a client tool to browse the data it stores; you must use some other Azure Storage tool or SDK to browse the data in Azurite.

It looks like you are using the Storage Explorer in the VS Code extension, and I can reproduce this issue with that extension. (But I can't reproduce it with the Storage Explorer desktop app https://azure.microsoft.com/en-us/products/storage/storage-explorer) So the issue appears to be caused by the VS Code extension not having been updated to the latest JS SDK. You can contact the extension's owner team to find out when they can upgrade to the latest JS SDK. (One way is to file an issue in https://github.com/Microsoft/vscode-azurestorage)

As for why the issue was fixed on the SDK side: not returning the optional Content-MD5 property is aligned with the swagger/REST API design. The JS SDK failed when that optional property was not returned, which caused this issue. The JS SDK then fixed it, so it no longer fails even when the server does not return the optional contentMD5.

blueww avatar Feb 15 '23 06:02 blueww