nodejs-storage
nodejs-storage copied to clipboard
CONTENT_DOWNLOAD_MISMATCH with successful file download
Environment details
- OS: Windows 10 Pro 1809
- Node.js version: v10.15.3 LTS
- npm version: 6.4.1
-
@google-cloud/storage
version: 2.5.0
Steps to reproduce
- Prepare a bucket with Cloud KMS Customer managed Key encryption. All files are private.
- Upload an JPEG image file using:
function UploadFile() {
storage.bucket("somebucket").upload("somefolder/someimage.jpg", {
gzip: true,
destination: "somefolder/someimage.jpg",
}).then(results => {
console.log("upload OK");
})
.catch(err => {
console.error("Error: ", err);
});
}
UploadFile()
// Upload works fine.
- Try downloading the img file using:
function DownloadFile() {
const file = storage.bucket("somebucket").file("somefolder/someimage.jpg");
file.download({
destination: "someimage.jpg",
// validation: false, // Why even disable validation?
}).then(res => {
console.log("DL OK");
}).catch(err => {
console.error(err); // Throws CONTENT_DOWNLOAD_MISMATCH
})
}
DownloadFile()
-
Correct decompressed file is downloaded AND exception is thrown:
code=CONTENT_DOWNLOAD_MISMATCH message=The downloaded data did not match the data from the server. To be sure the content is the same, you should download the file again.
I've read through Issue 566, but seems like not a solution.
storage.bucket("somebucket").file("somefolder/someimage.jpg").download({validation: false});
works, but there's no reason to or should disable validation.
To make sure if the hashes are actually a mismatch, I ran a local md5sum check on the downloaded and original image files.
$ md5sum downloaded.jpg original.jpg
a045a2e8e6b8d84aa8a319bcdba05419 downloaded.jpg
a045a2e8e6b8d84aa8a319bcdba05419 original.jpg
Downloaded file is a match to the original file.
BTW, This problem doesn't happen if the image is not gzipped on upload.
Thanks
Validation error because of crc32c check failure. https://github.com/googleapis/nodejs-storage/blob/61eeb64a6bca194361aacf2a312d27d0e6d63b35/src/file.ts#L1372
- On upload, storage client calculates crc32c after gzip compression. Google storage service is also calculating crc32c on gzip data and send back crc32c value in upload response for validation.
- While on download google storage client receiving uncompress data but header
x-goog-hash
has the crc32c value of gzip compressed data. (same as generated on upload ). Here crc32c value is mismatching because the client is calculating the crc32c value on uncompressed data.
Same error if validation set to md5
const file = storage.bucket("Bucket Name").file("file path on google storage");
file.download({
destination: "downloaded.jpg",
validation: 'md5'
}).then(res => {
console.log("DL OK");
}).catch(err => {
console.error(err);
})
So if gzip response header not found in download response then we should ignore crc32c check on uncompressed data.
@taiyokato I have reproduced this when using already-compressed content, specifically a PNG. In your example, you use a JPEG, which is also already compressed. The Storage docs advise against this due to "undesired behaviors" (see Using gzip on compressed objects).
In this case, the server is responding with the actual image contents without the gzip wrapper. We don't run the decompression, because it isn't compressed data we are receiving. Validation still runs, however, and that means the hashes we compute are not against the gzipped data value, but the actual data.
@jiren Currently, we run validation on all downloads. Can you clarify the conditions we should bypass the validation?
@stephenplusplus
Ah, that clears the mystery. Thanks for looking into it (and the link).
Should I keep this issue open or close it?
Here is the current condition. https://github.com/googleapis/nodejs-storage/blob/71a4f59343bbe9b0c00ebdf68f6fdf9d214727cc/src/file.ts#L1300-L1303
This condition can be change as per data is compress or not. If the gzip header is present then only validate CRC and MD5.
if (shouldRunValidation && isCompressed) {
validateStream = hashStreamValidation({ crc32c, md5 });
throughStreams.push(validateStream);
} else {
crc32c = false
}
Same thing is implemented in go storage client.
if length != 0 && !res.Uncompressed && !uncompressedByServer(res) {
https://github.com/googleapis/google-cloud-go/blob/71971b35976fc2f904ed2772536790a5458d9996/storage/reader.go#L205
@AVaksman I'll hand this over to you for now, as I believe this is blocked on the outcome of your investigation with the Storage team: https://docs.google.com/document/d/1jz91hfD5AJ8ghXnTPSnCuBa4NdkOPq085y9hBBQFu8M. I'm happy to do any implementation work if necessary when the time comes.
@AVaksman is there a game plan for how to address this?
@AVaksman @stephenplusplus - this bug has just caused us an issue in our production environment which we have had to workaround, any eta on a fix?
I think what we'd like to end up doing here is skipping validation if the file has the metadata Content-Encoding: gzip
and the file is served decompressed by the server.
We would insert that logic after the getMetadata()
call, which is when the client makes a second API call to get the checksums from the server.
@frankyn Does this temporary fix make sense?
That sounds like a good plan in the mean time.
Please move forward it and add a note that this is only temporary.
Opening this so we track the other part of the fix. Will mark as external as it won't likely happen in this repo.
Any eta on a fix for this? Upgraded to 4.3.1 but see the same issue.
We believe the issue should have been fixed with 4.3.1, so that's weird that it's still showing up for you @danielwhatmuff. I'd like to get more information -
Could you help us by providing:
- sample code that reproduces the issue
- the file type that's causing the issue?
I appreciate your patience and working with us to resolve this!
@frankyn do you know where this issue stands? I don't think there was anything left for us to do from this library.
I'm running into this with some uploaded .png and .jpegs. I do not GZIP on upload at all. I tried the MD5 check to see if that cleared it up, but it didn't. The odd thing is that it only breaks when opening multiple read streams for different files to zip them. When I attempt to read the one file in a separate file, it works. For the life of me, the two sets of code appear the same.
We are still working with the backend team to resolve this issue correctly. We are working on a workaround and will post an update when available.
Is there an update on this? The issue is affecting us.
News leading to closure of this issue: As of end of this week Cloud Storage API will always respect Accept-Encoding: gzip
.
I'm closing this issue as we can now safely validate checksum because GCS respects not decompressing data server side before sending back to customers.
Thanks for your patience
Spoke to soon, Rolled back the change so we will need to follow-up again when we have an update. Apologies, I jinxed it.