
[storage-gcs] Client Upload with Google Cloud Storage: File rewrite & metadata corruption

Open · VeiaG opened this issue 3 months ago · 2 comments

Describe the Bug

Summary

When using the @payloadcms/storage-gcs plugin with clientUploads: true for direct file uploads to Google Cloud Storage, the upload process corrupts the stored file in GCS. After investigating and applying a workaround patch, we discovered:

  1. Without patch: File in GCS gets overwritten with corrupted data (21 bytes)
  2. With patch: File in GCS is preserved correctly, BUT metadata saved in Payload document is still incorrect

The Problem

What SHOULD happen:

1. Client uploads 37MB video/mp4 via signed URL → GCS stores it correctly ✅
2. Payload saves document with correct metadata:
   - size: 37000000 (bytes)
   - mimeType: video/mp4
3. Everything is correct ✅

What ACTUALLY happens (without patch):

1. Client uploads 37MB video/mp4 via signed URL → GCS stores it correctly ✅
2. Payload's beforeChange hook calls handleUpload ❌
3. handleUpload re-uploads the file with fake req.file data ❌
4. This 21-byte file OVERWRITES the correct 37MB file in GCS ❌
5. Document saves wrong metadata (from the fake req.file, for this corrupted file) ❌

Evidence: GCS Bucket Versioning

When versioning is enabled on the bucket, we can see both file versions:

  • Version 1 (initial client upload): 37 MB, video/mp4, correct content ✅
  • Version 2 (server re-upload): 21 bytes, text/plain, corrupted ❌
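For reference, both generations can be inspected from the command line when bucket versioning is enabled. The bucket and object path below are placeholders:

```shell
# List all generations of the object, including noncurrent (archived) ones.
# -a includes noncurrent versions; -l adds size and creation time.
gsutil ls -la gs://my-bucket/media/video.mp4
```

With versioning on, the 37 MB generation and the 21-byte generation should both appear, with distinct generation numbers.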

Temporary Workaround Applied

A patch was applied to @payloadcms/storage-gcs/dist/handleUpload.js to prevent the re-upload:

export const getHandleUpload = ({ acl, bucket, getStorageClient, prefix = '' }) => {
  return async ({ clientUploadContext, data, file }) => {
    // Skip if already uploaded by client
    if (clientUploadContext) {
      return data // File is already in GCS, no need to re-upload
    }

    const fileKey = path.posix.join(data.prefix || prefix, file.filename)
    const gcsFile = getStorageClient().bucket(bucket).file(fileKey)

    await gcsFile.save(file.buffer, {
      // ... rest of upload logic
    })

    return data
  }
}

What This Patch FIXES:

  • ✅ File in GCS is no longer overwritten with corrupted data
  • ✅ File remains with correct size (37MB) and content
  • ✅ size metadata in Payload document is correct (37000000 bytes)

What This Patch DOES NOT FIX:

  • ❌ mimeType in Payload document is still wrong: text/plain;charset=UTF-8 instead of video/mp4
  • ❌ This issue persists even after applying the patch

The Root Cause of Remaining Issue

Step 1: Client sends correct metadata

// Client sends POST request with JSON:
{
  "clientUploadContext": { ... },
  "collectionSlug": "media",
  "filename": "video.mp4",
  "mimeType": "video/mp4",      // ✅ Correct
  "size": 37000000              // ✅ Correct
}

Step 2: Payload creates fake req.file object

Location: packages/payload/src/utilities/addDataAndFileToRequest.ts:51-98

const { clientUploadContext, collectionSlug, filename, mimeType, size } = JSON.parse(fields.file)
// mimeType = "video/mp4" ✅
// size = 37000000 ✅

// Calls storage handlers to fetch file from cloud storage
for (const handler of uploadConfig.handlers) {
  const result = await handler(req, { ... })
}

// Creates req.file from handler response
req.file = {
  name: filename,
  clientUploadContext,
  data: Buffer.from(await response.arrayBuffer()),
  mimetype: response.headers.get('Content-Type') || mimeType,  // ❌ PROBLEM HERE
  size,  // ✅ Size is correct
}

Step 3: Wrong mimeType propagates to document

Location: packages/payload/src/uploads/generateFileData.ts:238, 252

else {
  mime = file.mimetype  // Gets "text/plain;charset=UTF-8" from req.file ❌
  fileData.filesize = file.size  // Gets 37000000 ✅ Correct

  // ...
}

fileData.mimeType = mime  // Saves "text/plain;charset=UTF-8" to document ❌

Investigation Details

The mimeType Problem

When response.headers.get('Content-Type') is called on the Response returned by the storage handler:

  • Expected: video/mp4 (from client)
  • Actual: text/plain;charset=UTF-8 (default/fallback value)

This suggests one of the following:

  1. The storage handler is not properly setting the Content-Type header in the Response
  2. The Response object doesn't preserve the header correctly
  3. The handler returns a Response without explicit Content-Type

The client DID send the correct mimeType value in the request, so falling back to response.headers.get('Content-Type') instead of trusting the client-provided value causes the wrong value to be used.

Why the Workaround Only Half-Works

The patch prevents file overwrite (Bug №1: file corruption in GCS), but doesn't address why the wrong mimeType is being stored in the Payload document (Bug №2: metadata corruption).

The size is correct because it comes directly from the fields.file JSON (lines 52–54), not from response.headers.

But mimeType comes from response.headers.get('Content-Type'), which is unreliable for client uploads.


Remaining Issue to Fix

Even with the workaround patch applied, the mimeType metadata is still incorrect.

Current behavior (with patch):

File in GCS: ✅ 37MB video/mp4 correct
Payload document:
  - size: 37000000 ✅ Correct
  - mimeType: "text/plain;charset=UTF-8" ❌ Still wrong (should be "video/mp4")

Root cause analysis:

The issue is in packages/payload/src/utilities/addDataAndFileToRequest.ts:95:

mimetype: response.headers.get('Content-Type') || mimeType,
//        ↑ First priority: response header (unreliable)
//                                           ↑ Second priority: client-provided value (correct)

For client uploads, the response.headers.get('Content-Type') returns an unreliable value, while the mimeType parameter sent by the client is correct.

The priority should be reversed to trust the client-provided value when available:

mimetype: mimeType || response.headers.get('Content-Type'),
//        ↑ First priority: client-provided value (correct for client uploads)
//                         ↑ Second priority: response header (fallback)

Scope of This Issue

This issue has been confirmed with:

  • @payloadcms/storage-gcs with clientUploads: true

This issue has NOT been tested with:

  • ❓ Other storage adapters (S3, Azure, R2, Uploadthing, Vercel Blob)
  • ❓ Whether they have similar issues or different behavior

Files Involved

  • packages/payload/src/utilities/addDataAndFileToRequest.ts:51-98 - Creates fake req.file for client uploads
  • packages/payload/src/utilities/addDataAndFileToRequest.ts:95 - Sets mimeType with wrong priority
  • packages/payload/src/uploads/generateFileData.ts:238, 252 - Uses req.file.mimetype to populate document
  • packages/storage-gcs/src/handleUpload.ts - Re-uploads file without checking clientUploadContext
  • packages/plugin-cloud-storage/src/hooks/beforeChange.ts:13-66 - Calls handleUpload during document save

Impact

Without patch

  • Severity: High (the correct file in GCS is overwritten with corrupted data)
  • Data Loss: Full, unless versioning is enabled on the bucket

With patch

  • Severity: Medium (file content is safe, but metadata is incorrect)
  • Data Loss: Partial (metadata only)

Affected in both cases: client uploads to cloud storage with clientUploads: true (confirmed for GCS; other adapters untested)

Workaround Status

  • Current: Patch applied to prevent file overwrite in GCS
  • Limitation: Metadata still incorrect
  • Next Step: Need Payload core fix to prioritize client-provided mimeType

Questions for Maintainers

  1. Should the mimeType priority be reversed in addDataAndFileToRequest.ts:95?
  2. Why does response.headers.get('Content-Type') return text/plain;charset=UTF-8 instead of the actual file MIME type?
  3. Should client uploads populate req.file differently to avoid using unreliable response headers?
  4. Should other storage adapters also check clientUploadContext before re-uploading?

Patch applied:

diff --git a/dist/handleUpload.js b/dist/handleUpload.js
index 3a2548e461df3b18a7578a4671ab6c8bf3028f3a..513dfc03aa15d3f900619e74af3d11c49b4c18a9 100644
--- a/dist/handleUpload.js
+++ b/dist/handleUpload.js
@@ -1,6 +1,11 @@
 import path from 'path';
 export const getHandleUpload = ({ acl, bucket, getStorageClient, prefix = '' })=>{
-    return async ({ data, file })=>{
+    return async ({ clientUploadContext, data, file })=>{
+        // Skip if already uploaded by client
+        if (clientUploadContext) {
+            return data;
+        }
+
         const fileKey = path.posix.join(data.prefix || prefix, file.filename);
         const gcsFile = getStorageClient().bucket(bucket).file(fileKey);
         await gcsFile.save(file.buffer, {
@@ -14,5 +19,4 @@ export const getHandleUpload = ({ acl, bucket, getStorageClient, prefix = '' })=
         return data;
     };
 };
-
 //# sourceMappingURL=handleUpload.js.map

Link to the code that reproduces this issue

https://github.com/VeiaG/payload-plugin-template-fix/tree/fix/client-upload-metadata-issue

Reproduction Steps

How to Reproduce

Prerequisites

  • Google Cloud Storage bucket with versioning enabled (to see both file versions)
  • Payload CMS with GCS storage plugin configured with clientUploads: true
  • A video file (10MB+)
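For context, a minimal sketch of the plugin configuration under which the bug was observed. The bucket name and collection slug are placeholders; option names follow the @payloadcms/storage-gcs README:

```typescript
import { gcsStorage } from '@payloadcms/storage-gcs'

// Fragment of payload.config.ts: enables direct-to-GCS client uploads
// for the "media" upload collection.
export const gcsPlugin = gcsStorage({
  collections: {
    media: true, // placeholder collection slug
  },
  bucket: 'my-bucket', // placeholder bucket name
  options: {
    // Google Cloud Storage client options (projectId, credentials, ...).
    // Note: setting apiEndpoint here (e.g. from GCS_ENDPOINT) was later
    // found to cause most of the misbehavior described in this issue.
  },
  clientUploads: true, // upload directly from the browser via signed URLs
})
```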

Steps

  1. Without Workaround Patch:

    • Upload a video file via admin UI
    • Check GCS bucket: see TWO file versions
      • Version 1: 37MB, video/mp4 (correct)
      • Version 2: 21 bytes, text/plain (corrupted)
    • Check Payload document:
      • size: 37000000 (correct, as far as I remember)
      • mimeType: text/plain;charset=UTF-8 (wrong)
  2. With Workaround Patch:

    • Upload a video file via admin UI
    • Check GCS bucket: see ONE file version
      • Version 1: 37MB, video/mp4 (correct) ✅
    • Check Payload document:
      • size: 37000000 (correct) ✅
      • mimeType: text/plain;charset=UTF-8 (still wrong) ❌

Which area(s) are affected? (Select all that apply)

plugin: storage-*

Environment Info

Binaries:
  Node: 22.16.0
  npm: 10.9.2
  Yarn: N/A
  pnpm: 10.22.0
Relevant Packages:
  payload: 3.65.0
  next: 15.4.7
  @payloadcms/db-mongodb: 3.65.0
  @payloadcms/graphql: 3.65.0
  @payloadcms/live-preview: 3.65.0
  @payloadcms/live-preview-react: 3.65.0
  @payloadcms/next/utilities: 3.65.0
  @payloadcms/plugin-cloud-storage: 3.65.0
  @payloadcms/richtext-lexical: 3.65.0
  @payloadcms/sdk: 3.65.0
  @payloadcms/storage-gcs: 3.65.0
  @payloadcms/translations: 3.65.0
  @payloadcms/ui/shared: 3.65.0
  react: 19.1.0
  react-dom: 19.1.0
Operating System:
  Platform: darwin
  Arch: arm64
  Version: Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:29 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T6000
  Available memory (MB): 16384
  Available CPU cores: 10

Hosted on a VPS inside a Docker container (base image: node:23.11.0-alpine).

VeiaG · Nov 30 '25 20:11

There is also another issue I haven't researched yet: files in the GCS bucket are not deleted when I delete items in my upload collection. I may open a separate issue later (or it could be a problem with my setup).

upd: not an issue; this was my own misconfiguration

VeiaG · Nov 30 '25 20:11

Sorry, this issue turned out to be less serious than reported. Almost all of the problems came from using GCS_ENDPOINT=storage.googleapis.com in my env file. Even deletion works without it.


I tested this with an empty project: without providing apiEndpoint, everything works (proper MIME type, file uploaded correctly, etc.).

However, according to Google Cloud Storage versioning, the file is still uploaded twice.


At least now, without apiEndpoint in the configuration, the second version is not broken; it is identical to the original file.

Again, sorry for the confusion: this turned out to be much less severe than I initially thought. Most of the unexpected behavior was caused by my own misconfiguration. Still, I hope this clarification helps, and sorry again for the noise.

VeiaG · Dec 01 '25 18:12