GCS chunked request overwrites existing file instead of appending to it

Open mtbohm opened this issue 1 year ago • 2 comments

We are using UpChunk to upload files into a Google Cloud bucket, using a signed URL generated by an external backend. This works perfectly for small files: the uploaded files appear in the GCP bucket. However, when the file size exceeds the configured chunk size, the library makes several requests to the GCS upload URL (as expected), but after the chunked requests are done, only the last chunk is stored in the GCS bucket. It seems like each PUT request containing a single chunk overwrites the previous one.

To illustrate, we set the chunk size to 4 MiB:

const upload = UpChunk.createUpload({
  endpoint: res.signedUrl,
  file: file,
  chunkSize: 4096, // in kB, so 4096 kB = 4 MiB
});

And then upload a file of size ~5.2 MiB. The first chunk's PUT has these request headers:

Screenshot 2024-07-17 at 23 27 34

And the second chunk has:

Screenshot 2024-07-17 at 23 27 43

But then after both requests are completed, the file in the GCS bucket looks like this:

Screenshot 2024-07-17 at 23 27 52

We've confirmed that, if you look in GCS after the first request is done, before the second one completes, the file is actually 4 MiB. This seems to confirm that the file is being overwritten, and not appended to.
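
To make the expected requests concrete, here is a rough sketch of the byte ranges the two chunks should cover. It assumes `chunkSize` is interpreted in kB (multiplied by 1024) and that each chunk is sent with a `Content-Range: bytes start-end/total` header. `chunkRanges` is an illustrative helper, not part of UpChunk, and the 5,452,595-byte file size is a hypothetical stand-in for our ~5.2 MiB file.

```typescript
// Illustrative helper (not part of UpChunk): list the Content-Range value
// expected for each chunk of a file of the given size.
function chunkRanges(fileSizeBytes: number, chunkSizeKb: number): string[] {
  const chunkBytes = chunkSizeKb * 1024; // assumption: chunkSize is given in kB
  const ranges: string[] = [];
  for (let start = 0; start < fileSizeBytes; start += chunkBytes) {
    // The end offset is inclusive; the final chunk may be shorter.
    const end = Math.min(start + chunkBytes, fileSizeBytes) - 1;
    ranges.push(`bytes ${start}-${end}/${fileSizeBytes}`);
  }
  return ranges;
}

// Hypothetical size standing in for the ~5.2 MiB file from this report.
const exampleRanges = chunkRanges(5_452_595, 4096);
console.log(exampleRanges);
// → [ 'bytes 0-4194303/5452595', 'bytes 4194304-5452594/5452595' ]
```

If the chunks were appended, the stored object would be the full 5,452,595 bytes. An object of only the second range's length (1,258,291 bytes in this sketch) would match the overwrite behaviour described above.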

We have gone through both Google Cloud's documentation and UpChunk's docs, and nothing there seems to indicate what might be causing this or how to configure the upload to append instead of overwrite. Any thoughts would be greatly appreciated!

mtbohm, Jul 17 '24 21:07

Hi there! This is a weird one I'll confess I haven't seen... This might be totally off base, but can you confirm which GCS upload type you're using? This library assumes you're using Resumable Uploads, and a mismatch there is the only thing that immediately comes to mind for what could be going on here.

mmcc, Jul 17 '24 22:07

Indeed, we are using resumable uploads. The signed URL that we use in the createUpload function takes this format (some values changed and obscured):

https://storage.googleapis.com/BUCKET_NAME_OBSCURED/tnq7h8crJn-KayBIouGq4?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=assets%40svc-acc-obscured.iam.gserviceaccount.com%2F20240718%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240718T095653Z&X-Goog-Expires=599&X-Goog-Signature=956b59830e8d16d84c2ff813d700178d7da906dd9f27f1da3efd38bc705a96b68731dbea0b5dcc2a66adcc6fb67ceb432f147ceb4a4e3f02addbca82021357ee99efadf0de789de5ab41219aa96eb63566296c2f71959ddbb1ca528266c83a8e1d6f5fb6586f34e2b12ee4255690bfb652ad59b205f7c27e0ed79ceec1087b0f3809f8356849138685b3e497f875da0ebdb95cdc8b6f89cdb6e58a208c41952dca499530ad0cba2db808321dc4ede19ebf2490a9415ee3fe0f113eb77a59051c01d7a0ba298b7e699897b2113fbb1555fa63a8849ee2dc9ff84d19bac94e94b653614d4aa5250746253d42470cd63ba00e77a89a5fae92d6755018b1de3c9a49&X-Goog-SignedHeaders=content-type%3Bhost&uploadType=resumable

And we can confirm that uploadType=resumable was also included in the POST request to GCS to generate this URL (using the GCS golang library). So from what I can tell the URL is formatted correctly and the necessary parameters are there.

mtbohm, Jul 18 '24 10:07
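
For reference on the resumable-upload question raised above: Google's XML API documentation describes resumable uploads as a two-step flow. A POST carrying the `x-goog-resumable: start` header returns `201 Created` with a session URI in the `Location` header, and the chunks are then PUT to that session URI. Below is a minimal sketch of the first step. It assumes the signed URL was generated for a POST request with `x-goog-resumable` among the signed headers, which is not confirmed anywhere in this thread. `startResumableSession` and `FetchLike` are hypothetical names, and whether this is the cause of the overwriting is not established here.

```typescript
// Minimal shape of the fetch-like function this sketch needs.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ status: number; headers: { get(name: string): string | null } }>;

// Hypothetical helper: start a resumable session and return its session URI.
// Assumes `signedUrl` was signed for POST with the x-goog-resumable header.
async function startResumableSession(
  fetchImpl: FetchLike,
  signedUrl: string,
  contentType: string,
): Promise<string> {
  const res = await fetchImpl(signedUrl, {
    method: "POST",
    headers: { "x-goog-resumable": "start", "Content-Type": contentType },
  });
  // GCS answers 201 Created and puts the session URI in the Location header.
  const sessionUri = res.headers.get("Location");
  if (res.status !== 201 || sessionUri === null) {
    throw new Error(`Could not start resumable session (HTTP ${res.status})`);
  }
  return sessionUri;
}
```

Under these assumptions, the URI returned by `startResumableSession(fetch, res.signedUrl, file.type)`, rather than the signed URL itself, would be passed as `endpoint` to `UpChunk.createUpload`.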