
Support batch requests

Open rhodgkins opened this issue 5 years ago • 15 comments

I've had a look around but I don't think this library uses the batch endpoint. It looks like it's supported in other languages such as Java, but not Node. Any plans to implement it?

Also, from looking at the docs it's not completely clear, but is object copying supported in a batch request? (The Java docs don't have a copy method from what I can see.)

rhodgkins avatar Jan 22 '20 14:01 rhodgkins

We had a request for this a couple of years ago, and we turned away from it because only a few operations allowed batching (https://github.com/googleapis/google-cloud-node/issues/2457). Specifically, file uploads and downloads aren't batch-able, only metadata changes and deleting files.

My only guess for why copy wouldn't be available is that it creates a new file, which would be similar to uploading, but a concrete answer on that would be great.

@frankyn, what do you think?

  • Should our API introduce batching support?
  • Is there a list of remote operations that support batching?

stephenplusplus avatar Jan 22 '20 15:01 stephenplusplus

@stephenplusplus thanks for the quick reply.

Even if copy isn't supported, batching could still be used to delete a large number of files. Number of API requests aside: if there were a batch Node API, would it be quicker / better / preferable to use bucket.deleteFiles (which does a .getFiles and then a delete on each file, in batches of 10), or to call bucket.getFiles and then do a batch delete on the results? (This would be from inside a Google Cloud (Firebase) function.) I'm assuming batch would be better, but I'm not going to bother implementing it if it's not worth it!
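For reference, the "delete each file, in batches of 10" pattern described above can be sketched as a small helper. This is an illustrative sketch, not library code: deleteInBatches and deleteOne are hypothetical names, with deleteOne standing in for bucket.file(path).delete().

```javascript
// Sketch of the limited-concurrency delete pattern that bucket.deleteFiles()
// is described as using (batches of 10). Names are illustrative only.
async function deleteInBatches(items, deleteOne, batchSize = 10) {
  const failures = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Each batch runs in parallel; the next batch starts once all settle.
    const results = await Promise.allSettled(
      batch.map((item) => deleteOne(item))
    );
    results.forEach((r, j) => {
      if (r.status === 'rejected') failures.push(batch[j]);
    });
  }
  return failures; // caller can retry or log these
}
```

Returning the failed paths (rather than throwing on the first error) matches the thread's use case: best-effort bulk cleanup where one failed delete shouldn't abort the rest.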

rhodgkins avatar Jan 22 '20 16:01 rhodgkins

What about batching for uploading large numbers of small files? This seems to be the main reason for batching, and it can make an enormous performance difference. It's already supported by the gsutil command.

"Each Cloud Storage upload transaction has some small overhead associated with it, which in bulk scenarios can quickly dominate the performance of the operation. For example, when uploading 20,000 files that are 1kb each, the per-upload overhead takes more time than the data transfer itself. This concept of overhead-per-operation is not new, nor is the solution: batch your operations together."

Source: https://cloud.google.com/blog/products/gcp/optimizing-your-cloud-storage-performance-google-cloud-performance-atlas
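The quoted claim can be made concrete with some back-of-envelope arithmetic. The overhead and throughput figures below are assumptions chosen for illustration, not measured values:

```javascript
// Back-of-envelope model of the quoted claim. All numbers are assumptions:
// 50 ms fixed overhead per request, 10 MB/s sustained transfer rate.
const fileCount = 20000;
const fileSizeBytes = 1024; // 1 KB each
const overheadPerRequestMs = 50;
const throughputBytesPerMs = (10 * 1024 * 1024) / 1000; // 10 MB/s

const transferMs = (fileCount * fileSizeBytes) / throughputBytesPerMs;
const overheadMs = fileCount * overheadPerRequestMs;

// With these assumptions, fixed per-request overhead (~1000 s) dwarfs the
// actual transfer time (~2 s) -- exactly the effect the quote describes.
console.log({ transferMs: Math.round(transferMs), overheadMs });
```

The exact numbers don't matter; the point is that for many tiny files the total cost scales with request count, not bytes moved.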

gWOLF3 avatar Jan 25 '20 18:01 gWOLF3

@gWOLF3 the batch API we're talking about here doesn't support file creation.

rhodgkins avatar Jan 25 '20 19:01 rhodgkins

What batch API is this? Is there a batch API which does support file creation?

How is this implemented by gsutil? https://github.com/GoogleCloudPlatform/gsutil/blob/2bab315919b4aba8b2a95732571396803c1776db/gslib/commands/cp.py

I am also curious about parallel composite uploads for large files.

gWOLF3 avatar Jan 25 '20 19:01 gWOLF3

Same problem as @gWOLF3. We have to upload more than 1000 HTML files simultaneously and the performance is awful.

davidspiess avatar Apr 23 '20 11:04 davidspiess

Batch requests in this case refer specifically to the Storage batch API: https://cloud.google.com/storage/docs/json_api/v1/how-tos/batch

It helps with:

  • Updating metadata, such as permissions, on many objects.
  • Deleting many objects.

I think you're looking more for performant parallel uploads, is that right?
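For anyone curious what the linked batch endpoint actually consumes: it takes a multipart/mixed body where each part wraps one HTTP call (the docs cap a batch at 100 calls). The sketch below only builds such a body for deletes; authentication and the actual POST to the batch endpoint are omitted, and the function name is made up for illustration.

```javascript
// Builds a multipart/mixed body for batched DELETEs, following the shape
// shown in the JSON API batch docs. Illustrative only: this does not send
// anything, and buildBatchDeleteBody is not a library function.
function buildBatchDeleteBody(bucket, objectNames, boundary = 'batch_boundary') {
  const parts = objectNames.map((name, i) =>
    [
      `--${boundary}`,
      'Content-Type: application/http',
      `Content-ID: <item-${i + 1}>`, // lets you match responses to requests
      '',
      // Object names must be URL-encoded in the path.
      `DELETE /storage/v1/b/${bucket}/o/${encodeURIComponent(name)} HTTP/1.1`,
      '',
    ].join('\r\n')
  );
  return parts.join('\r\n') + `\r\n--${boundary}--`;
}
```

The response comes back as a matching multipart body with one HTTP response per part, which is why per-part Content-ID headers are useful.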

frankyn avatar Apr 23 '20 16:04 frankyn

I think you're looking more for performant parallel uploads, is that right?

Yes, exactly. I know that the storage batch API currently doesn't support batch operations for uploads and downloads, so I'm wondering if there is any plan to support that eventually. I found the following feature request https://issuetracker.google.com/issues/142641783 but I'm not sure if any progress is being made. Further, in my research I stumbled upon this video https://www.youtube.com/watch?v=oEto_3jr1ec, which suggests using composite objects to speed things up (https://goo.gle/2mh4Ei0). I didn't try that suggestion yet, but it seems to complicate things quite a bit.
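For completeness, the composite-object approach mentioned above boils down to: split the payload into chunks, upload the chunks in parallel, then stitch them together with compose (exposed in this library as bucket.combine, which accepts up to 32 source objects per call). Below is a sketch of just the splitting step; the upload and compose calls are only indicated in comments, and the helper name is illustrative:

```javascript
// Split a Buffer into at most chunkCount roughly equal chunks. subarray()
// creates views, so no data is copied.
function splitIntoChunks(buffer, chunkCount) {
  const chunkSize = Math.ceil(buffer.length / chunkCount);
  const chunks = [];
  for (let offset = 0; offset < buffer.length; offset += chunkSize) {
    chunks.push(buffer.subarray(offset, offset + chunkSize));
  }
  return chunks;
}

// Usage sketch (not run here; bucket is a @google-cloud/storage Bucket):
//   const chunks = splitIntoChunks(data, 8);
//   await Promise.all(
//     chunks.map((c, i) => bucket.file(`tmp/part-${i}`).save(c))
//   );
//   await bucket.combine(chunks.map((_, i) => `tmp/part-${i}`), 'final-object');
```

As the comment above notes, this does complicate things: you also have to clean up the temporary part objects, and composite objects have their own checksum behavior.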

davidspiess avatar Apr 23 '20 17:04 davidspiess

Adding my 2 cents: I'd like support for this in order to improve the performance of deleting a large number of files.

Currently I have to do something like this:

const bucket = firebaseAdmin.storage().bucket();
// One delete request per file; individual failures are logged and swallowed
// so a single failure doesn't reject the whole Promise.all.
const promises = files.map((filePath) =>
  bucket
    .file(filePath)
    .delete()
    .catch((e) => console.log("error deleting file", e.message))
);
await Promise.all(promises);

Ideally, I could just call bucket.deleteFiles(files)

arcticmatt avatar Apr 03 '21 03:04 arcticmatt

I’d forgotten about this issue...

I think you're looking more for performent parallel uploads, is that right?

Ideally yes but as the batch API doesn’t support that I’d like to see batch deletions please!

rhodgkins avatar Apr 03 '21 08:04 rhodgkins

Adding my 2 cents: I'd like support for this in order to improve the performance of deleting a large number of files.

Ideally, I could just call bucket.deleteFiles(files)

I've got the exact same use-case and would love to be able to use the batch API to delete multiple files.

amitbeck avatar Sep 14 '21 10:09 amitbeck

This issue is a duplicate and is now being tracked by https://github.com/googleapis/nodejs-storage/issues/1868

shaffeeullah avatar Apr 13 '22 17:04 shaffeeullah

@shaffeeullah I'm not sure that is a duplicate: it only mentions uploads / downloads and copying.

rhodgkins avatar Apr 13 '22 17:04 rhodgkins

@rhodgkins Good callout. This issue includes things like delete, which is supported by Cloud Storage (https://cloud.google.com/storage/docs/batch#overview). We'll look into it.

shaffeeullah avatar Apr 13 '22 18:04 shaffeeullah

This would be super useful for bulk deletions!

marco2216 avatar Apr 19 '22 14:04 marco2216

The current guidance for bulk deletions is to leverage Object Lifecycle Management (OLM). Further additions for batch operations are unlikely to be implemented, so I am going to close this issue.
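For readers landing here: an OLM rule expresses bulk deletion declaratively in the bucket's lifecycle configuration rather than via per-object API calls. A minimal example, assuming you want objects under a tmp/ prefix deleted after 30 days (the prefix and age here are illustrative):

```json
{
  "lifecycle": {
    "rule": [
      {
        "action": { "type": "Delete" },
        "condition": { "age": 30, "matchesPrefix": ["tmp/"] }
      }
    ]
  }
}
```

Note that lifecycle rules are applied asynchronously by the service, so OLM suits ongoing cleanup rather than an immediate one-off bulk delete.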

ddelgrosso1 avatar Jul 17 '23 16:07 ddelgrosso1