google-api-nodejs-client

Batch Request Documentation

Open Go2ClassPoorYorick opened this issue 4 years ago • 12 comments

Is your feature request related to a problem? Please describe. It has been an extremely frustrating process to figure out how to perform batch requests (in fact, I still haven't found out how), because the documentation around batch requests is all out of date (including the page saying that the old form of batch requests will stop working and referring readers to the client documentation).

Due to rate-limiting and the like, it would be hugely helpful to be able to batch requests like file updates when moving many files from one folder to another. I'm likely going to have to piece this together as a raw HTTP request instead of a Google API method, if that's even possible.

Describe the solution you'd like It would be nice to have clear, concise instructions on batching requests using this library, or it should be clearly stated that this is out of scope for the project. At the moment, basically every thread about batching contains information that no longer works or simply dies off without resolution.

Describe alternatives you've considered To be able to batch requests I'm likely going to have to perform a direct HTTP request to the API for file updates, or institute a timeout and process requests one by one. Neither solution is optimal.

Additional context

  • #513, #1130 include information regarding HTTP/2, but no examples of "tight loops" or of how this helps with batch requests
  • #121 references the deprecation of batch endpoints, but once more offers no concise documentation on how to accomplish this post global endpoint
  • #740 once more suggests HTTP/2 is magically going to fix issues with batching, but notes that "per-api batch methods still exist"
  • #290, #338 include ideas/outdated information about batching

Go2ClassPoorYorick avatar Sep 30 '20 16:09 Go2ClassPoorYorick

Please, I'm also struggling with this. I was looking at the code in the Calendar API, and there is no batch method or anything remotely related, so I don't think "per-API batch methods" applies.

cfficaurzua avatar Oct 03 '20 15:10 cfficaurzua

This appears to be out of date: https://developers.google.com/drive/api/v3/batch#node.js

Hunting around led me to this: https://developers.googleblog.com/2018/03/discontinuing-support-for-json-rpc-and.html

Under the explanation of HTTP/2 we see the word "batch": https://github.com/googleapis/google-api-nodejs-client#http2

And that linked me to this: https://github.com/googleapis/google-api-nodejs-client/issues/1130

It's quite a run-around to figure out what's going on.

paulschwarz avatar Oct 08 '20 18:10 paulschwarz

Judging by the number of stale issues regarding batching, and the demand for it, one can only assume they don't care and won't get to batching any time soon. I'll make an updated version of https://github.com/pradeep-mishra/google-batch, as I need it for my project.

EricRabil avatar Oct 26 '20 17:10 EricRabil

Ok y'all, thanks for your patience (well some of you 😛). I will update the docs at some point after this, but to get folks unblocked...

Why batch requests exist

Batch requests are useful when you need to make a bunch of requests, in a short period of time. Specifically, they help ensure you can open a single TCP/IP connection, and make many requests over that single connection. This is especially helpful in environments like node.js, where it's easy to accidentally open too many network connections at once.

Batch requests still enforce quota rules, so you're not saving anything in the way of quota.

The deprecation of global batch

In a prior life, there was a global endpoint for all batch requests, letting you compose multiple requests to different services, all at the same time. It turns out this had real scale and data-protection issues, so these global endpoints that supported multiple services at once were shut down: https://developers.googleblog.com/2018/03/discontinuing-support-for-json-rpc-and.html

What types of batch exist today

With global batch gone, the responsibility now falls to individual services. For example, the Gmail API supports its own version of batch: https://developers.google.com/gmail/api/guides/batch

The key differences here are:

  • Not every Google API supports batch requests natively
  • Requests can now only be batched within a single API. For example, only Gmail requests can be batched in one call, or only Drive requests in another; they cannot go together
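For anyone hand-rolling a per-API batch in the meantime, the wire format is multipart/mixed with one embedded HTTP request per part, per Google's batching guide. Below is a minimal sketch of building such a body; the helper name, boundary string, batch path (`/batch/drive/v3`), and the example PATCH requests are illustrative assumptions, not an API of this library:

```javascript
// Sketch: build a multipart/mixed body for a single-API batch request.
// Each part is an embedded HTTP request with Content-Type: application/http.
function buildBatchBody(requests, boundary) {
  const parts = requests.map((req, i) => [
    `--${boundary}`,
    'Content-Type: application/http',
    `Content-ID: <item-${i + 1}>`,
    '',
    `${req.method} ${req.path} HTTP/1.1`,
    'Content-Type: application/json',
    '',
    JSON.stringify(req.body || {}),
    '', // trailing CRLF closing this part
  ].join('\r\n'));
  return parts.join('') + `--${boundary}--`;
}

// Example: two Drive "move file" updates batched into one payload.
const body = buildBatchBody([
  { method: 'PATCH', path: '/drive/v3/files/FILE_ID_1?addParents=NEW&removeParents=OLD', body: {} },
  { method: 'PATCH', path: '/drive/v3/files/FILE_ID_2?addParents=NEW&removeParents=OLD', body: {} },
], 'batch_boundary');

// The body would then be POSTed to https://www.googleapis.com/batch/drive/v3
// with the header: Content-Type: multipart/mixed; boundary=batch_boundary
```

The response comes back as multipart/mixed too, with one embedded HTTP response per part, so you still need to parse each part's status individually.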

Why it probably doesn't matter

So coming all the way back around to the original point - batch requests are good because they let you have a single TCP/IP connection, and reduce the overhead of establishing multiple connections. Good news everyone - this is exactly what HTTP/2 does: https://http2.github.io/faq/

HTTP/2 works by multiplexing (n) requests over a single TCP/IP connection, reducing overhead when you need to make many HTTP requests in a short burst. The other good news is that as far as I can tell, all Google APIs seem to support HTTP/2 natively :) And the bonus good news, is that this library supports it now too! https://github.com/googleapis/google-api-nodejs-client#http2

As of now - I don't think there's a significant benefit to supporting the somewhat bespoke format for batch requests natively in the library, as HTTP/2 gives us the same advantages, as far as I can tell, and already exists. So long as you're making all of your requests within the same 500ms window, to the same host - it shouldn't be an issue.
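For reference, the HTTP/2 support mentioned here is opt-in. A config sketch based on the README section linked above (auth setup omitted; treat this as a fragment, not a full program):

```javascript
const {google} = require('googleapis');

// Opt all subsequent requests into the experimental HTTP/2 transport.
google.options({http2: true});

// Calls made in the same burst, e.g. drive.files.list(...) or
// gmail.users.messages.get(...), should then multiplex over a single
// TCP connection instead of each opening its own.
```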

I've been looking for some feedback on the HTTP/2 implementation. I would love it if anyone here would be willing to use the flag, do a little lightweight profiling, and let me know how this impacts your performance.

JustinBeckwith avatar Oct 27 '20 19:10 JustinBeckwith

Maybe I'm mistaken, but the Drive API at least seems limited to about 10 req/sec even when I'm using HTTP/2 🤔 Would HTTP/2 even help avoid the req/sec spam protection they have? If not, batch support seems appropriate imho 😅

asbjornenge avatar Nov 04 '20 17:11 asbjornenge

@asbjornenge can you share more about the requests per second limit you're hitting? Or the issues on spam protection? From what I can read here, there are specific response codes and descriptions that come back: https://developers.google.com/drive/api/v3/handle-errors#resolve_a_403_error_user_rate_limit_exceeded

Everything I've read says rate limits and quotas are the same regardless of batch vs individual API calls, but if you're finding something different please let me know!

JustinBeckwith avatar Nov 04 '20 17:11 JustinBeckwith

@JustinBeckwith Sure 😊 So I'm fetching a full folder tree of about 222 folders in total (so a lot, but nowhere near my 1000 req/100sec/user limit), with a depth of 5 at the deepest. I have a recursive loop trying to fetch these folders with 222 "simultaneous" (not really, but) requests. As soon as I exceed 10 req/sec, I hit the User Rate Limit Exceeded error.

I read somewhere (that I cannot find now) that all Google APIs had a rate limit of 10 req/sec to avoid spamming.

If we had batch support I could perform those queries in 3 requests (100+100+2) and should not have this issue.

Current code with issues:

```js
async function getFullTreeRecursively(drive, id, children) {
  const q = `'${id}' in parents and trashed=false`
  const fields = 'files(id,name,mimeType)'
  const _children = await getListLoop(drive, q, fields)
  const __children = _children.map((child) => {
    const data = { name: child.name, id: child.id, mimeType: child.mimeType }
    if (child.mimeType === 'application/vnd.google-apps.folder') data.children = []
    return data
  })
  const childrenToLoop = []
  for (const child of __children) {
    children.push(child)
    if (Array.isArray(child.children)) {
      childrenToLoop.push({ id: child.id, children: child.children })
    }
  }
  const promises = childrenToLoop.map((child) =>
    getFullTreeRecursively(drive, child.id, child.children)
  )
  await Promise.allSettled(promises)
  return children
}
```
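Absent library-level batch support, one workaround for the rate-limit behavior described above is to throttle on the client side. Here is a minimal sketch of such a limiter; it is not part of googleapis, and the concurrency and delay defaults are illustrative assumptions tuned for roughly 8 req/sec worst case:

```javascript
// Sketch: run async tasks with bounded concurrency and a pause between
// requests, so a burst of calls stays under a per-second rate cap.
// Worst-case rate is about concurrency * (1000 / delayMs) requests/sec.
async function runThrottled(tasks, { concurrency = 2, delayMs = 250 } = {}) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const i = next++; // claim the next task index (no await before this, so no race)
      results[i] = await tasks[i]();
      // Space requests out before this worker claims another task.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return results;
}
```

Each Drive call would be wrapped in a zero-argument function and passed in the tasks array, so the limiter decides when each request actually fires.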

asbjornenge avatar Nov 04 '20 17:11 asbjornenge

Hmm, this is unfortunate to hear. I may be doing something wrong in the way I'm approaching the issue, but I'm creating an app that may copy anywhere from 10 to 1000s of files depending on a template scheme, and I'm not seeing any good way of approaching this outside of an individual request per item. Not only is this messy, it seems to me to be a bad programming pattern.

I'm also not sure if this is intentionally anti-consumer (can I even pay more to get a better rate limit for my company?), but the lack of any mass object labeling/uploading/copying inherently neuters any attempt at creating projects that want to manage directory structures in Google Drive. I shouldn't need to repeat the same API call with a different file ID 1500 times to move files to a new folder, and this problem gets worse when you consider the rate limits and how conceptually slow they are.

Considering not only the silly repetition of submitting practically the same request 1500 times, add in rate limiting and you're looking at minutes of runtime to copy folder structures that it would take a local system seconds to copy/move. At my scale, that's an amount of time I can work around, but I can't imagine trying to copy a file structure that's tens of thousands of items in size. Further, I know at this point I should have found a rate-limiting library or created something myself, but it amazes me that a Google-supported module doesn't inherently have a way of bulk-modifying data that's compatible with its own rate limit.

Don't get me wrong, I appreciate the support and the fact that this library wouldn't exist without support from the community, but I can't be the only one who sees fault with the way we're being forced to do things. I just can't help but feel like I'm getting a very thin wrapper over the basic HTTP API to begin with. It saddens me that there's no level of abstraction in the library (a query searching for folders returns a JSON object, instead of converting the response files into classes with "move" or "set___" options) that would inherently make this a significantly more powerful tool. At the moment, the Apps Script API is significantly more powerful in that way.

I apologize if I come across as crass, but I suppose I was hoping for a drop-in replacement for Google Apps Script, but in Node, so I could organize/schedule it in my own fashion.

Go2ClassPoorYorick avatar Nov 09 '20 20:11 Go2ClassPoorYorick

> @JustinBeckwith Sure So I'm fetching a full folder tree of about 222 folders in total (so a lot, but nowhere near my 1000 req/100sec/user limit), with a depth of 5 at the deepest. I have a recursive loop trying to fetch these folders with 222 "simultaneous" (not really, but) requests. As soon as I exceed 10 req/sec, I hit the User Rate Limit Exceeded error.
>
> I read somewhere (that I cannot find now) that all Google APIs had a rate limit of 10 req/sec to avoid spamming.
>
> If we had batch support I could perform those queries in 3 requests (100+100+2) and should not have this issue.

@asbjornenge Justin explains at the top of his post that batching unfortunately doesn't save any API requests; a batch of 10 is equivalent to ten requests:

> Why batch requests exist
>
> Batch requests are useful when you need to make a bunch of requests, in a short period of time. Specifically, they help ensure you can open a single TCP/IP connection, and make many requests over that single connection. This is especially helpful in environments like node.js, where it's easy to accidentally open too many network connections at once.
>
> Batch requests still enforce quota rules, so you're not saving anything in the way of quota.

Not to beat a dead horse, but it does confuse me a little why rate limiting shouldn't conceptually be reduced by batching. The number of requests to the server drops, and the payload of multiple requests combined into one should still be significantly smaller than the overhead of processing the same set of headers and metadata 100 times, so I can't see the failure of batch requests to reduce rate limiting being related to bandwidth.

I suppose the actual I/O of the API calls could be a reason in its own right, but somehow I don't see that being the case: 10 copy operations on a 10 GB file are going to have a storage impact significantly larger than changing the parent flag on 100 files in 10 seconds.

I'm probably way off on the reasoning here, just hoping to understand the logic.

Go2ClassPoorYorick avatar Nov 09 '20 20:11 Go2ClassPoorYorick

So what's the consensus on this? Should we completely ditch request batching and make use of HTTP/2?

hayksaryan avatar Mar 15 '22 15:03 hayksaryan

No, please implement batch.

There are a handful of APIs with optimizations around batch requests (Drive, mostly) that significantly impact write throughput, such as when adding multiple permissions to a file. Yes, it's still the same number of API requests, but they're handled differently when batched, in a way that allows for higher throughput than if they're made individually.

sqrrrl avatar May 16 '22 23:05 sqrrrl

I also have a question: does this client library use gRPC when possible/by default (I read that RPC support in some APIs is now deprecated), or does it use HTTP by default/exclusively?

strarsis avatar Sep 12 '22 14:09 strarsis

Having hit the issue of batching Google API requests from Node.js myself, I decided to develop a solution that follows the guidelines in Google's batching documentation, implementing the multipart/mixed HTTP request/response protocol.

Please have a look and do not hesitate to raise an issue if you find a bug or would like to discuss improvements.

jrmdayn avatar Jan 10 '23 14:01 jrmdayn