Is there a no-internal-pagination mode for Scan/Query?

Open extemporalgenome opened this issue 11 months ago • 0 comments

Hi! I have workloads that would be simpler to reason about and control if there were a way to run Scan/Query in this library such that:

  1. Exactly one Scan/Query call in this library produces exactly one Scan/Query call in the underlying SDK. To scan subsequent pages, a new guregu Scan/Query invocation must be made.
  2. I do not need to set a SearchLimit or Limit, and can rely on query results being bounded by DynamoDB's 1MB upper bound on single-response result set size.
  3. All*, Next*, etc. all work, except that they do not trigger further implicit fetches. Instead, they stop, e.g. at that 1MB boundary if no other limit is reached first.
  4. The LastEvaluatedKey is exactly the one that the SDK provides, and not one computed based on an intra-page result.
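
To make the four points above concrete, here is a minimal sketch of the requested semantics against an in-memory stand-in. Everything in it (`fakeTable`, `scanOnce`, `Page`, the byte cap) is hypothetical and exists only for illustration; it is neither this library's API nor the SDK's. One call performs one simulated request, stops at the size cap, and hands back the key that request ended on.

```go
package main

import "fmt"

// Item is a hypothetical stand-in for a stored record.
type Item struct {
	ID   int
	Size int // encoded size in bytes
}

// fakeTable simulates a service that caps each response at maxBytes.
// It is not the DynamoDB SDK; keys here are simply slice indexes.
type fakeTable struct {
	items    []Item
	maxBytes int
}

// Page is the result of exactly one simulated request.
type Page struct {
	Items            []Item
	LastEvaluatedKey *int // nil once the table is exhausted
}

// newFakeTable builds a table of n items that are each itemSize bytes.
func newFakeTable(n, itemSize, maxBytes int) *fakeTable {
	t := &fakeTable{maxBytes: maxBytes}
	for i := 0; i < n; i++ {
		t.items = append(t.items, Item{ID: i, Size: itemSize})
	}
	return t
}

// scanOnce performs exactly one simulated request, starting after startKey.
// It never follows LastEvaluatedKey on its own; the caller decides when
// (and whether) to ask for the next page.
func (t *fakeTable) scanOnce(startKey *int) Page {
	start := 0
	if startKey != nil {
		start = *startKey + 1
	}
	var page Page
	used := 0
	for i := start; i < len(t.items); i++ {
		// Stop at the size cap, but always return at least one item.
		if used+t.items[i].Size > t.maxBytes && len(page.Items) > 0 {
			last := i - 1
			page.LastEvaluatedKey = &last
			return page
		}
		used += t.items[i].Size
		page.Items = append(page.Items, t.items[i])
	}
	return page
}

func main() {
	t := newFakeTable(10, 300, 1000)
	var key *int
	for requests := 1; ; requests++ {
		page := t.scanOnce(key) // one call, one request
		fmt.Printf("request %d: %d items\n", requests, len(page.Items))
		if page.LastEvaluatedKey == nil {
			break
		}
		key = page.LastEvaluatedKey
	}
}
```

With ten 300-byte items and a 1000-byte cap, the loop in `main` makes four requests (3, 3, 3 and 1 items), and each one is explicitly issued by the caller.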

Rationale:

  1. From reading the code, it looks like AllWithContext, when passed a slice, will OOM on a sufficiently large table unless a Limit is set: it appears to keep fetching until the table is exhausted.
  2. Needing to set a SearchLimit to prevent buffering the whole table is awkward and less efficient, since it takes away the DynamoDB endpoint's ability to make the most of the RCUs spent on each request. For example, 1MB of results will use no more than 125 RCU, but an arbitrary SearchLimit of 1000 may hypothetically fetch a full 1MB page (very efficient), then fetch 3 more results on the next page (very inefficient), and may continue that uneven walk over the whole table.
  3. Fine-grained rate limiting of RCUs is far simpler with more explicit control over requests (e.g. ask for 125 rate-limit tokens because I'm assured I'll be fetching no more than 1MB in no more than 1 request).
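
The uneven walk described in point 2 can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of how the library implements SearchLimit: `walk` is a hypothetical helper, and the figure of 997 items per full response is invented so that a limit of 1000 leaves a 3-item remainder.

```go
package main

import "fmt"

// minInt returns the smaller of a and b.
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// walk models reading a table of total items in batches of searchLimit items,
// where a single response holds at most pageCapacity items. It returns the
// number of items carried by each simulated request.
// Assumption: searchLimit and pageCapacity are both positive.
func walk(total, searchLimit, pageCapacity int) []int {
	var sizes []int
	for read := 0; read < total; {
		batch := minInt(searchLimit, total-read)
		for batch > 0 {
			n := minInt(batch, pageCapacity)
			sizes = append(sizes, n)
			batch -= n
			read += n
		}
	}
	return sizes
}

func main() {
	// An arbitrary limit of 1000 when a full response holds 997 items.
	fmt.Println(walk(2000, 1000, 997))
	// Letting each response fill to capacity instead.
	fmt.Println(walk(2000, 997, 997))
}
```

In this model the limited walk alternates between full and nearly empty requests (997, 3, 997, 3), while the page-at-a-time walk covers the same 2000 items in three requests (997, 997, 6).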

Does this library currently offer a way to do the above?

If not, would you be open to it being added? It seems like the simplest approach is some kind of RequestLimit(1) method call that can be used to achieve the above, potentially with a mechanism to reset it (ResetRequestCounter()) in order to reuse the constructed Scan/Query, while providing the application control over admitting requests.
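
As a rough sketch of how that proposal might behave, the snippet below models RequestLimit and ResetRequestCounter on a stand-in type. The `pager` type, its `fetch` callback, `newSlicePager` and the `(items, more)` return shape are all hypothetical; they only illustrate the idea and say nothing about how this library would implement it.

```go
package main

import "fmt"

// pager is a hypothetical paginating reader. fetch stands in for one
// underlying SDK call and reports whether more pages remain.
type pager struct {
	fetch        func() (items []string, more bool)
	requestLimit int // 0 means unlimited
	requests     int
	done         bool
}

// newSlicePager serves pre-built pages, one per simulated request.
func newSlicePager(pages [][]string) *pager {
	next := 0
	return &pager{fetch: func() ([]string, bool) {
		if next >= len(pages) {
			return nil, false
		}
		items := pages[next]
		next++
		return items, next < len(pages)
	}}
}

// RequestLimit caps how many requests may be made before a reset.
func (p *pager) RequestLimit(n int) *pager {
	p.requestLimit = n
	return p
}

// ResetRequestCounter lets the same pager issue further requests.
func (p *pager) ResetRequestCounter() { p.requests = 0 }

// All collects items until the data is exhausted or the request budget runs
// out. more is true when it stopped early and pages remain.
func (p *pager) All() (items []string, more bool) {
	for !p.done {
		if p.requestLimit > 0 && p.requests >= p.requestLimit {
			return items, true
		}
		page, hasMore := p.fetch()
		p.requests++
		items = append(items, page...)
		p.done = !hasMore
	}
	return items, false
}

func main() {
	p := newSlicePager([][]string{{"a", "b"}, {"c"}, {"d", "e"}}).RequestLimit(1)
	for {
		items, more := p.All()
		fmt.Println(items, more)
		if !more {
			break
		}
		// An application could wait for rate-limit tokens here
		// before admitting the next request.
		p.ResetRequestCounter()
	}
}
```

Each call to `All` here makes exactly one simulated request, so the application decides when the next one is admitted.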

extemporalgenome, Mar 06 '24 16:03