Storage Enhancement: provide pagination for returning records

Open mauerbac opened this issue 5 years ago • 9 comments

Problem

A customer (https://github.com/aws-amplify/amplify-js/discussions/7084) found that S3 returns at most 1,000 records via the list API. While this is true, there are options we can add to our S3 provider to get the NextKeyMarker from S3 when the data IsTruncated. See the S3 SDK reference (https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#listObjects-property).

Solution

We can add additional parameters to the params object in our AWSS3Provider.ts (https://github.com/aws-amplify/amplify-js/blob/main/packages/storage/src/providers/AWSS3Provider.ts#L409) file, along with a while() loop around our s3.send(...) call (https://github.com/aws-amplify/amplify-js/blob/main/packages/storage/src/providers/AWSS3Provider.ts#L418).

Output

  • Look to add additional parameters from S3 SDK
  • Create a while loop that keeps appending to the array while the list is truncated.
  • Unit tests
  • Integ tests
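
A minimal sketch of the proposed loop, for illustration only. It assumes a hypothetical `listPage` function that stands in for the provider's s3.send(...) call and resolves to a listObjects-shaped response (`Contents`, `IsTruncated`, optional `NextMarker`). The names `listAllKeys` and `makeFakeListObjects` are made up here, and the in-memory bucket exists only so the snippet runs without AWS credentials.

```javascript
// Hypothetical helper: `listPage` stands in for a listObjects-style call and
// must resolve to { Contents, IsTruncated, NextMarker? }.
async function listAllKeys(listPage, params = {}) {
  const keys = [];
  let marker; // undefined on the first request
  let truncated = true;

  while (truncated) {
    const page = await listPage({ ...params, Marker: marker });
    const contents = page.Contents || [];
    for (const item of contents) keys.push(item.Key);

    // Stop on an empty page as well, to avoid looping forever.
    truncated = Boolean(page.IsTruncated) && contents.length > 0;
    // Use NextMarker when present, otherwise fall back to the last key returned.
    marker = page.NextMarker ||
      (contents.length ? contents[contents.length - 1].Key : undefined);
  }
  return keys;
}

// Demonstration-only stand-in for a bucket; not part of any SDK.
function makeFakeListObjects(allKeys, pageSize) {
  const sorted = [...allKeys].sort();
  return async ({ Marker } = {}) => {
    const found = Marker === undefined ? 0 : sorted.findIndex((k) => k > Marker);
    const from = found === -1 ? sorted.length : found;
    return {
      Contents: sorted.slice(from, from + pageSize).map((Key) => ({ Key })),
      IsTruncated: from + pageSize < sorted.length,
    };
  };
}
```

Calling `await listAllKeys(makeFakeListObjects(keys, 10))` walks every page before returning, which is the behaviour the loop proposal describes.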

References

  • https://github.com/aws-amplify/amplify-js/blob/main/packages/storage/src/providers/AWSS3Provider.ts#L395-L419
  • https://github.com/aws-amplify/amplify-js/discussions/7084
  • https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#listObjects-property

mauerbac avatar Nov 06 '20 20:11 mauerbac

Customers can already specify maxKeys as a config option, which gives them the flexibility of returning fewer than 1,000 records if desired. I say we don't automatically loop over the whole S3 bucket based on IsTruncated, as that has a number of disadvantages and reduces flexibility.

Instead, I propose adding marker as an option here: https://github.com/aws-amplify/amplify-js/blob/5173a9911096627ce1b45067808af249668b260b/packages/storage/src/providers/AWSS3Provider.ts#L402 and adding NextKeyMarker and IsTruncated to L433 in the snippet below as StorageEvent attrs, which are sent in a hub event: https://github.com/aws-amplify/amplify-js/blob/5173a9911096627ce1b45067808af249668b260b/packages/storage/src/providers/AWSS3Provider.ts#L430-L436

This way customers can have the same level of control as AWS.S3 in terms of paging. It also allows customers to lazy load records, so the app can load the first 1,000 quickly and not have to wait for us to loop over the whole bucket before having access to any data. Another minor advantage is that it accommodates use cases where the customer only wants the first, say, 3,000 records in a bucket of tens or hundreds of thousands. If a customer wants to lazy load 20 at a time as the user scrolls in the app, this solution can accommodate that too.
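
To make the lazy-loading argument concrete, here is a rough sketch using an async generator, so the caller decides when to stop. Everything named here (`pagesOf`, `takeFirst`, `makeCountingBucket`, and the `listPage` stand-in) is hypothetical and not part of Amplify or the AWS SDK. The fake bucket only counts requests to show that no extra pages are fetched.

```javascript
// Hypothetical: `listPage({ Marker, MaxKeys })` resolves to a
// listObjects-shaped response { Contents, IsTruncated }.
async function* pagesOf(listPage, pageSize) {
  let marker;
  while (true) {
    const page = await listPage({ Marker: marker, MaxKeys: pageSize });
    const contents = page.Contents || [];
    if (contents.length > 0) yield contents;
    if (!page.IsTruncated || contents.length === 0) return;
    marker = contents[contents.length - 1].Key; // resume after the last key seen
  }
}

// Collect at most `limit` records, requesting no more pages than needed.
async function takeFirst(listPage, limit, pageSize = 1000) {
  const records = [];
  for await (const contents of pagesOf(listPage, pageSize)) {
    records.push(...contents);
    if (records.length >= limit) break;
  }
  return records.slice(0, limit);
}

// Demonstration-only bucket that counts how many requests it served.
function makeCountingBucket(total) {
  const keys = Array.from({ length: total }, (_, i) => `k${String(i).padStart(6, '0')}`);
  const bucket = { requests: 0 };
  bucket.listPage = async ({ Marker, MaxKeys = 1000 } = {}) => {
    bucket.requests += 1;
    const found = Marker === undefined ? 0 : keys.findIndex((k) => k > Marker);
    const start = found === -1 ? keys.length : found;
    return {
      Contents: keys.slice(start, start + MaxKeys).map((Key) => ({ Key })),
      IsTruncated: start + MaxKeys < keys.length,
    };
  };
  return bucket;
}
```

With a 10,000-object fake bucket, `takeFirst(bucket.listPage, 3000)` stops after the third request, and `takeFirst(bucket.listPage, 20, 20)` needs only one.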

I feel this may be a cleaner solution, and it allows for parity between our S3 Provider and AWS.S3.

I do recognize the convenience of having something that loops over the whole bucket. Maybe that can be a helper function, if we decide to implement it at all? We could probably get away with a 10-line snippet in the docs instead.

Feedback/suggestions/dissents welcome!


/cc @sammartinez from #7084

wei avatar Nov 08 '20 16:11 wei

This is great @wei, I do have some questions on this approach. I am not seeing the marker value you are calling out in the S3 API reference documentation. Would this be a new value you are looking to introduce? Also, do you have a sample of this work we can take a look at?

sammartinez avatar Nov 11 '20 20:11 sammartinez

@sammartinez :wave: Hi Sam, it is listed in the AWSJavaScriptSDK docs:

Marker — (String) Specifies the key to start with when listing objects in a bucket.

MaxKeys — (Integer) Sets the maximum number of keys returned in the response. By default the API returns up to 1,000 key names. The response might contain fewer keys but will never contain more.

We are already sending MaxKeys in our AWSS3Provider: https://github.com/aws-amplify/amplify-js/blob/5173a9911096627ce1b45067808af249668b260b/packages/storage/src/providers/AWSS3Provider.ts#L409-L413

I haven't started development yet; I was analyzing the problem and coming up with a more optimal solution before jumping into code.

I'd like to gather any feedback you may have and will work on a PR if my solution is approved 😄

Update: We should consider using listObjectsV2, which follows a similar pattern but uses ContinuationToken and NextContinuationToken instead of the markers in V1. All the fields currently returned on S3 object listings remain unchanged, so it is backwards compatible. https://github.com/aws-amplify/amplify-js/blob/5173a9911096627ce1b45067808af249668b260b/packages/storage/src/providers/AWSS3Provider.ts#L422-L427

The snippet would look like:

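import { Storage, Hub } from 'aws-amplify';

let fullList = [];

// continuationToken and the isTruncated / nextContinuationToken attrs are the
// additions proposed above, not existing Storage.list options.
const loadFullList = async (continuationToken) => {
  // Await first, then append, so overlapping calls cannot overwrite each other.
  const page = await Storage.list('/', { continuationToken, track: true });
  fullList = fullList.concat(page);
};

// Each tracked list call dispatches a hub event; request the next page while truncated.
Hub.listen('storage', ({ payload }) => {
  const { event, data } = payload;
  if (event === 'list') {
    const { method, result, isTruncated, nextContinuationToken } = data.attrs;
    if (method === 'list' && result === 'success' && isTruncated) {
      loadFullList(nextContinuationToken);
    }
  }
});

loadFullList();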

It can be wrapped into a promise like: https://github.com/wei/aws-amplify-react/blob/c56e7ac9f81237d0da11050970d51b7553735c7f/src/App.js#L8-L35
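
For comparison, a rough sketch of a promise-returning version that follows continuation tokens directly, without going through Hub. It is not the code behind the link above. `listPageV2` is a hypothetical stand-in for a listObjectsV2-style call resolving to `{ Contents, IsTruncated, NextContinuationToken }`, and `loadFullListV2` and `makeFakeListObjectsV2` are invented names. The fake uses a stringified offset as its token purely for demonstration; real tokens are opaque.

```javascript
// Hypothetical: `listPageV2({ ContinuationToken })` resolves to a
// listObjectsV2-shaped response { Contents, IsTruncated, NextContinuationToken }.
function loadFullListV2(listPageV2, params = {}) {
  const fetchFrom = async (token, acc) => {
    const page = await listPageV2({ ...params, ContinuationToken: token });
    const next = acc.concat(page.Contents || []);
    // Keep following tokens until the response is no longer truncated.
    return page.IsTruncated && page.NextContinuationToken
      ? fetchFrom(page.NextContinuationToken, next)
      : next;
  };
  return fetchFrom(undefined, []);
}

// Demonstration-only fake; the token is just a stringified offset.
function makeFakeListObjectsV2(allKeys, pageSize) {
  return async ({ ContinuationToken } = {}) => {
    const start = ContinuationToken ? Number(ContinuationToken) : 0;
    const end = start + pageSize;
    const truncated = end < allKeys.length;
    return {
      Contents: allKeys.slice(start, end).map((Key) => ({ Key })),
      IsTruncated: truncated,
      NextContinuationToken: truncated ? String(end) : undefined,
    };
  };
}
```

The caller gets a single promise for the full list: `const all = await loadFullListV2(makeFakeListObjectsV2(keys, 10));`.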


Note to self

NextMarker: When response is truncated (the IsTruncated element value in the response is true), you can use the key name in this field as marker in the subsequent request to get next set of objects. Amazon S3 lists objects in alphabetical order. Note: This element is returned only if you have delimiter request parameter specified. If response does not include the NextMarker and it is truncated, you can use the value of the last Key in the response as the marker in the subsequent request to get the next set of object keys.

link

wei avatar Nov 11 '20 22:11 wei

Thank you @wei! I missed this, so thanks for calling it out! Let's discuss this a bit more over a call. Want to chat some more about it tomorrow or early Friday?

sammartinez avatar Nov 11 '20 22:11 sammartinez

@sammartinez PR opened #7183. 🎉

wei avatar Nov 13 '20 06:11 wei

@sammartinez I had a question/suggestion for an extension of the fix that I was discussing with Wei. The question part is: is it common in the library to write custom React hooks?

I am asking because I was telling Wei that all of this functionality, at least for React, could be wrapped in one nice hook so the user doesn't need to use the Hub event to listen for the continuation token.

I think this would make the user experience for this feature much better. To clarify, this would be in addition to the current change, not replacing it, and would go in aws-amplify-react/Storage.

CryogenicPlanet avatar Nov 13 '20 10:11 CryogenicPlanet

Hey @CryogenicPlanet! Thanks for the suggestion! I believe this could be a follow-up to this implementation, as this change is separate from any framework that we support in the library. Think of it as an API more so than a framework implementation. I do want to point out that the aws-amplify-react library is version 1 of the UI Components, and we are now using version 2, @aws-amplify/ui-react. That is where the implementation would need to live.

sammartinez avatar Nov 13 '20 16:11 sammartinez

The above change updates the Storage.list interface to allow accessing all objects in a list result set by adding a special maxKeys value.

Ex. Storage.list('', { maxKeys: 'ALL' })

maxKeys can be called with any value from 1 to 1000, or 'ALL'.
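
As an illustration of how such an option could be interpreted (a sketch under assumptions, not the library's actual implementation): a numeric maxKeys maps to a single request, while 'ALL' keeps following continuation tokens. `listWithMaxKeys`, `makeFakeV2Bucket`, and the `listPageV2` stand-in are hypothetical names. The validation mirrors the 1-1000 range stated above.

```javascript
// Assumption: valid values are integers 1-1000 or the string 'ALL'.
// `listPageV2` is a hypothetical stand-in for a listObjectsV2-style call.
async function listWithMaxKeys(listPageV2, maxKeys = 1000) {
  const isValidNumber =
    Number.isInteger(maxKeys) && maxKeys >= 1 && maxKeys <= 1000;
  if (maxKeys !== 'ALL' && !isValidNumber) {
    throw new RangeError("maxKeys must be an integer from 1 to 1000, or 'ALL'");
  }

  if (maxKeys !== 'ALL') {
    // Single request, capped at maxKeys.
    const page = await listPageV2({ MaxKeys: maxKeys });
    return page.Contents || [];
  }

  // 'ALL': request full-size pages until the listing is exhausted.
  const all = [];
  let token;
  do {
    const page = await listPageV2({ MaxKeys: 1000, ContinuationToken: token });
    all.push(...(page.Contents || []));
    token = page.IsTruncated ? page.NextContinuationToken : undefined;
  } while (token);
  return all;
}

// Demonstration-only fake; the token is just a stringified offset.
function makeFakeV2Bucket(total) {
  const keys = Array.from({ length: total }, (_, i) => `obj-${i}`);
  return async ({ MaxKeys = 1000, ContinuationToken } = {}) => {
    const start = ContinuationToken ? Number(ContinuationToken) : 0;
    const end = Math.min(start + MaxKeys, keys.length);
    return {
      Contents: keys.slice(start, end).map((Key) => ({ Key })),
      IsTruncated: end < keys.length,
      NextContinuationToken: end < keys.length ? String(end) : undefined,
    };
  };
}
```

Against a 2,500-object fake bucket, `listWithMaxKeys(fake, 'ALL')` returns every object, while `listWithMaxKeys(fake, 50)` returns only the first 50.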

A separate task is open to add token based pagination which we expect to include in the next major version release as a breaking change.

stocaaro avatar Jul 29 '22 23:07 stocaaro

Reopening pending the change that will come out in the next major version.

stocaaro avatar Aug 01 '22 16:08 stocaaro

Good news. Storage list pagination is now available on the latest aws-amplify release! See the Storage.list docs for help getting started using pagination.

Please review our breaking change guidance before upgrading your application.

To upgrade Amplify to version 5, run the following in your project folder:

yarn add aws-amplify@latest

stocaaro avatar Nov 14 '22 22:11 stocaaro