cloudpathlib
S3: Use list_objects_v2 to list objects
The v1 list_objects API can cause consistency problems. See this answer for context: https://stackoverflow.com/a/67412931/1692709
We currently use list_objects for non-recursive cases:
https://github.com/drivendataorg/cloudpathlib/blob/80f7afdf85dfb4f3ad0406944a5d3cf28c727435/cloudpathlib/s3/s3client.py#L147
We use the bucket filter in recursive cases: https://github.com/drivendataorg/cloudpathlib/blob/80f7afdf85dfb4f3ad0406944a5d3cf28c727435/cloudpathlib/s3/s3client.py#L136
We should replace both code paths with self.client.get_paginator('list_objects_v2').
Here are the boto3 docs: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects_v2
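
A rough sketch of how a single paginator-based code path might cover both cases. The function name `list_dir`, its signature, and the `(key, is_dir)` return shape are assumptions for illustration, not the existing cloudpathlib method; only `get_paginator("list_objects_v2")`, `paginate(Bucket=..., Prefix=..., Delimiter=...)`, and the `CommonPrefixes` / `Contents` response keys come from boto3.

```python
from typing import Any, Dict, Iterator, Tuple


def list_dir(
    client: Any, bucket: str, prefix: str, recursive: bool = False
) -> Iterator[Tuple[str, bool]]:
    """Yield (key, is_dir) pairs under `prefix` via the list_objects_v2 paginator.

    `client` is expected to be a boto3 S3 client (e.g. boto3.client("s3")).
    Hypothetical helper: name and return shape are illustrative only.
    """
    # Treat the prefix as a "directory" so that "a" does not also match "ab/...".
    if prefix and not prefix.endswith("/"):
        prefix += "/"

    kwargs: Dict[str, str] = {"Bucket": bucket, "Prefix": prefix}
    if not recursive:
        # With a delimiter, S3 groups deeper keys into CommonPrefixes,
        # which gives a one-level listing.
        kwargs["Delimiter"] = "/"

    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(**kwargs):
        # "Subdirectories" (only present when a Delimiter is set).
        for common in page.get("CommonPrefixes", []):
            yield common["Prefix"], True
        # Objects directly matched by this page.
        for obj in page.get("Contents", []):
            yield obj["Key"], False
```

In the recursive case this sketch yields only object keys, so any intermediate "directories" would still need to be derived from the key paths, as the current implementation presumably has to do in some form.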