s3control mangles TargetKeyPrefix with key, making its behavior unexpected.
Describe the bug
When using s3control to copy an object, the full object key is appended to TargetKeyPrefix, instead of just the name of the file.
The documentation for s3control in boto3 is rather lacking in detail, but the CLI documentation says:
Specifies the folder prefix into which you would like the objects to be copied. For example, to copy objects into a folder named Folder1 in the destination bucket, set the TargetKeyPrefix to Folder1 .
However, when I tried this with TargetKeyPrefix set to Folder2/ and a source object at Folder1/test_object.txt, I got s3://test_bucket/Folder2/Folder1/test_object.txt instead of s3://test_bucket/Folder2/test_object.txt
Expected Behavior
A file in location s3://test_bucket/Folder2/test_object.txt
Current Behavior
A file in location s3://test_bucket/Folder2/Folder1/test_object.txt
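To make the difference concrete, here is a minimal sketch of the two ways the destination key could be composed. It only models the behavior described in this issue; the function names are hypothetical, and it assumes the prefix already ends with a slash (as the reproduction code below ensures).

```python
import posixpath


def observed_destination_key(target_key_prefix: str, source_key: str) -> str:
    """Key as observed in this issue: the prefix followed by the FULL source key."""
    return target_key_prefix + source_key


def expected_destination_key(target_key_prefix: str, source_key: str) -> str:
    """Key the reporter expected: the prefix followed by the file name only."""
    return target_key_prefix + posixpath.basename(source_key)


if __name__ == "__main__":
    prefix = "Folder2/"  # assumed to already end with "/"
    source_key = "Folder1/test_object.txt"
    print(observed_destination_key(prefix, source_key))
    print(expected_destination_key(prefix, source_key))
```

With the values from this report, the first call gives Folder2/Folder1/test_object.txt (Current Behavior) and the second gives Folder2/test_object.txt (Expected Behavior).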
Reproduction Steps
import json
import time
import uuid
from typing import List, Tuple, Union
from collections import defaultdict
import boto3
# print(all_keys)
def get_partition():
    if boto3.session.Session().region_name in ["cn-northwest-1", "cn-north-1"]:
        return "aws-cn"
    else:
        return "aws"
partition = get_partition()
region_name = boto3.session.Session().region_name
account_id = boto3.client('sts').get_caller_identity().get('Account')
s3 = boto3.client("s3", region_name=region_name)
s3control = boto3.client('s3control', region_name=region_name)
def copy_objects_to_location(objects: List[Union[str, Tuple[str, str]]], dst_bucket, dst_prefix):
    if dst_prefix[-1] != "/":
        dst_prefix += '/'
    batchOperationsRole = f"arn:{partition}:iam::{account_id}:role/ml-ops-S3BatchOperations"  # WARNING: create a role for this.
    # boto3.set_stream_logger('')
    manifest_csv = ""
    for key in objects:
        if isinstance(key, str):
            bucket = key.split('/')[2]
            prefix = '/'.join(key.split('/')[3:])
        else:
            bucket, prefix = key
        manifest_csv += f"{bucket},{prefix}\n"
    # print(manifest_csv)
    # upload the manifest file to s3://{src_bucket}/{src_prefix}/manifest.csv
    manifest_key = f"s3_control_manifests/manifest_{uuid.uuid4()}.csv"
    manifest_uri = f"s3://{dst_bucket}/{manifest_key}"
    manifest_arn = f"arn:{partition}:s3:::{dst_bucket}/{manifest_key}"
    response = s3.put_object(
        Body=manifest_csv,
        Bucket=dst_bucket,
        Key=manifest_key
    )
    etag = boto3.resource('s3').Object(dst_bucket, manifest_key).e_tag.strip('"')
    print(f"Uploaded manifest to {manifest_uri}.")
    dst_bucket_arn = f"arn:{partition}:s3:::{dst_bucket}"
    kwargs = {
        "AccountId": account_id,
        "ConfirmationRequired": False,
        "RoleArn": batchOperationsRole,
        "Priority": 10,
        "Manifest": {
            "Spec": {
                "Format": "S3InventoryReport_CSV_20161130",
                "Fields": ["Bucket", "Key"],
            },
            "Location": {
                "ObjectArn": manifest_arn,
                "ETag": etag
            },
        },
        "Operation": {
            'S3PutObjectCopy': {
                "TargetResource": dst_bucket_arn,
                'TargetKeyPrefix': dst_prefix,
                "MetadataDirective": "COPY",
                "RequesterPays": False,
                "StorageClass": "STANDARD",
            },
        },
        "Report": {
            # 'Bucket': dst_bucket_arn,
            # 'Format': 'Report_CSV_20180820',
            'Enabled': False,
            # 'Prefix': dst_prefix,
            # 'ReportScope': 'AllTasks',
        },
        "Tags": [{
            "Key": "cost_batch_operation",
            "Value": "yolo_data_something"
        }]
    }
    print(json.dumps(kwargs, indent=4))
    job_id = s3control.create_job(**kwargs)['JobId']
    status = None
    while status not in ['Complete', 'Cancelled', 'Failed', 'Suspended']:
        response = s3control.describe_job(
            AccountId=account_id,
            JobId=job_id
        )
        status = response['Job']['Status']
        print('status', status)
        print(response)
        time.sleep(10)
    if status in ['Cancelled', 'Failed', 'Suspended']:
        raise Exception(f'Jobid {job_id} has error status of {status}')
copy_objects_to_location([('test_bucket', 'Folder1/test_object.txt')], 'test_bucket', 'Folder2')  # copies to Folder2/Folder1/test_object.txt
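The manifest-building step in the reproduction splits s3:// URIs by counting slashes, which is easy to get wrong. A slightly more defensive sketch of the same step follows; `split_s3_uri` and `build_manifest` are hypothetical helper names, not part of boto3, and the sketch does not handle keys that need escaping in a CSV manifest.

```python
from typing import Iterable, Tuple, Union

S3_SCHEME = "s3://"


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    if not uri.startswith(S3_SCHEME):
        raise ValueError(f"Not an s3:// URI: {uri!r}")
    bucket, _, key = uri[len(S3_SCHEME):].partition("/")
    if not bucket or not key:
        raise ValueError(f"URI must contain both a bucket and a key: {uri!r}")
    return bucket, key


def build_manifest(objects: Iterable[Union[str, Tuple[str, str]]]) -> str:
    """Build 'bucket,key' CSV rows from URIs or (bucket, key) tuples."""
    rows = []
    for item in objects:
        bucket, key = split_s3_uri(item) if isinstance(item, str) else item
        rows.append(f"{bucket},{key}\n")
    return "".join(rows)
```

For example, `build_manifest(["s3://test_bucket/Folder1/test_object.txt"])` yields a single row with the bucket test_bucket and the key Folder1/test_object.txt, matching what the loop in the reproduction produces.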
Possible Solution
Either TargetKeyPrefix should be combined with only the file name (the destination key becomes the prefix plus the name of the file), or there needs to be a boolean that enables standard s3 cp behavior, such as S3CPBehavior: true.
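Until something like that exists, one possible workaround is to skip the batch job and copy each object individually with the regular S3 client, building the destination key from the file name. The sketch below is an assumption-laden illustration, not a drop-in replacement: `copy_flat` is a hypothetical helper, it issues one `copy_object` call per object rather than a managed batch job, a single `copy_object` call is limited to objects of up to 5 GB, and source keys that share a file name will overwrite each other at the destination.

```python
import posixpath
from typing import Iterable, List, Tuple


def copy_flat(s3_client, objects: Iterable[Tuple[str, str]],
              dst_bucket: str, dst_prefix: str) -> List[str]:
    """Copy (bucket, key) pairs so each lands at dst_prefix + file name.

    s3_client is expected to behave like boto3.client("s3").
    Returns the destination keys that were written.
    """
    if dst_prefix and not dst_prefix.endswith("/"):
        dst_prefix += "/"
    copied = []
    for src_bucket, src_key in objects:
        # Keep only the last path component, like `aws s3 cp` does for a single file.
        dst_key = dst_prefix + posixpath.basename(src_key)
        s3_client.copy_object(
            Bucket=dst_bucket,
            Key=dst_key,
            CopySource={"Bucket": src_bucket, "Key": src_key},
        )
        copied.append(dst_key)
    return copied
```

Called as `copy_flat(boto3.client("s3"), [("test_bucket", "Folder1/test_object.txt")], "test_bucket", "Folder2")`, this would write to Folder2/test_object.txt.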
Additional Information/Context
No response
SDK version used
1.24.74
Environment details (OS name and version, etc.)
python 3.9
Hi @Anton-Velikodnyy, thanks for reaching out. Here is the boto3 create_job documentation for reference. The service API documentation models are shared across the CLI and SDKs, so that note on TargetKeyPrefix is also in the boto3 docs.
I'm not sure if the behavior you're describing is inconsistent with the documentation. S3 uses the concept of folders as a grouping of objects under a shared prefix (this is described in more detail here). So in this case it looks like a Folder2 prefix is just getting added ahead of a Folder1 prefix rather than substituting the prefixes.
Thanks for responding.
"So in this case it looks like a Folder2 prefix is just getting added ahead of a Folder1 prefix rather than substituting the prefixes."
This behavior seems rather strange, since the AWS CLI doesn't follow this approach.
If I did something like the following with the CLI:
aws s3 cp s3://test_bucket/Folder1/test_object.txt s3://test_bucket/Folder2/
I would get s3://test_bucket/Folder2/test_object.txt as a result.
Appending one key to another seems rather unusable. I can't think of any real-world examples where you would want that behavior. And if you did, the appropriate implementation (following aws-cli's principles) would still allow it if you really wanted it, with something like aws s3 cp s3://test_bucket/Folder1/test_object.txt s3://test_bucket/Folder2/Folder1/
Hi @Anton-Velikodnyy, yes, I agree on the aws s3 cp behavior. I'm not as familiar with S3Control, but as you noted, the documentation on TargetKeyPrefix says:
TargetKeyPrefix (string) --
Specifies the folder prefix into which you would like the objects to be copied. For example, to copy objects into a folder named Folder1 in the destination bucket, set the TargetKeyPrefix to Folder1 .
Do you have a minimal reproducible example that contradicts that description? It looks like there is some modification of the dst_prefix variable you're using that may be a factor here.
Not a reproducible example, but a quote from what you mentioned earlier that contradicts the definition you mentioned above:
"So in this case it looks like a Folder2 prefix is just getting added ahead of a Folder1 prefix rather than substituting the prefixes."
Do you need a reproducible example to show that this is the case?
The wording itself doesn't specify that the entire object key gets appended to the folder prefix, just that the object is located within the folder prefix. The rest is left to the imagination. I'd expect similar behavior from similar tools, so it is easy to assume that the expected behavior is the one that comes from aws-cli, not a brand new one.
Furthermore, why is the documentation referencing file system behavior (I know keys aren't file systems) while the service doesn't behave like one? aws-cli seems to manage that. Why can't s3control?
dst_prefix is in the call to the function.
The aws s3 cp command is unique to the AWS CLI, so there may be inconsistencies with other services such as S3 Control. But the S3 Control team owns its APIs, which are used across AWS SDKs. Since this relates to the upstream CreateJob API, I'll transfer the issue to our cross-SDK repository and reach out to the S3 Control team for clarification.
Since this issue is in a separate GitHub org, I can't transfer it, so I'm closing it and will refer you to this issue for updates going forward: https://github.com/aws/aws-sdk/issues/355. If there are any further points you'd like to note for clarification, please add a comment there. Thanks!