community.aws icon indicating copy to clipboard operation
community.aws copied to clipboard

s3_lifecycle - add abort_incomplete_multipart_upload and expired_object_delete_marker parameter

Open mdavis-xyz opened this issue 3 years ago • 21 comments

SUMMARY

I would like to apply a bucket lifecycle configuration with ~s3_bucket~ s3_lifecycle.

My specific use case is configuring the bucket to abort abandoned multi-part uploads. (Otherwise you can end up paying for storage of data which is not visible in the bucket.)

ISSUE TYPE
  • Feature Idea
COMPONENT NAME

~s3_bucket~ s3_lifecycle

ADDITIONAL INFORMATION

The bucket policy itself is supposed to be xml.

- amazon.aws.s3_bucket:
    name: mys3bucket
    state: present
    lifecycle_configuration:
        rules:
        - id: sample-rule</ID>
          status: Enabled
          AbortIncompleteMultipartUpload:
              DaysAfterInitiation: 7

I'm unsure whether the module should use the same capitalisation as the API (AbortIncompleteMultipartUpload), or if we should rename to be consistent with Ansible (abort_incomplete_multipart_upload) and then convert it. I suspect using CamelCase would be better.

mdavis-xyz avatar Jan 18 '21 06:01 mdavis-xyz

Files identified in the description:

  • [plugins/modules/s3_bucket.py](https://github.com/['ansible-collections/amazon.aws', 'ansible-collections/community.aws', 'ansible-collections/community.vmware']/blob/main/plugins/modules/s3_bucket.py)

If these files are inaccurate, please update the component name section of the description or use the !component bot command.

click here for bot help

ansibullbot avatar Jan 18 '21 06:01 ansibullbot

cc @jillr @s-hertel @tremble @wimnat click here for bot help

ansibullbot avatar Jan 18 '21 06:01 ansibullbot

Note: ansibullbot seems a bit broken. That list of links isn't rendering as expected. There's possibly some issue with the markdown. How do we get ansibullbot fixed?

mdavis-xyz avatar Jan 18 '21 06:01 mdavis-xyz

Hi @mdavis-xyz,

This is a good idea, but we would suggest to prepare a new dedicated module s3_bucket_lifecycle_configuration. Since it's a new module, it should be created first in community.aws collection.

goneri avatar Jan 20 '21 13:01 goneri

Files identified in the description:

  • [plugins/modules/s3_bucket_notification.py](https://github.com/['ansible-collections/amazon.aws', 'ansible-collections/community.aws', 'ansible-collections/community.vmware']/blob/main/plugins/modules/s3_bucket_notification.py)

If these files are inaccurate, please update the component name section of the description or use the !component bot command.

click here for bot help

ansibullbot avatar Jan 20 '21 13:01 ansibullbot

cc @aljazkosir @miha-plesko @xlab-si click here for bot help

ansibullbot avatar Jan 20 '21 13:01 ansibullbot

Ok, I can try to make a new module.

What's the process long term? Will this remain a new module forever? Or is that just a short-term location to ensure the code is correct, with a view to move it to the amazon.aws collection as a standalone module, or to that collection as part of the bucket module?

mdavis-xyz avatar Jan 20 '21 22:01 mdavis-xyz

@mdavis-xyz We follow the UNIX philosophy - "Make each program do one thing well.". Since we are managing the lifecycle configuration of S3 bucket, I feel this makes sense to have a separate module that is easy to maintain and future-proof.

As a short-term goal, this new module will stay with the "community" collection. Once that matures, we can promote it to the "amazon.aws" collection (which is supported content).

Hope this makes sense.

Akasurde avatar Jan 21 '21 04:01 Akasurde

I'm a bit confused by this request, since there is already the community.aws.s3_lifecycle module doing this. The only thing missing in that module is the AbortIncompleteMultipartUpload and ExpiredObjectDeleteMarker settings. This was already mentioned in this issue https://github.com/ansible/ansible/issues/43434 prior to the migration.

To be able to manage this in the meantime, I've created a couple of tasks that calls the aws-cli. Something I've notice and that I think it's important to mention here: The AbortIncompleteMultipartUpload and ExpiredObjectDeleteMarker settings cannot be manage separately. They can only be one lifecycle configuration for a given prefix and those two settings are part of a lifecyle configuration. My original idea was to create a separate lifecycle for those two settings, but I was not able to for this reason.

Should we adjust this issue title and description to reflect the situation or should I create a new issue that better describe this? Let me know what you think is better.

pgrenaud avatar Oct 20 '21 14:10 pgrenaud

cc @markuman click here for bot help

ansibullbot avatar Oct 20 '21 14:10 ansibullbot

I fully agree @pgrenaud, thx for summarize and clearify the situation.

  • https://docs.ansible.com/ansible/latest/collections/community/aws/s3_lifecycle_module.html
  • https://github.com/ansible-collections/community.aws/blob/main/plugins/modules/s3_lifecycle.py
  • https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.put_bucket_lifecycle

I guess expanding the existing module with two more parameters is not a big task.
Is anyone of you (@pgrenaud @mdavis-xyz ) interesting to provide a PR?

markuman avatar Oct 20 '21 17:10 markuman

How should we structure the interface?

Like this?

    - name: Create a lifecycle policy, with abort_incomplete_multipart_upload (idempotency)
      s3_lifecycle:
        name: '{{ bucket_name }}'
        abort_incomplete_multipart_upload:
          days_after_initiation: 1
        prefix: /something

Is it too verbose to have:

        abort_incomplete_multipart_upload:
          days_after_initiation: 1

(That matches the underlying API)

Or should I make it like:

abort_incomplete_multipart_upload_days: 1

i.e. do I try to make it simplify the API in an opinionated way, which is less verbose, or do I make it match the underlying API as closely as possible, which makes adapting to new features in the API easier?

For expiry, the existing expiration_days and expiration_date args are top level, whereas the API nests them, along with expire object delete marker. So should I promote the object delete market to the top level? What do I call it? expiration_expired_object_delete_marker matches the naming of the existing arguments, but sounds a bit verbose. expire_object_delete_marker sounds more sensible, although that uses the word expire when the underlying API is about expiration and expired. Is that ok?

The API doesn't let you do expired object delete marker, as well as expire days. Should I try to implement that exclusion behavior when validating input arguments? Or just pass it through and rely on AWS to reject the bad input?

Also, when submitting a PR, do I need to update a changelog file somewhere?

mdavis-xyz avatar Nov 08 '21 00:11 mdavis-xyz

How should we structure the interface?

Imo, keeping the API structure makes it easier to maintain the code along to boto3 and to add more features.
Maybe there comes more features under key abort_incomplete_multipart_upload, but maybe not. We don't know :)

expiration_expired_object_delete_marker matches the naming of the existing arguments, but sounds a bit verbose. expire_object_delete_marker sounds more sensible

I agree. Optional you can keep the origin key as an alias parameter.

The API doesn't let you do expired object delete marker, as well as expire days. Should I try to implement that exclusion behavior when validating input arguments? Or just pass it through and rely on AWS to reject the bad input?

If you see this usecase/this errors already. Maybe yes. But it also must be documented.

Also, when submitting a PR, do I need to update a changelog file somewhere?

Yes, when adding new features, you also need to a a changelog/fragments/ file.
See: https://docs.ansible.com/ansible/latest/community/development_process.html#creating-a-changelog-fragment

markuman avatar Nov 08 '21 07:11 markuman

If we want the multipart abort to match the API, should we also change expiration to match the API?

Instead of:

expiration_days: 
expiration_date: 
expire_object_delete_marker: 

It would match more closely if we did:

expiration:
  date:
  days:
  expired_object_delete_marker:

That requires care to make the change backwards compatible. Currently you can only specify one of these 3 anyway. So should I keep the pattern for expire of not matching the API, and add abort in a way that matches the API?

mdavis-xyz avatar Nov 08 '21 22:11 mdavis-xyz

That requires care to make the change backwards compatible.

hm, what did you mean? Atm s3_bucket does not support any expire_ parameters.

And I noticed that the module has moved to amazon.aws collection. So the Issue will be moved to https://github.com/ansible-collections/amazon.aws

markuman avatar Nov 09 '21 06:11 markuman

@markuman The slightly confusing thing here is that s3_lifecycle (which manages the lifecycle configuration of s3 buckets) lives in community.aws, whereas s3_bucket (which creates the buckets) lives in amazon.aws.

In my opinion it's better for us to split out some of this configuration as massive monolithic modules become very difficult to properly test, and as such tend to be less reliable, compared to splitting the functionality out. This becomes even more pronounced when you can apply multiple lifecycle rules based on the folders within a bucket.

At some point Amazon will propably replace boto3 with something incompatible, migrating small modules with well defined scope is massively simpler than a monolith that does "all things s3" (see also iam, ec2, etc).

tremble avatar Nov 09 '21 08:11 tremble

Ah oh boy. Blame me. 🙈
I was confused because the inition post included still s3_bucket. But it's right. as mentioned above, the functionality must be included s3_lifecycle
So the issue musst moved back to community.aws. Sorry :D

markuman avatar Nov 09 '21 08:11 markuman

If we want the multipart abort to match the API, should we also change expiration to match the API?

Instead of:

expiration_days: 
expiration_date: 
expire_object_delete_marker: 

It would match more closely if we did:

expiration:
  date:
  days:
  expired_object_delete_marker:

That requires care to make the change backwards compatible. Currently you can only specify one of these 3 anyway. So should I keep the pattern for expire of not matching the API, and add abort in a way that matches the API?

Imo to not break backwards compatibility, it's better to keep the existing structure imo.

markuman avatar Nov 09 '21 08:11 markuman

There's two options:

  1. Keep the current structure
  2. Deprecate the current structure and use something closer to the API

I would actually lean towards 2 as it feels slightly cleaner, but I would be tempted to do something like

expiration:
  date:
  days:
  delete_marker: (with an alias of expired_object_delete_marker)

tremble avatar Nov 09 '21 08:11 tremble

I've submitted a pull request with two new top-level arguments. I'm not sure how a decision like this is usually made. It sounds like there are two maintainers with differing opinions. Just tell me which approach you want and I'll change my PR if needed.

If you want to change the existing expiration fields, please name another module that uses an alias with a nested parameter, so I can go see how that can be done.

mdavis-xyz avatar Nov 11 '21 00:11 mdavis-xyz

I'm fine with both :)

markuman avatar Nov 11 '21 06:11 markuman