cartography icon indicating copy to clipboard operation
cartography copied to clipboard

feat: Add s3_object resource to the aws module

Open AdityaPandeyCN opened this issue 7 months ago • 10 comments

Summary

This PR adds s3_objects support to the aws module. I will be making it more complete like using all the model classes I have contructed, I also feel I can make the file naming and structure better so will be also working on that.

Related issues or links

Include links to relevant issues or other pages.

  • https://github.com/cartography-cncf/cartography/issues/1552

Checklist

Provide proof that this works (this makes reviews move faster). Please perform one or more of the following:

  • [x] Update/add unit or integration tests.
  • [ ] Include a screenshot showing what the graph looked like before and after your changes.
  • [ ] Include console log trace showing what happened before and after your changes.

If you are changing a node or relationship:

If you are implementing a new intel module:

AdityaPandeyCN avatar May 23 '25 12:05 AdityaPandeyCN

Ooh this one is super interesting!

@kunaals @jychp @heryxpc

achantavy avatar May 23 '25 13:05 achantavy

@achantavy interesting but can lead to very very long runs :)

@AdityaPandeyCN you did not choose the easiest one.

Adding the DEFAULT_MAX_OBJECTS_PER_BUCKET = 10000 is a good security, but I think we want to be able to configure that using the CLI.

You will have to work on following files:

  • cli.py
  • config.py

I think we want:

  • a default value
  • 0 will disable object sync

Disabling object sync could be very important in a restricted environment (if their is any risk that object name contains PII/ePHI etc ...)

(I have a doubt with 0, setting a limit to 0 is usually for unlimited, but want to try to avoid adding a --enable-object-sync)

(@achantavy feel free to comment/object those requirements)

jychp avatar May 23 '25 13:05 jychp

Yeah I was gonna say, this is one where we want to make it very configurable and include documentation on how to turn it off because of how much data it can process and how much time it can add to a sync

achantavy avatar May 23 '25 13:05 achantavy

But it's crazy useful

achantavy avatar May 23 '25 13:05 achantavy

Yeah, that will allow so many new analyses use cases

jychp avatar May 23 '25 13:05 jychp

Thankyou for the comments, I am thinking of adding a s3-object-limit flag to control S3 object syncing. It'll default to 10000 objects per bucket, but users can set their own limit or disable it completely (with 0) for sensitive data environments. I'll start with the CLI and config changes. What do you think?

AdityaPandeyCN avatar May 23 '25 15:05 AdityaPandeyCN

I feel this is ready for review.

AdityaPandeyCN avatar May 26 '25 04:05 AdityaPandeyCN

Hi @AdityaPandeyCN can your handle the merge conflict ? Sorry for the delay in PR review, was very busy recently.

jychp avatar Jun 15 '25 08:06 jychp

@AdityaPandeyCN let me actually have a try at this, I'll push to this branch

achantavy avatar Jun 17 '25 23:06 achantavy

Thankyou @achantavy I have checked edits from maintainers box, Do I have to do anything more?

AdityaPandeyCN avatar Jun 18 '25 02:06 AdityaPandeyCN

I have addressed your suggestions.

AdityaPandeyCN avatar Jun 27 '25 06:06 AdityaPandeyCN