NIFI-13430 Added CopyS3Object and ExistsS3Object
Summary
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
- [ ] Apache NiFi Jira issue created
Pull Request Tracking
- [ ] Pull Request title starts with Apache NiFi Jira issue number, such as `NIFI-00000`
- [ ] Pull Request commit message starts with Apache NiFi Jira issue number, such as `NIFI-00000`
Pull Request Formatting
- [ ] Pull Request based on current revision of the `main` branch
- [ ] Pull Request refers to a feature branch with one commit containing changes
Verification
Please indicate the verification steps performed prior to pull request creation.
Build
- [ ] Build completed using `mvn clean install -P contrib-check`
- [ ] JDK 21
Licensing
- [ ] New dependencies are compatible with the Apache License 2.0 according to the License Policy
- [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` files
Documentation
- [ ] Documentation formatting appears as expected in rendered files
@exceptionfactory Alright. I'll open a PR for this functionality on support/nifi-1.x and then start refactoring this PR to use AWS 2.X APIs. Does that work for you?
I recommend refactoring first, as the AWS SDK 2 libraries are also available on the support branch. Working through the changes on the main branch first should limit the amount of work necessary to consider back-porting.
Ok. Will do. I will probably have to massively expand the scope of this PR but it is what it is...
@exceptionfactory I did some experimentation with this today, and there is going to be a lot of refactoring work in both the abstract processors module and the aws processors module to get the S3 stuff migrated. I'd rather add these with the V1 API since they're thin wrappers over that API and then do a bigger ticket afterward to try to get everything on the V2 API.
I see your point, but I'd rather get this functionality in now so it's available because we have some use cases that really need it. I don't think it'll add much to that refactoring task.
Thanks for the follow up @MikeThomsen.
Incorporating new Processors and refactoring in the same pull request for S3 doesn't sound ideal, and I agree that the scope of these Processors is fairly narrow in terms of S3 API changes. With that in mind, considering these Processors as they stand seems like a reasonable way forward.
To that end, there are a few things I recommend evaluating. The Copy Processor seems straightforward, and I can take a closer look at potential changes there. I would note that new properties should just use a human-readable name() instead of a different displayName() going forward.
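For illustration, a minimal sketch of the recommended property style using NiFi's PropertyDescriptor.Builder; the property name and description here are illustrative, not the PR's actual definitions:

```java
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.processor.util.StandardValidators;

// Older convention: machine-readable name() plus a separate displayName(), e.g.
//   .name("copy-s3-destination-bucket").displayName("Destination Bucket")
// Recommended convention for new properties: a single human-readable name()
public static final PropertyDescriptor DESTINATION_BUCKET = new PropertyDescriptor.Builder()
        .name("Destination Bucket")
        .description("The S3 bucket that the object should be copied into")
        .required(true)
        .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
        .build();
```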
On the Exists Processor, I think it would be better to have it function as a GetS3ObjectMetadata Processor. The doesObjectExist() method requests the metadata, but ignores the response. I think having the metadata would present other potential uses beyond just whether the object exists, while still supporting that particular use case.
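As a sketch of why the metadata-oriented approach is more broadly useful (AWS SDK for Java v1 calls; the helper name and attribute keys are hypothetical): a single getObjectMetadata() HEAD request answers the existence question and also exposes fields worth surfacing.

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;
import java.util.HashMap;
import java.util.Map;

// One HEAD request returns the metadata; existence falls out of the same call
Map<String, String> describeObject(final AmazonS3 client, final String bucket, final String key) {
    final ObjectMetadata metadata = client.getObjectMetadata(bucket, key);
    final Map<String, String> summary = new HashMap<>();
    summary.put("s3.contentLength", String.valueOf(metadata.getContentLength()));
    summary.put("s3.etag", metadata.getETag());
    // User-defined x-amz-meta-* headers supplied when the object was written
    metadata.getUserMetadata().forEach((k, v) -> summary.put("s3.user." + k, v));
    return summary;
}
```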
strong +1 on David's comments regarding GetS3ObjectMetadata instead of an exists call.
I'll take a look into that @joewitt @exceptionfactory, but it might be right to have 3 processors instead. I recall something about how doesObjectExist() uses a HEAD request, which is faster for S3 than fetching the metadata when your only goal is to ask whether the object exists. I'll look into it and adjust accordingly.
@MikeThomsen The doesObjectExist() method calls getObjectMetadata(), so there should be no difference in performance. That's the reason I suggested going that direction, thus I would not recommend a separate processor for checking object existence.
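For reference, a rough equivalent of what doesObjectExist() does internally in the v1 SDK, which is why a dedicated existence call saves nothing (sketch; exception handling simplified):

```java
import com.amazonaws.AmazonServiceException;
import com.amazonaws.services.s3.AmazonS3;

// Existence is just getObjectMetadata() with the response thrown away
boolean objectExists(final AmazonS3 client, final String bucket, final String key) {
    try {
        client.getObjectMetadata(bucket, key); // HEAD request either way
        return true;
    } catch (final AmazonServiceException e) {
        if (e.getStatusCode() == 404) {
            return false;
        }
        throw e;
    }
}
```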
@jrsteinebrey @exceptionfactory we should be GTG now on the changes.
Thanks for the updates @MikeThomsen, I will take a closer look soon.
@exceptionfactory any idea when you can get to this?
Thanks @exceptionfactory. I'll try to get a fast turnaround on this.
@exceptionfactory @jrsteinebrey Changes should be good now. I changed the "attribute" configuration option to an "attribute prefix" option. I don't think this version of the AWS Java SDK supports retrieving the ACL from the object metadata, but both raw and user-provided metadata are attached under the user-specified prefix.
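A minimal sketch of the prefix behavior described above, assuming the metadata is flattened into FlowFile attributes (the helper name is illustrative, not the PR's actual code):

```java
import com.amazonaws.services.s3.model.ObjectMetadata;
import java.util.HashMap;
import java.util.Map;

// Flatten raw and user metadata into attributes under the configured prefix
Map<String, String> toAttributes(final ObjectMetadata metadata, final String prefix) {
    final Map<String, String> attributes = new HashMap<>();
    metadata.getRawMetadata().forEach((k, v) -> attributes.put(prefix + k, String.valueOf(v)));
    metadata.getUserMetadata().forEach((k, v) -> attributes.put(prefix + k, v));
    return attributes;
}
```

The resulting map would then be applied to the FlowFile with session.putAllAttributes(flowFile, attributes).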
@exceptionfactory @jrsteinebrey should be good now.
@MikeThomsen Thanks again for these new processors. I am fine with merging them once the workflow actions succeed (they are still running now) and @exceptionfactory (or any committer or higher) approves the merge.
@exceptionfactory I played around a bit with that and liked the idea. Should be GTG now.
@exceptionfactory should be good now.
@exceptionfactory do you think you'll have time to look at this today?
Thanks for the updates @MikeThomsen, I should have time to review again within the next few days. At a quick glance, the latest code changes look good, so it looks like it should be close to completion following some runtime testing.
Thanks @exceptionfactory. I'll try to get this completed tonight.
@exceptionfactory done and build started.