spring-cloud-aws Read bucket region from redirect header

:loudspeaker: Type of change

[ ] Bugfix
[ ] New feature
[x] Enhancement
[ ] Refactoring

:scroll: Description

Read the bucket region from the PermanentRedirect error message headers and cache a client for that region.

Unfortunately this was not doable with the LruCache, since it does not permit manual insertion of entries. If there is a better way i am happy to alter the impl.

:bulb: Motivation and Context

Fixes #452

:green_heart: How did you test it?

Running with local project using bucket in region without owner access.

:pencil: Checklist

[x] I reviewed submitted code
[x] I added tests to verify changes
[ ] I updated reference documentation to reflect the change
[x] All tests passing
[x] No breaking changes

:crystal_ball: Next steps

Jul 11 '22 20:07 fjmacagno

I also noticed that i seem to have edited the generated version of the class instead of the template. When #456 is merged ill redo these changes on top of the template instead.

Jul 18 '22 19:07 fjmacagno

@maciejwalkowiak Unfortunately the ConcurrentLRUMap class is needed both by the spring-cloud-aws-s3-codegen and spring-cloud-aws-s3-cross-region-client projects in order to compile. How would you like to fix the project structure so this will work?

Jul 22 '22 18:07 fjmacagno

@fjmacagno can you bring back fallback when header is not present?

Aug 03 '22 06:08 maciejwalkowiak

@maciejwalkowiak Have you considered getting rid of the autogen functionality? It is making what i am trying to do harder (since some client operations behave differently than others), and that combined with having to inline ConcurrentLruMap seems to suggest that it has outlived its usefulness.

Aug 09 '22 18:08 fjmacagno

Client generation was meant to make things easier - and it did for me when creating initial cross client implementation. What is the problem you're facing?

Aug 09 '22 22:08 maciejwalkowiak

Well, since we know that the header will be present for certain client operations, we could make the assumption that HeadBucket and ListObject operations will always have the header, and throw an error otherwise. Then, if the operation being done is anything else, and the 301 error occurs and the header isnt present, we could just call HeadBucket on the bucket in use. If it returns successfully, we know that it was able to retrieve the region and cache the client, and we can continue. This way, we can still avoid explicitly doing the annoying GetBucketLocation operation which only bucket owners can do.

The complexity here is that in order to avoid infinite recursion, those special operations need to not call themselves on an error. The methods i see currently are:

add a flag to executeInBucketRegion controlling the recursive call, defaulting to true and being false when called recursively
duplicate the error-code checking and header-retrieval logic into another method that just handles the HeadBucket part

Both of these solutions though could result in a duplicate head or list operation being called, since we cant tell what operation the passed function is. If the client code wasnt generated, we could have the head and list operations call a simplified version of executeInBucketRegion which does not have the recursive logic, or even just pass a false flag from option 1, and so surface an exception faster.

A compromise option might be to have the code generation instead create an interface or abstract class with default methods for each of the S3Client methods, which would call the abstract method executeInBucketRegion. Then, the actual implementation could have overrides of specific S3Client methods (the HeadBucket and ListObjects operations). This would also eliminate one of the mvn projects, and in principle the code generation could be an earlier maven step and not have to have the result committed, especially since the generated code would never change.

This is definitely not at all necessary, but i noticed what i was writing didn't feel right and i thought i would bring it up. i definitely may also be missing something obvious. I'll try to get one of simple options written as a reference.

~ Edited to correct HeadObject to HeadBucket

Aug 09 '22 22:08 fjmacagno

A compromise option might be to have the code generation instead create an interface or abstract class with default methods for each of the S3Client methods, which would call the abstract method executeInBucketRegion. Then, the actual implementation could have overrides of specific S3Client methods (the HeadBucket and ListObjects operations). This would also eliminate one of the mvn projects, and in principle the code generation could be an earlier maven step and not have to have the result committed, especially since the generated code would never change.

This sounds like a good way to go. Feel free to go this path and when I am back from holidays I'll dig deeper.

Thanks a lot for for analysis!

Aug 10 '22 10:08 maciejwalkowiak

@fjmacagno great job thanks a lot! I did tiny polishing with reducing the surface area of public APIs.

One thing left that I think we should do but want to consult with you before - wouldn't renaming CrossRegionS3Client to AbstractCrossRegionS3Client and DefaultCrossRegionS3Client to CrossRegionS3Client make more sense? From the user point of view I think having DefaultCrossRegionS3Client may suggest like there are possibilities for other implementations and I don't think it is true in the scope of Spring Cloud AWS at least.

Aug 11 '22 17:08 maciejwalkowiak

Thanks!

I just always leave the possibility open out of habit, especially when working with Spring, just to leave things flexible; i would argue a user should just be concerned with having an instance of the abstract class, and the library can do the details behind the scenes. I'm happy to do this however you would prefer though.

edit: an option could be to rename DefaultCrossRegionS3Client to something more evocative of this specific implementation, maybe ExceptionalCrossRegionS3Client

Aug 11 '22 17:08 fjmacagno

I made one more set of simplification changes. I could also fully get rid of the CrossRegionS3ClientTemplate if desired and just fully generate the class, but i dont know if that is a net-positive.

This is the cleanest i think i can make it without changing project structure.

Aug 11 '22 19:08 fjmacagno

especially when working with Spring, just to leave things flexible

That's true, Spring is very flexible, but we cannot afford such flexibility. For us hiding more means less effort to maintain it later (which is quite important for a side project).