ion-java icon indicating copy to clipboard operation
ion-java copied to clipboard

Adds a StreamInterceptor interface to allow users to plug in custom interceptors for formats like Zstd.

Open tgregg opened this issue 1 year ago • 2 comments

Description of changes:

This library has always had auto-detection of GZIP streams built in, meaning that when users attempt to construct an IonReader from an InputStream or byte[], the given input is checked for the GZIP format header and wrapped in a GZIPInputStream if the header is present.

Now that many users are replacing GZIP with other compression formats like Zstd, we have to decide how to make this library as user-friendly as possible for users making that transition while limiting the amount of special-case code and dependencies that we add to the library.

This PR sketches out one possibility: to define an interface (provisionally named StreamInterceptor) that can be implemented either by users directly, or by external libraries that we vend, to plug in support for any desired format. The PR demonstrates how this mechanism is used by replacing the existing GZIP detection support with support that is delivered via a new GZIPStreamInterceptor implementation.

The ZstdStreamInterceptorTest demonstrates how a StreamInterceptor that recognizes Zstd streams can be plugged into the IonReaderBuilder and IonSystem. In summary, users that wish to support Zstd would change existing code that looks like

IonReaderBuilder readerBuilder = IonReaderBuilder.standard();

to

IonReaderBuilder readerBuilder = IonReaderBuilder.standard().addStreamInterceptor(ZstdStreamInterceptor.INSTANCE);

and code that looks like

IonSystem ION_SYSTEM = IonSystemBuilder.standard().build();

to

IonSystem ION_SYSTEM = IonSystemBuilder.standard()
    .withReaderBuilder(IonReaderBuilder.standard().addStreamInterceptor(ZstdStreamInterceptor.INSTANCE))
    .build();

Critically, this does not require code changes in every location that an IonReader is constructed, and works with all methods of constructing readers (e.g. IonSystem.newReader variants, IonSystem.singleValue, IonSystem.iterate, IonLoader.load, IonReaderBuilder.build variants, etc.).

Comments on the approach are welcomed.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

tgregg avatar Aug 30 '24 21:08 tgregg

I think a bit of working backwards would help inform your design. Thinking out loud, for a customer migrating to new formats, the desired experience would be, in order of preference:

  1. No-op.
  2. Configuration change.
  3. Trivial code change.
  4. Non-trivial code change.

#1 is out due to the desire to limit this library's dependencies and scope.

An example of #2 would be someone else building a new library that wraps ion-java and injects new functionality in a completely transparent way. This could be done with intercepting proxies using an AOP library or java.lang.reflect.Proxy. Customer would add the new library as a dependency without having to make any code changes. While that's cool in theory, ion-java has multiple places where IonReader is constructed, as you mention, so this might be challenging to implement.

#3 is potentially a good tradeoff: if customers are willing to make a small configuration change, they might as well agree to a trivial code change. I think your proposal works well to enable this approach: they add the new library that provides custom ion-java interceptors and they make a low-risk code change where their reader is constructed.

The other aspect of migration is being able to handle both the old gzip format and the new zstd/whatever format. Your proposed design addresses that by supporting multiple interceptors and using the first one that claims a header match. One can think of use cases where chaining more than one interceptor would be beneficial, but I don't know if that's stepping into overengineering territory. Just make sure to avoid a one-way door and leave the opening for future extension.

Another possibly-overengineering point is that not all format detection logic fits into the "fixed header" mold. Sometimes headers are not fixed length, and sometimes they are not headers at all. This works 99% of the time though, so the same comment about avoiding one-way doors.

artemkach avatar Sep 04 '24 18:09 artemkach

Please consider automatically discovering interceptors via the services API, to make integration as easy as dropping them on the classpath. It's going to be annoying if one needs to configure these all over, when I expect most of the time a customer will want to enable something for the entire application or application suite.

[Update] Per @artemkach comment, this is effectively a 1.5 classpath-only change, and much simpler for everyone than AOP or proxy injection.

toddjonker avatar Sep 11 '24 19:09 toddjonker

Please consider automatically discovering interceptors via the services API, to make integration as easy as dropping them on the classpath. It's going to be annoying if one needs to configure these all over, when I expect most of the time a customer will want to enable something for the entire application or application suite.

That sounds like a good experience. I will look into it.

tgregg avatar Sep 26 '24 18:09 tgregg

Revision 2:

  • Incorporates feedback from @toddjonker about naming, documentation, and class locations.
  • Allows users to add InputStreamInterceptor implementations by registering service providers on the classpath (as suggested by @toddjonker and @artemkach's suggestion to consider making this available via configuration change), as an alternative to adding them manually using IonReaderBuilder.addInputStreamInterceptor. In addition to the added unit tests, I tried this out in a separate project and was able to register a stream interceptor by simply adding a file named com.amazon.ion.util.InputStreamInterceptor, containing the fully qualified class name of my custom implementation, to the project's META-INF/services/. More information about service providers can be found at https://docs.oracle.com/javase/9/docs/api/java/util/ServiceLoader.html

tgregg avatar Dec 05 '24 21:12 tgregg