Use "slimmed" `fastutil` for original clients

Open jdimeo opened this issue 1 month ago • 3 comments

Search before reporting

  • [x] I searched in the issues and found nothing similar.

Motivation

Pulsar uses fastutil for high-performance collections. The .jar is around 25 MB because of specialization: it covers every permutation of primitive type combinations for its data structures. The Pulsar client uses fastutil in only one place: tracking nacks.

When using the packaged/shaded client, only the referenced classes are included. However, we prefer to use the -original client so we can manage transitive dependency versions and not include them redundantly when they were already on our classpath. This helps control the overall packaged/shaded size of our deployments.

However, un-minimized fastutil blows up our packaging size and pushes us past the unzipped size limit of AWS Lambda and similar platforms. Someone has packaged up subsets of fastutil into "bite sized pieces" and uploaded them to Maven: https://mvnrepository.com/artifact/com.nukkitx.fastutil/fastutil-long-object-maps

We are now using Maven to exclude fastutil and include this library instead. Can Pulsar consider making -original projects dependent on this "only what's needed" fastutil variant instead of the original "kitchen sink" one?

Solution

Don't depend on the full fastutil from -original modules in Pulsar. Use https://mvnrepository.com/artifact/com.nukkitx.fastutil/fastutil-long-object-maps instead.

Alternatives

Refactor NegativeAcksTracker to not require fastutil. I understand the need on the server side/in the broker, but does client code require a 25 MB dependency just in this one location?

Anything else?

No response

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

jdimeo avatar Nov 17 '25 15:11 jdimeo

For software supply chain security reasons, I wouldn't like to depend on projects that aren't well known.

In Pulsar, there's already https://github.com/apache/pulsar/blob/master/pulsar-client-dependencies-minimized/pom.xml, which minimizes fastutil down to the classes that are actually used. We don't currently publish this artifact to Maven Central since it wasn't designed for external use.

If you are using a Maven build, you could copy https://github.com/apache/pulsar/blob/master/pulsar-client-dependencies-minimized/pom.xml to your project to build the minimized version of fastutil for pulsar-client-original. You could then exclude the fastutil dependency from pulsar-client-original in your build and add the minimized version of fastutil as a dependency instead. Is this a feasible solution for your use case?
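A minimal sketch of what the exclude-and-replace step could look like in a consuming project's pom.xml. The `org.apache.pulsar:pulsar-client-original` and `it.unimi.dsi:fastutil` coordinates are the real ones; the `com.example:fastutil-minimized` artifact and both version properties are hypothetical placeholders for whatever your own build of the minimized jar produces.

```xml
<!-- Sketch only: com.example:fastutil-minimized and the version properties
     are hypothetical placeholders, not artifacts published by Pulsar. -->
<dependencies>
  <dependency>
    <groupId>org.apache.pulsar</groupId>
    <artifactId>pulsar-client-original</artifactId>
    <version>${pulsar.version}</version>
    <exclusions>
      <!-- Drop the full fastutil jar that would otherwise arrive transitively. -->
      <exclusion>
        <groupId>it.unimi.dsi</groupId>
        <artifactId>fastutil</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <!-- Hypothetical: the minimized fastutil built locally from a copy of the Pulsar POM. -->
  <dependency>
    <groupId>com.example</groupId>
    <artifactId>fastutil-minimized</artifactId>
    <version>${fastutil.minimized.version}</version>
  </dependency>
</dependencies>
```

Because a Maven exclusion applies to the whole subtree of the dependency it is declared on, it should remove fastutil whether pulsar-client-original pulls it in directly or through another Pulsar module. The minimized jar then has to contain every fastutil class the client actually loads, otherwise the failure shows up at runtime as a `NoClassDefFoundError`.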

lhotari avatar Nov 17 '25 17:11 lhotari

> Don't depend on the full fastutil from -original modules in Pulsar. Use https://mvnrepository.com/artifact/com.nukkitx.fastutil/fastutil-long-object-maps instead.

This dependency isn't available in Maven Central: https://central.sonatype.com/search?q=%20com.nukkitx.fastutil . It's in another repository. Maven doesn't give good control over which repositories are used to resolve which dependencies, so adding a new repository could introduce additional security, build performance, and reliability issues.

lhotari avatar Nov 17 '25 17:11 lhotari

Thanks for the fast response @lhotari. I can totally understand and agree with you not wanting to use that dependency. I really meant that more as an example of what Pulsar could do. Thanks for the tip on your minimized POM. I will copy that approach in the short term, but I guess my main request still stands: can the -original modules pull in a minimized fastutil transitively? If that means Pulsar adapting its Maven build to produce the minimized dependency and publishing it, and that's the best route, that's great too!

jdimeo avatar Nov 17 '25 17:11 jdimeo