Add slim and primitive artifacts
I have been experimenting with EC and have been concerned about the size of the jar. My use cases currently require only the object-based classes, so I started wondering, and chatting with @donraab, about what it would take to split the primitive specializations out into a separate jar.
This PR is mainly intended to start a conversation about that possibility.
The first attempt here essentially just adds two additional jars to the build:
- a "slim" jar that does not include any of the primitive packages
- a "package" jar that includes only those packages
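As a rough illustration of how the split above could be decided, the sketch below classifies entries of the full jar by package. It assumes, hypothetically, that every primitive specialization lives under a package path segment named `primitive` (as in `org/eclipse/collections/impl/map/mutable/primitive/`) and that `META-INF/` metadata belongs in both artifacts. A real build would express the same rule as includes/excludes in its jar packaging configuration, and service registrations or module descriptors might need more care. `JarSplitSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: decide which artifact each entry of the full ("fat") jar goes into.
 * Assumption: primitive specializations live in packages with a path segment named "primitive".
 */
public final class JarSplitSketch {

    public enum Target { SLIM, PRIMITIVE, BOTH }

    public static Target classify(String entryName) {
        // Metadata such as the manifest is copied into both artifacts (a simplification).
        if (entryName.startsWith("META-INF/")) {
            return Target.BOTH;
        }
        for (String segment : entryName.split("/")) {
            if (segment.equals("primitive")) {
                return Target.PRIMITIVE;
            }
        }
        return Target.SLIM;
    }

    /** Returns the entries that belong in the given artifact (pass SLIM or PRIMITIVE). */
    public static List<String> entriesFor(List<String> fatJarEntries, Target artifact) {
        List<String> result = new ArrayList<>();
        for (String entry : fatJarEntries) {
            Target target = classify(entry);
            if (target == artifact || target == Target.BOTH) {
                result.add(entry);
            }
        }
        return result;
    }
}
```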
The slim jar is about 75% smaller than the full release jar, and the very few collections I have tested appear to still work. As long as you don't stumble into the primitive APIs, the JVM doesn't care that the primitive implementations aren't there.
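The JVM not caring comes down to lazy class loading: class references are typically resolved on first use, so a missing primitive class only surfaces as a `NoClassDefFoundError` when code actually reaches it. A library built against the slim jar could probe for the primitive jar before taking a primitive code path. This is a minimal sketch; `ClasspathProbe` and the choice of `ObjectIntHashMap` as the marker class are assumptions for illustration.

```java
/**
 * Hypothetical sketch: check whether a class is on the classpath without initializing it.
 */
public final class ClasspathProbe {

    /** Assumed marker: a class expected to exist only in the primitive jar. */
    static final String PRIMITIVE_MARKER =
            "org.eclipse.collections.impl.map.mutable.primitive.ObjectIntHashMap";

    public static boolean isPresent(String className) {
        try {
            // initialize = false: load the class without running its static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Absent, or present but missing one of its own dependencies.
            return false;
        }
    }

    public static boolean primitiveJarPresent() {
        return isPresent(PRIMITIVE_MARKER);
    }

    public static void main(String[] args) {
        System.out.println("primitive collections available: " + primitiveJarPresent());
    }
}
```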
More generally, though, I would like to discuss a realistic path toward a smaller or more modular release. I can't be the only one who wants just a few classes from this library but can't afford to add a large jar to their project.
Thanks for reviving the discussion, @headius.
I took a look at FastList, UnifiedSet, UnifiedMap, ArrayStack, and HashBag. HashBag appears to be the only one in this set whose direct dependency on ObjectIntHashMap would need to be replaced with factory calls. With the current split, though, HashBag would be inoperable in the "slim" jar unless the "package" jar is also on the classpath.
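Replacing a direct `new ObjectIntHashMap<>()` with a factory call could look roughly like the hand-rolled sketch below, in which the bag depends only on an interface and an implementation is discovered at runtime through `java.util.ServiceLoader`. Every name here (`Counter`, `CounterFactory`, `SketchBag`) is hypothetical and is not EC's actual factory API. The boxed fallback is also an assumption: it keeps the bag working when no primitive provider is on the classpath, at the cost of boxing, whereas without it the bag would fail as described above.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.ServiceLoader;

/**
 * Hypothetical sketch of a factory indirection between the object side and the primitive side.
 */
public final class FactoryIndirectionSketch {

    /** Minimal stand-in for an object-to-int map such as ObjectIntHashMap. */
    public interface Counter<T> {
        void increment(T item);
        int get(T item);
    }

    /** Service interface a separate primitive jar could register under META-INF/services. */
    public interface CounterFactory {
        <T> Counter<T> create();
    }

    /** Boxed fallback used when no provider is registered (a design assumption). */
    static final class BoxedCounter<T> implements Counter<T> {
        private final Map<T, Integer> counts = new HashMap<>();

        @Override
        public void increment(T item) {
            counts.merge(item, 1, Integer::sum);
        }

        @Override
        public int get(T item) {
            return counts.getOrDefault(item, 0);
        }
    }

    /** The "factory call" that replaces a direct constructor call on a primitive class. */
    public static <T> Counter<T> newCounter() {
        Iterator<CounterFactory> providers = ServiceLoader.load(CounterFactory.class).iterator();
        if (providers.hasNext()) {
            return providers.next().create();
        }
        return new BoxedCounter<>();
    }

    /** A bag that never names a concrete primitive implementation class. */
    public static final class SketchBag<T> {
        private final Counter<T> counts = newCounter();

        public void add(T item) {
            counts.increment(item);
        }

        public int occurrencesOf(T item) {
            return counts.get(item);
        }
    }
}
```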
It looks like a lot of the necessary cleanup of direct primitive implementation dependencies has been completed over the years. This is nice to see.
If it is possible to create these two additional targets and publish the next release with BOTH an uber jar (eclipse-collections.jar) and separate object ("slim") and primitive ("package") jars without any other issues, I find it hard to think of a reason not to do this, other than that it requires work. So far, the split between the object and primitive jars looks mostly clean.
Pinging @motlin for his thoughts on this.
In theory, the separation of the api and impl jars allows the impl jar to be replaced with another flavor. I have wanted to experiment with a null-forbidding jar where every collection throws on add(null) or put(null). I'm sure it's possible to create a slim jar without primitive collections. One downside is that this moves complexity out of code and into the build, in a way that could make errors more confusing for anyone who didn't set up the build.
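For a sense of what the null-forbidding flavor would mean for a single collection, here is a minimal sketch written as a `java.util` list rather than an EC class; `NullForbiddingList` is a hypothetical name. Extending `AbstractList` routes `add(E)`, both `addAll` overloads, and list-iterator writes through `add(int, E)` and `set(int, E)`, so only those two need the null check. Note that `addAll` is not atomic here: elements before a `null` are still added.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical sketch of a list that rejects null on every write path.
 */
public final class NullForbiddingList<E> extends AbstractList<E> {
    private final List<E> delegate = new ArrayList<>();

    @Override
    public E get(int index) {
        return delegate.get(index);
    }

    @Override
    public int size() {
        return delegate.size();
    }

    @Override
    public void add(int index, E element) {
        // AbstractList.add(E) and addAll(...) both end up here.
        delegate.add(index, Objects.requireNonNull(element, "null elements are forbidden"));
        modCount++;
    }

    @Override
    public E set(int index, E element) {
        return delegate.set(index, Objects.requireNonNull(element, "null elements are forbidden"));
    }

    @Override
    public E remove(int index) {
        E removed = delegate.remove(index);
        modCount++;
        return removed;
    }
}
```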
Even though I only use a small fraction of the impl jar in most of my projects, a large jar has never really bothered me.
The jar is 10 MB and pulls in no other dependencies. That's peanuts in all but the most constrained environments (Android, right?). For constrained environments, I would recommend using a jar minifier instead of changing the distribution; that works whether someone wants just one class, three classes, or all of them.
The real issue with splitting the jar is the runtime dependency between the object and primitive sides. Maven style dependency declaration is not capable of correctly expressing that, leading to a situation where our declared dependency for the jar is essentially wrong.
Sorry for disappearing after dropping this on y'all. Replies below.
> One downside is that this moves complexity out of code and into the build
As long as the standard artifacts are still published, nobody using EC would ever need to care. My goal is to provide a standard Maven artifact that contains only the Object parts of the API, for use cases that don't need, or can't afford, the extra size and overhead of the primitive versions. I have such a case.
> The jar is 10 MB and pulls in no other dependencies
That's a rather sweeping statement given the wide world of Java applications and libraries. For a 100 MB application, adding 10 MB may not be a big deal. For a 1 MB library that wants to use some of EC's collections, 10 MB means all downstream users suddenly have 11x the jar size to deal with. A smaller EC artifact means more libraries can choose to use EC without that transitive dependency inflicting its size on downstream users.
> I would recommend using a jar minifier
Jar minifiers are incompatible with a whole range of modern Java requirements, perhaps the biggest of which is the Java module system. Forcing users to minify when the library itself could ship an additional slimmed-down version makes more work for everyone. Plus...
> instead of changing the distribution
This does not change the distribution. The full-size "fat" jar continues to be pushed as the primary artifact for EC. All this does is add an alternative structure: the "slim" and "primitive" versions of the artifact that contain only those pieces.
Again, existing users of EC would not even notice the change, but it would open up more use cases to more users.
> The real issue with splitting the jar is the runtime dependency between the object and primitive sides
There's more improvement possible here, to make the slim side truly independent of the primitive side, but this is a very simple first pass at slimming down the library for use cases that need it.
Don't get me wrong, I understand how much easier it is to have a "fat" jar with batteries included. JRuby used to ship only a fat jar that shaded all dependencies, because we believed that was what people really wanted. It turned out that most users actually wanted (or needed) to manage those dependencies themselves (and that's to say nothing of the complexities of supporting JPMS with a shaded jar).
> Maven style dependency declaration is not capable of correctly expressing that
This is not a typical situation for Maven dependency declaration. This is an alternative artifact that users can choose when a smaller stripped-down version is needed.
My own use case is only the "slim" version, since the places I intend to use EC will only deal with objects. I added the "primitive" artifact for completeness, and so that someone using the "slim" artifact could still opt into the "primitive" artifact without the conflicts that would arise from linking both "slim" and "fat" and having duplicate classes.
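The conflict from linking both "slim" and "fat" can be made concrete with a small check that reports `.class` entries provided by more than one jar. In this sketch the class name, jar names, and entry listings are hypothetical inputs, and reading entries from real jar files is left out. Slim plus primitive should produce no duplicates, while slim plus fat duplicates every object-side class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: find classes that more than one jar on a classpath would provide.
 * Input maps each jar name to the entry names it contains.
 */
public final class DuplicateClassCheck {

    public static Map<String, List<String>> duplicates(Map<String, List<String>> entriesByJar) {
        Map<String, List<String>> jarsByClass = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> jar : entriesByJar.entrySet()) {
            for (String entry : jar.getValue()) {
                if (entry.endsWith(".class")) {
                    jarsByClass.computeIfAbsent(entry, k -> new ArrayList<>()).add(jar.getKey());
                }
            }
        }
        // Keep only classes provided by two or more jars.
        jarsByClass.values().removeIf(jars -> jars.size() < 2);
        return jarsByClass;
    }
}
```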
Again, the only change here is to push what's already being pushed in the "fat" jar as two opt-in component jars "slim" and "primitive", which together make up the whole of EC without inflicting its size on people that only need Object collections.