Track 3rd party libs used in the dist package

Open kgyrtkirk opened this issue 1 year ago • 3 comments

Description

It would be great to track the 3rd party deps in some way such that adding a new one requires a change in the PR itself, which would draw attention to them and could improve the situation.

Motivation

It seems there are quite a few versions of the same lib in the distribution build; these probably landed via transitive deps, most likely without being considered.

kgyrtkirk avatar Oct 01 '24 05:10 kgyrtkirk

I'll describe one approach - there might be others:

# do a full dist build like
mvn install -DskipTests  -Pdist -Pbundle-contrib-exts

From there, we could keep a text file in the project that is supposed to match the list of jars in the dist build. Sorted by filename, it would show when the same jar is present in multiple places, and also when different versions of the same lib are present.

tar tzf distribution/target/apache-druid-32.0.0-SNAPSHOT-bin.tar.gz | grep jar$ | sed 's|.*/||'|grep -v '^druid'|sort > distribution/dist_jars.txt
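As a rough sketch of how such a sorted list could surface libs that appear in more than one version: the helper below strips everything from the first `-<digit>` onward and reports names that remain duplicated. `multi_version_libs` is a hypothetical name, and the version-stripping regex is only a heuristic (an artifact such as `log4j-1.2-api` would be cut too early), so treat the output as a hint rather than a definitive report.

```shell
# Hypothetical helper: print library names that occur with more than one
# distinct version in a list of jar file names (one file name per line).
multi_version_libs() {
  # sort -u collapses identical copies (same name and version), so any name
  # that is still repeated after stripping the version has several versions.
  sort -u "$1" \
    | sed -E 's/-[0-9].*\.jar$//' \
    | sort \
    | uniq -d
}
```

Assuming the list was written to `distribution/dist_jars.txt` as above, it could be called as `multi_version_libs distribution/dist_jars.txt`.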

If the list generated from the build differs from the file kept in the project, the build should fail.
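A minimal sketch of such a check, assuming it runs as a plain shell step after the dist build: it regenerates the list with the same pipeline and returns a non-zero status when it differs from the committed file. `check_dist_jars` is a hypothetical name, and how the step would be hooked into the Maven build is left open here.

```shell
# Hypothetical helper: compare the jars in a dist tarball with a committed list.
# $1 = path to the dist tarball, $2 = path to the committed list of jar names.
check_dist_jars() {
  local tarball="$1" expected="$2" actual status
  actual="$(mktemp)"
  # Same pipeline as above: jar base names, without Druid's own artifacts.
  tar tzf "$tarball" | grep 'jar$' | sed 's|.*/||' | grep -v '^druid' | sort > "$actual"
  status=0
  # diff prints what was added or removed and exits non-zero on any difference.
  diff -u "$expected" "$actual" || status=1
  rm -f "$actual"
  return "$status"
}
```

For example, `check_dist_jars distribution/target/apache-druid-32.0.0-SNAPSHOT-bin.tar.gz distribution/dist_jars.txt` would print a unified diff and fail whenever a jar was added, removed, or changed version. The committed file should be produced with the same `sort` locale as the check, otherwise ordering differences alone could trigger a failure.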

There could also be a check to ensure that jars already shipped in lib are reused by the extensions via the provided scope instead of being bundled again.

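# make a content list; drop Druid's own jars by file name (a bare '/druid' would also drop everything under extensions/druid-*/)
tar tzf distribution/target/apache-druid-32.0.0-SNAPSHOT-bin.tar.gz | grep 'jar$' | grep -v '/druid[^/]*$' > base.li
# jars under lib/ should not be shipped again elsewhere; this list should be empty
grep -F -f <(grep /lib/ base.li | sed 's|.*/||') base.li | grep -v '/lib/'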

kgyrtkirk avatar Oct 01 '24 05:10 kgyrtkirk

I looked into this and found two problems so far:

  1. Multiple copies of the same version are bundled across multiple extensions.
  2. Different versions of the same dependency come in as transitive dependencies.

For the first, I found a way to reduce it to at most 3 copies, which shrank the distribution from 900M to 600M: https://github.com/apache/druid/pull/17321. I am still looking for a way to get it down to a single copy.

For the second, I found the Maven Enforcer dependencyConvergence rule: https://maven.apache.org/enforcer/enforcer-rules/dependencyConvergence.html. Dependencies that are known to require multiple versions can be added to its excludes.

shigarg1 avatar Oct 10 '24 07:10 shigarg1

There is some work done in #16973 that might be usable here.

abhishekagarwal87 avatar Oct 10 '24 10:10 abhishekagarwal87