incubator-heron
Duplicate files installed bumping up install size by ~190M
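The current install is ~816M, as seen in the Docker images, and roughly 20% of that is duplicate files. It may be worth cleaning up the install (preferably by reworking the install script) if anyone wants to reduce container sizes. This script groups identical files under /usr/local and prints the space each group wastes: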
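(
  cd /usr/local;
  # Field 1 is the MD5 hash; field 2 is "*./path" because of --binary.
  # The first input lists hashes seen more than once, the second lists every file.
  join -j1 -t' ' \
    <(find . -type f | xargs md5sum --binary | cut -d' ' -f1 | sort | uniq --repeated) \
    <(find . -type f | xargs md5sum --binary | sort) \
  | awk 'last != $1 {
      # New hash group: end the previous line with its duplicated size.
      # The first group has no predecessor, so this prints ": 0".
      printf ": "
      system( "echo " size * (count - 1) " | numfmt --to=iec" );
      # Size of one copy in the new group; substr() drops the leading "*".
      cmd="stat -c %s " substr($2, 2);
      cmd | getline size;
      close(cmd);
      last=$1;
      count=0;
    }
    {
      count++;
      printf "%s ", substr($2, 2);
    }
    END {
      # Size of the final group only, despite the label.
      printf "total duplicated size: "
      system( "echo " size * (count - 1) " | numfmt --to=iec" );
    }
  '
) | sort -t: -h -k2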
Gives a total of 190M duplicated:
: 0
./heron/conf/examples/roundrobin_packing.yaml ./heron/conf/mesos/client.yaml : 800
./heron/conf/local/client.yaml ./heron/conf/sandbox/client.yaml : 984
./heron/conf/aurora/client.yaml ./heron/conf/marathon/client.yaml : 1.1K
./heron/conf/kubernetes/uploader.yaml ./heron/conf/nomad/uploader.yaml : 1.2K
./heron/conf/slurm/statemgr.yaml ./heron/conf/yarn/statemgr.yaml : 1.2K
./heron/include/spout/irich-spout.h ./heron/include/topology/irich-spout.h : 1.3K
./heron/include/spout/base-rich-spout.h ./heron/include/topology/base-rich-spout.h : 1.5K
./heron/bin/heron-apiserver ./heron/bin/heron-apiserver.sh : 1.6K
./heron/conf/kubernetes/stateful.yaml ./heron/conf/nomad/stateful.yaml : 1.6K
./heron/conf/localzk/client.yaml ./heron/conf/slurm/client.yaml ./heron/conf/yarn/client.yaml : 1.8K
./heron/include/spout/ispout-output-collector.h ./heron/include/topology/ispout-output-collector.h : 2.5K
./heron/conf/local/scheduler.yaml ./heron/conf/localzk/scheduler.yaml ./heron/conf/sandbox/scheduler.yaml : 2.7K
./heron/bin/heron-downloader-config ./heron/bin/heron-downloader-config.sh ./heron/dist/heron-core/bin/heron-downloader-config ./heron/dist/heron-core/bin/heron-downloader-config.sh : 2.8K
./heron/conf/local/stateful.yaml ./heron/conf/localzk/stateful.yaml ./heron/conf/sandbox/stateful.yaml : 3.3K
./heron/bin/heron-downloader ./heron/bin/heron-downloader.sh ./heron/dist/heron-core/bin/heron-downloader ./heron/dist/heron-core/bin/heron-downloader.sh : 3.6K
./heron/conf/local/statemgr.yaml ./heron/conf/marathon/statemgr.yaml ./heron/conf/mesos/statemgr.yaml ./heron/conf/sandbox/statemgr.yaml : 4.0K
./heron/conf/local/healthmgr.yaml ./heron/conf/sandbox/healthmgr.yaml ./heron/conf/yarn/healthmgr.yaml : 4.4K
./heron/conf/local/uploader.yaml ./heron/conf/localzk/uploader.yaml ./heron/conf/mesos/uploader.yaml ./heron/conf/sandbox/uploader.yaml ./heron/conf/slurm/uploader.yaml : 4.7K
./heron/conf/marathon/stateful.yaml ./heron/conf/mesos/stateful.yaml ./heron/conf/slurm/stateful.yaml ./heron/conf/yarn/stateful.yaml : 4.9K
./heron/include/spout/ispout.h ./heron/include/topology/ispout.h : 5.6K
./heron/conf/aurora/downloader.yaml ./heron/conf/examples/downloader.yaml ./heron/conf/kubernetes/downloader.yaml ./heron/conf/local/downloader.yaml ./heron/conf/localzk/downloader.yaml ./heron/conf/nomad/downloader.yaml ./heron/conf/sandbox/downloader.yaml ./heron/conf/yarn/downloader.yaml total duplicated size: 7.7K
./heron/conf/aurora/packing.yaml ./heron/conf/kubernetes/packing.yaml ./heron/conf/local/packing.yaml ./heron/conf/localzk/packing.yaml ./heron/conf/marathon/packing.yaml ./heron/conf/mesos/packing.yaml ./heron/conf/nomad/packing.yaml ./heron/conf/sandbox/packing.yaml ./heron/conf/slurm/packing.yaml ./heron/conf/yarn/packing.yaml : 9.1K
./heron/conf/local/heron_internals.yaml ./heron/conf/sandbox/heron_internals.yaml : 13K
./heron/conf/aurora/heron_internals.yaml ./heron/conf/examples/heron_internals.yaml ./heron/conf/localzk/heron_internals.yaml ./heron/conf/marathon/heron_internals.yaml ./heron/conf/mesos/heron_internals.yaml ./heron/conf/slurm/heron_internals.yaml ./heron/conf/yarn/heron_internals.yaml : 74K
./heron/conf/aurora/metrics_sinks.yaml ./heron/conf/local/metrics_sinks.yaml ./heron/conf/localzk/metrics_sinks.yaml ./heron/conf/marathon/metrics_sinks.yaml ./heron/conf/mesos/metrics_sinks.yaml ./heron/conf/sandbox/metrics_sinks.yaml ./heron/conf/slurm/metrics_sinks.yaml ./heron/conf/yarn/metrics_sinks.yaml : 76K
./heron/dist/heron-core/lib/statemgr/heron-localfs-statemgr.jar ./heron/lib/statemgr/heron-localfs-statemgr.jar : 5.7M
./heron/dist/heron-core/lib/scheduler/heron-scheduler.jar ./heron/lib/scheduler/heron-scheduler.jar : 7.8M
./heron/dist/heron-core/lib/scheduler/heron-local-scheduler.jar ./heron/lib/scheduler/heron-local-scheduler.jar : 8.0M
./heron/dist/heron-core/lib/scheduler/heron-marathon-scheduler.jar ./heron/lib/scheduler/heron-marathon-scheduler.jar : 8.0M
./heron/dist/heron-core/lib/scheduler/heron-slurm-scheduler.jar ./heron/lib/scheduler/heron-slurm-scheduler.jar : 8.0M
./heron/dist/heron-core/lib/statemgr/heron-zookeeper-statemgr.jar ./heron/lib/statemgr/heron-zookeeper-statemgr.jar : 11M
./heron/dist/heron-core/lib/packing/heron-binpacking-packing.jar ./heron/lib/packing/heron-binpacking-packing.jar ./heron/lib/scheduler/heron-binpacking-packing.jar : 12M
./heron/dist/heron-core/lib/packing/heron-roundrobin-packing.jar ./heron/lib/packing/heron-roundrobin-packing.jar ./heron/lib/scheduler/heron-roundrobin-packing.jar : 12M
./heron/dist/heron-core/lib/metricscachemgr/heron-metricscachemgr.jar ./heron/lib/metricscachemgr/heron-metricscachemgr.jar : 13M
./heron/dist/heron-core/lib/scheduler/heron-mesos-scheduler.jar ./heron/lib/scheduler/heron-mesos-scheduler.jar : 14M
./heron/dist/heron-core/lib/scheduler/heron-nomad-scheduler.jar ./heron/lib/scheduler/heron-nomad-scheduler.jar : 15M
./heron/dist/heron-core/lib/scheduler/heron-kubernetes-scheduler.jar ./heron/lib/scheduler/heron-kubernetes-scheduler.jar : 36M
./heron/dist/heron-core/lib/downloaders/heron-downloader.jar ./heron/lib/downloaders/heron-downloader.jar : 45M
Most of this (182.9M) is between /usr/local/heron/lib/ and /usr/local/heron/dist/heron-core/lib/:
$ (
  # Lines that appear twice have the same allocated size and relative path in both trees.
  for dir in /usr/local/heron/lib/ /usr/local/heron/dist/heron-core/lib/; do
    (cd "$dir" && find . -type f -exec ls -sh {} \; )
  done
) | sort | uniq --count --repeated
2 11M ./statemgr/heron-zookeeper-statemgr.jar
2 13M ./metricscachemgr/heron-metricscachemgr.jar
2 14M ./scheduler/heron-mesos-scheduler.jar
2 15M ./scheduler/heron-nomad-scheduler.jar
2 36M ./scheduler/heron-kubernetes-scheduler.jar
2 45M ./downloaders/heron-downloader.jar
2 5.7M ./packing/heron-binpacking-packing.jar
2 5.7M ./packing/heron-roundrobin-packing.jar
2 5.7M ./statemgr/heron-localfs-statemgr.jar
2 7.8M ./scheduler/heron-scheduler.jar
2 8.0M ./scheduler/heron-local-scheduler.jar
2 8.0M ./scheduler/heron-marathon-scheduler.jar
2 8.0M ./scheduler/heron-slurm-scheduler.jar
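Note that the "total duplicated size:" label in the first script comes from the awk END block, which prints the size of the last hash group rather than a sum. To get an actual total, something like the following might work. It is only a sketch: `dup_bytes` is a hypothetical helper name, and it assumes `md5sum` is available and that file names contain no newlines.

```shell
#!/bin/sh
# Sketch: print the number of bytes taken up by redundant copies under a directory.
# dup_bytes is a hypothetical helper, not part of Heron or its install script.
dup_bytes() (
    root=${1:-/usr/local}
    # Emit "<md5> <size in bytes>" for every regular file; find does not
    # follow symlinks by default, so linked trees are not counted twice.
    find "$root" -type f -exec sh -c '
        for f in "$@"; do
            printf "%s %s\n" "$(md5sum < "$f" | cut -d" " -f1)" "$(wc -c < "$f")"
        done' sh {} + |
    # Every occurrence of a hash after the first one is a redundant copy.
    awk 'seen[$1]++ { total += $2 } END { printf "%.0f\n", total }'
)
```

Usage would be along the lines of `dup_bytes /usr/local | numfmt --to=iec`. It hashes each file in its own process, so it is slower than the `xargs md5sum` pipeline above, but it copes with spaces in file names.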
Is the script accounting for symlinks? I believe some of the contents in /heron are symlinks to another location.
Good question, but not in these cases. find doesn't follow symlinks by default, and the first script has both finds starting from the same place, so symlinks can't produce false duplicates there. The second script starts from different places, but there are no symlinks in the paths /usr/local/heron/lib and /usr/local/heron/dist/heron-core/lib.
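The default behaviour of find is easy to check in a scratch directory. This is a small sketch, with `demo_find_symlinks` and the file names made up for illustration:

```shell
#!/bin/sh
# Sketch: find only descends into symlinked directories when given -L.
# demo_find_symlinks and the paths below are hypothetical, for illustration only.
demo_find_symlinks() (
    tmp=$(mktemp -d) || exit 1
    mkdir "$tmp/real"
    echo data > "$tmp/real/file.jar"
    ln -s real "$tmp/link"        # directory symlink, similar to /heron/heron-core
    cd "$tmp" || exit 1
    # Default: the symlink is not followed, so the file is seen once.
    echo "default: $(find . -type f | wc -l | tr -d ' ')"
    # -L: the symlink is followed, so the same file shows up under both names.
    echo "with -L: $(find -L . -type f | wc -l | tr -d ' ')"
    cd / && rm -rf "$tmp"
)
```

Calling `demo_find_symlinks` prints `default: 1` and then `with -L: 2`, which is why neither script above double-counts files reachable through the /heron links.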
Here are all the symlinks that exist, in case it helps with a cleanup:
# made during heron-install.sh
ln -s /usr/local/heron/bin/heron{,-apiserver,-explorer,-tracker,-ui} /usr/local/bin/
# made in the Dockerfile
ln -s /usr/local/heron/dist/heron-core /heron/
ln -s /usr/local/heron/{examples,release.yaml} /heron/
ln -s /usr/local/heron/{bin,conf,dist,lib,release.yaml} /heron/heron-tools/
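As a possible starting point for such a cleanup, byte-identical files in one tree could be replaced with symlinks into the other. This is a rough sketch and not a tested fix: `link_identical` is a hypothetical helper, it assumes file names without newlines, and whether Heron tolerates symlinks under dist/heron-core (for example when that directory is packaged or uploaded) has not been verified here.

```shell
#!/bin/sh
# Sketch: replace files under "$2" that are byte-identical to the file at the
# same relative path under "$1" with symlinks to the copy under "$1".
# link_identical is a hypothetical helper, not part of heron-install.sh.
link_identical() (
    keep=$(cd "$1" && pwd -P) || exit 1   # absolute, symlink-free paths
    dup=$(cd "$2" && pwd -P) || exit 1
    # Refuse to run on a single tree, which would create self-referencing links.
    [ "$keep" != "$dup" ] || exit 1
    cd "$dup" || exit 1
    find . -type f | while IFS= read -r rel; do
        rel=${rel#./}
        # cmp -s is silent and succeeds only when the contents match exactly.
        if [ -f "$keep/$rel" ] && cmp -s "$keep/$rel" "$rel"; then
            ln -sf "$keep/$rel" "$rel"
            echo "linked $rel"
        fi
    done
)
```

For the two directories discussed above, the call would look like `link_identical /usr/local/heron/lib /usr/local/heron/dist/heron-core/lib`. Trying it on a copy of the install first seems prudent, since the replacement is destructive.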