
Reduce size of output executable

Open ianopolous opened this issue 7 years ago • 30 comments

Compiling hello world with substrate vm on ubuntu results in a 6.1 MiB executable. Is it possible to reduce this? The equivalent in golang is 1.6 MiB or < 1 MiB without debug information.

ianopolous avatar Jan 20 '18 23:01 ianopolous

True, I have evaluated the image size and even for an empty main program we get ~5MB of an image. There are a few reasons for that:

  1. In our features we use JDK code that has a non-negligible footprint. To see all kinds of things that get pulled in you can add -H:+PrintUniverse to the image build.
  2. Some of our features are included into the image although they are never used in the code.
  3. The points-to analysis is imprecise and sometimes catches elements that are never used.

On the bright side, if you include a lot of your own code, the 5MB overhead will remain the same. So this is an issue only for very small images. This is a great issue. If you need small images in your use case, please mention it here and we will raise the priority.

vjovanov avatar Jan 30 '18 14:01 vjovanov

unused-pkgs-hw.txt unused-classes-hw.txt unused-methods-hw.txt

These are the packages, classes, and methods that are never invoked. They can be used as an indicator of elements that should not be in the image. Some things, like the heap package, must be included in the image even though this particular program never uses them.

@pejovica thanks for the data.

vjovanov avatar Jan 30 '18 15:01 vjovanov

The general use-case is to remove a common argument for people to use Go-lang. One specific use-case that this would severely impact is something like implementing many small command line utilities as in Linux.

Does that 5 MiB include the GC? At least for simple things like helloworld you can prove you don't need a GC.

ianopolous avatar Jan 30 '18 15:01 ianopolous

It does, but by looking at the list of included elements, I would not say that GC is the biggest problem. I would rather invest that time to remove things that should not be there by any means. For example, org.graalvm.compiler.truffle, java.util.zip, java.util.regex, java.util.Calendar.

By removing these I am confident that we can reach the size of Go's "Hello, World!". At one point we removed all methods that were never executed and the image size was 400 KB. This is the lower bound of course, but it could be used as a guideline for what we should reach.

vjovanov avatar Jan 30 '18 15:01 vjovanov

Thanks @vjovanov, that would be amazing!

ianopolous avatar Jan 30 '18 17:01 ianopolous

You can also use https://upx.github.io/ as a temporary solution to make compressed binaries. Reduces the size by a lot in my experience.

CremboC avatar Apr 30 '18 19:04 CremboC

Any thoughts on this, guys?

I'm targeting GraalVM as (probably/hopefully) the solution for the long cold starts in AWS Lambda functions written in Java. Smaller binaries would make our deployments faster. Also, AWS has some limits on deployment size, and I'm afraid that binaries would become too big if we have multiple dependencies in our project - which is usually the case when using the AWS SDK.

I think that's a game changer functionality that would make JVM more attractive to the community, especially those who have been flirting with Go and Rust as an alternative.

miere avatar Apr 11 '19 00:04 miere

(not issue relevant) @miere , already discovered https://quad.team/blog/Micronaut-to-AWS-Lamda-guide ?

SchulteDev avatar Apr 11 '19 07:04 SchulteDev

Any thoughts on this, guys?

I'm targeting GraalVM as (probably/hopefully) the solution for the long cold starts in AWS Lambda functions written in Java. Smaller binaries would make our deployments faster. Also, AWS has some limits on deployment size, and I'm afraid that binaries would become too big if we have multiple dependencies in our project - which is usually the case when using the AWS SDK.

I think that's a game changer functionality that would make JVM more attractive to the community, especially those who have been flirting with Go and Rust as an alternative.

Now that we have GraalVM building against JDK 11, it's only a matter of time until the native compiler can work with the new modularity. I doubt file sizes will ever be improved on JDK 8 though since the class library was very.... let's say "monolithic" before the Project Jigsaw refactor.

So until those native compiler improvements, I suggest updating JDK 8 projects to JDK 11 and making them modular in preparation for that :D

Also, see what CremboC said - UPX is pretty good. ~11MB exe down to ~3MB.

cosmicdan avatar Nov 22 '19 18:11 cosmicdan

@thomaswue @vjovanov I would like to share an approach I took five years ago (2015) to build custom JVMs that were extremely small (a JavaFX UI application with its runtime totalling only 5MB after zipping).

I used the following to achieve this result

  • javafx native packing tool
  • spyfs
  • Xbootclasspath flag

Steps

  1. So what I did was, I extracted all runtime/bootstrap classes/jars in a single folder. Not just rt.jar, anything which is used. This was my custom bootstrap classes folder.
  2. I packaged my application using javafx native packing tool
  3. I replaced some setting in this, using Xbootclasspath flag so that it picked up classes from the custom bootstrap classes folder.
  4. I made a virtual clone of this using spyfs.
  5. I ran the application on this virtual clone.
  6. SpyFS detected which classes were actually loaded and saved this information.
  7. SpyFS copied only the classes which were actually loaded into a third folder - application output folder.
  8. The logic which I used was - Case 1: if a class file was only visited (touched) and not read, the class file would be copied but its size would be zero. Case 2: if a class file was read, even one byte, the entire class would be copied. Case 3: if a class file was neither touched nor read, it would not be copied. Case 4: for native libraries, anything which was loaded was copied to the destination.
  9. This application folder had only the JavaFX UI app itself and only those bootstrap classes which were actually used. I tested it, and it ran successfully. I zipped it, and found the size was as small as 5MB.
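
The four cases in step 8 amount to a small decision rule. A minimal sketch of that rule in Java, assuming a hypothetical per-file record of whether the file was opened and how many bytes were read (the names CopyPolicy, Action and decide are illustrative and are not part of SpyFS):

```java
public class CopyPolicy {
    /** What to do with a class file when building the stripped-down bundle. */
    enum Action { COPY_FULL, COPY_EMPTY_STUB, SKIP }

    /**
     * Applies the rule from step 8 to one class file.
     *
     * @param opened    whether the file was opened (touched) at all
     * @param bytesRead how many bytes were read from it during the recorded run
     */
    static Action decide(boolean opened, long bytesRead) {
        if (!opened) {
            return Action.SKIP;            // Case 3: never touched, not copied
        }
        if (bytesRead > 0) {
            return Action.COPY_FULL;       // Case 2: even one byte read, copy whole file
        }
        return Action.COPY_EMPTY_STUB;     // Case 1: touched only, copy as zero-size file
    }

    public static void main(String[] args) {
        System.out.println(decide(true, 1024)); // class that was loaded
        System.out.println(decide(true, 0));    // class that was only probed
        System.out.println(decide(false, 0));   // class that was never used
    }
}
```

Case 4 (native libraries) is not modelled separately here: per the description, any library that was loaded is simply copied in full.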

Back in 2015 I shared this idea with RoboVM guys. Here is the link to the discussion https://groups.google.com/d/msg/robovm/-LEeLkGJodA/qFGwVfKQm3QJ Niklas Therning (founder of robovm) had found this interesting and had said,

Interesting approach! :-) We're working on improving the stripping done by RoboVM to reduce file sizes. Recording which classes are actually used at runtime is something we could do easily by patching RoboVM slightly. We're currently looking into an approach which is much less aggressive, using static analyses. One nice advantage with the dynamic approach is no special handling is required for classes loaded via reflection. Maybe we could use this for generating forceLinkClasses patterns automatically for users. The drawback is of course that you have to make sure you touch all codepaths of your app when recording.

Thanks for the info and links. We'll see where we end up eventually...

However, soon after the company was sold and then came Xamarin.

Much later, GluonVM picked it up, and then later Gluon dropped its own VM and started using GraalVM. Only very recently has it started providing tools to create GraalVM-powered binaries, which even an average developer like me can use to build and run my JavaFX applications on mobile (Android, iPhone) and desktop, everywhere.

So I felt it is time I could raise this matter again. And as already pointed out, such approaches would make GraalVm extremely competitive compared to Go-lang etc. also.

To be honest, I don't know how much optimization has been already implemented and put in place in GraalVM. GraalVM is amazing no doubt and performance difference is clearly felt from end user experience point of view, no doubt.

I might be over expecting, but I feel, if this size issue/feature is cracked, GraalVM can replace every language/platform/runtime in the world, as the first default choice.

So to give a summary, the idea/suggestion is

  • Apart from static analysis, (optionally) recording which classes are actually used at runtime, both for the runtime (jvm) bootstrap and the application.
  • Keep only the classes which are actually used, remove classes which were never used both from bootstrap and the end application.

Please let me know your thoughts.

Thank you

BTW to additionally mention, I had packaged youtube-dl a python app, with a full python runtime environment (stripped) not more than 3MB (after compression).

nithyasharabheshwara avatar May 20 '20 08:05 nithyasharabheshwara

That's an interesting comment. I never used SpyFS but maybe it can help here. Does that work on class level, or method level?

Getting a JavaFX app under 5 MB sounds very challenging. Did that include the native libraries (e.g. libglass, libprism_es2 etc?).

johanvos avatar Jun 04 '20 11:06 johanvos

It is possible we make a 400 kB "Hello, World!" (@pejovica did this). But this code is completely unsafe and insecure and can lead to segfaults. This could be made as an experimental feature with a strong emphasis on experimental (use at your own risk). For making it a feature, we would need a very strong use-case.

vjovanov avatar Jun 04 '20 12:06 vjovanov

That's an interesting comment. I never used SpyFS but maybe it can help here. Does that work on class level, or method level?

Getting a JavaFX app under 5 MB sounds very challenging. Did that include the native libraries (e.g. libglass, libprism_es2 etc?).

Hey sorry, my apologies, I didn't notice your question. So SpyFS neither works at the class level nor at the method level. It works at the filesystem level. All the runtime and bootstrap classes are extracted in a folder and this custom bootstrap class bundle is used instead of the default java runtime classes using the Xbootclasspath flag. This folder is spied by SpyFS and it knows exactly which classes were actually read (opened and >0 bytes read), which were accessed (opened but zero bytes read) and which class files were not opened at all.

Then the SpyFS data is used to make a duplicate of this custom bootstrap class bundle in another folder. All the class files which were read (> 0 bytes) are copied completely, all class files which were opened but not read (total read bytes = 0) are copied as dummy class files of zero size, and all class files which were neither read nor opened are not copied. This basically forms the stripped-down runtime bootstrap class bundle for that particular application. I tried it about 5 years ago and haven't had the opportunity to replicate it since. I am not able to run the old 2015 example anymore, so I am guessing it must have been pulling some native libraries in from somewhere.

Now to answer the question regarding the native libraries (e.g. libglass, libprism_es2 etc?), yes it included all of them. During the runtime which libraries are actually loaded and used was separately analyzed and all those libraries were copied and used.

I hope I was able to explain the approach. It was a very raw method I can say. Because I had made my own kernel filesystem library (binding) in java, I was able to get this done easily.

nithyasharabheshwara avatar Jun 16 '20 12:06 nithyasharabheshwara

Hey, same problem here. I'm working on a small CLI app; the only dependency I have is JLine3, but the final executable weighs 14 MB. How could I decrease the size? (The same app in Golang takes 3 MB.) I use Java 11:

openjdk version "11.0.6" 2020-01-14
OpenJDK Runtime Environment GraalVM CE 20.0.0 (build 11.0.6+9-jvmci-20.0-b02)
OpenJDK 64-Bit Server VM GraalVM CE 20.0.0 (build 11.0.6+9-jvmci-20.0-b02, mixed mode, sharing)

strogiyotec avatar Jun 27 '20 04:06 strogiyotec

I had a look into the size of the generated binary for a hello world main with objdump -x:

`objdump -x` output
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .interp       0000001c  00000000000002a8  00000000000002a8  000002a8  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .note.gnu.build-id 00000024  00000000000002c4  00000000000002c4  000002c4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .note.ABI-tag 00000020  00000000000002e8  00000000000002e8  000002e8  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .gnu.hash     000001c0  0000000000000308  0000000000000308  00000308  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .dynsym       00000de0  00000000000004c8  00000000000004c8  000004c8  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .dynstr       00000e8a  00000000000012a8  00000000000012a8  000012a8  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .gnu.version  00000128  0000000000002132  0000000000002132  00002132  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .gnu.version_r 000000e0  0000000000002260  0000000000002260  00002260  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 .rela.dyn     0001ebd0  0000000000002340  0000000000002340  00002340  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 .rela.plt     000005a0  0000000000020f10  0000000000020f10  00020f10  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 10 .init         0000001b  0000000000022000  0000000000022000  00022000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 11 .plt          000003d0  0000000000022020  0000000000022020  00022020  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 12 .plt.got      00000008  00000000000223f0  00000000000223f0  000223f0  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .text         002c06a3  0000000000023000  0000000000023000  00023000  2**12
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 14 .fini         0000000d  00000000002e36a4  00000000002e36a4  002e36a4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 15 .rodata       000095a3  00000000002e4000  00000000002e4000  002e4000  2**12
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 16 .svm_heap     0038c9c0  00000000002ee000  00000000002ee000  002ee000  2**12
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 17 .eh_frame_hdr 0000027c  000000000067a9c0  000000000067a9c0  0067a9c0  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 18 .eh_frame     00000c50  000000000067ac40  000000000067ac40  0067ac40  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 19 .init_array   00000010  000000000067cb88  000000000067cb88  0067bb88  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 20 .fini_array   00000008  000000000067cb98  000000000067cb98  0067bb98  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 21 .dynamic      00000230  000000000067cba0  000000000067cba0  0067bba0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 22 .got          00000230  000000000067cdd0  000000000067cdd0  0067bdd0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 23 .data         000019ec  000000000067d000  000000000067d000  0067c000  2**12
                  CONTENTS, ALLOC, LOAD, DATA
 24 .bss          00000188  000000000067e9f0  000000000067e9f0  0067d9ec  2**3
                  ALLOC
 25 .comment      00000046  0000000000000000  0000000000000000  0067d9ec  2**0
                  CONTENTS, READONLY
 26 .debug_aranges 000002b0  0000000000000000  0000000000000000  0067da32  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 27 .debug_info   0000501a  0000000000000000  0000000000000000  0067dce2  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 28 .debug_abbrev 00000647  0000000000000000  0000000000000000  00682cfc  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 29 .debug_line   000008ad  0000000000000000  0000000000000000  00683343  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 30 .debug_str    00002c1d  0000000000000000  0000000000000000  00683bf0  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 31 .debug_loc    00001eb6  0000000000000000  0000000000000000  0068680d  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 32 .debug_ranges 00000350  0000000000000000  0000000000000000  006886c3  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS

The full binary has 6866528 bytes. The biggest contributors to that size are the .text section with the compiled code of 2885283 bytes (42%) and the .svm_heap section with 3721664 bytes (54%).
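
The percentages can be re-derived from the Size column of the objdump output, which is hexadecimal. A small sketch that redoes the arithmetic for the two largest sections (the class and method names are made up for illustration; the numbers are copied from the listing above):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SectionShares {
    // Total size of the binary in bytes, as stated in the comment.
    static final long FILE_SIZE = 6_866_528L;

    /** Parses the hexadecimal Size column of an `objdump -x` section row. */
    static long parseHexSize(String hex) {
        return Long.parseLong(hex, 16);
    }

    /** Share of the whole file taken by a section, rounded to a whole percent. */
    static long percentOfFile(long sectionBytes) {
        return Math.round(100.0 * sectionBytes / FILE_SIZE);
    }

    public static void main(String[] args) {
        // Section name -> Size column, copied from the objdump listing.
        Map<String, String> sections = new LinkedHashMap<>();
        sections.put(".text", "002c06a3");
        sections.put(".svm_heap", "0038c9c0");

        sections.forEach((name, hex) -> {
            long bytes = parseHexSize(hex);
            System.out.println(name + ": " + bytes + " bytes, about "
                    + percentOfFile(bytes) + "% of the file");
        });
    }
}
```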

@vjovanov already commented about the size of the unused code that was included. However, since the initial native heap seems to be even quite a bit bigger than that, it would be interesting to understand why that is the case and what's in there.

-H:+PrintHeapHistogram will print a histogram of the data in the heap:

abridged `-H:+PrintHeapHistogram` output
=== Summary ===
DynamicHub; 5821; 487376
ImageCodeInfo; 10; 868104
Other; 47113; 2362952
Total; 52944; 3718432

[switched sections around]

=== DynamicHub ===
   Count     Size   Size%    Cum% Class
    1455   314968  64.63%  64.63% java.lang.Class
    1456    87312  17.91%  82.54% byte[]
    1455    46560   9.55%  92.09% java.lang.String
    1455    38536   7.91% 100.00% int[]

=== ImageCodeInfo ===
   Count     Size   Size%    Cum% Class
       5   837632  96.49%  96.49% byte[]
       1    22064   2.54%  99.03% java.lang.String[]
       1     8240   0.95%  99.98% java.lang.Class[]
       1      112   0.01%  99.99% com.oracle.svm.core.code.ImageCodeInfo
       2       56   0.01% 100.00% java.lang.Object[]

=== Other ===
   Count     Size   Size%    Cum% Class
   13210   643456  27.23%  27.23% byte[]
   12855   411360  17.41%  44.64% java.lang.String
    5488   219520   9.29%  53.93% java.util.HashMap$Node
     270   148368   6.28%  60.21% char[]
     355   109744   4.64%  64.85% java.lang.String[]
      96    95376   4.04%  68.89% java.util.HashMap$Node[]
    1474    94336   3.99%  72.88% sun.util.locale.LocaleObjectCache$CacheEntry
    1516    84896   3.59%  76.47% java.util.concurrent.ConcurrentHashMap$Node
    1325    84800   3.59%  80.06% java.util.LinkedHashMap$Entry
     468    55248   2.34%  82.40% int[]
[snip]
  • Can parts of those heap parts be stripped?
  • Is there a way to create a heap dump for those to analyze roots? (-H:DumpHeap seems to dump the heap of the native-image process but not the native-heap)

jrudolph avatar Aug 15 '20 12:08 jrudolph

@jrudolph this is an interesting analysis. 3721664 bytes does indeed seem big and we should investigate what takes up that much. By looking at the output I would say:

  1. byte[] takes the most space. We should really see where this data originates and whether we can shrink it before building an image.
  2. What are the 17% of the strings in the image heap?
  3. Data structures seem to take quite some space (e.g., HashMaps). We should see if we can minimize those data structures before building an image.
  4. DynamicHub is significant in the image heap. We could maybe use a bitset for the boolean flags there. Potentially, we could also encode the class name in a more efficient form.

-H:DumpHeap is the best I see. I think you can quickly identify what comes from the image builder. For anything better, we would have to implement our own version of hosted heap dumping that accounts only for the image heap.

vjovanov avatar Aug 17 '20 10:08 vjovanov

I refactored one of the RealWorld apps from Spring Boot to Quarkus/Panache. These apps are typical microservices, in my case with a PostgreSQL DB, JWT security and a RESTful API. You can check different RealWorld apps here: https://github.com/gothinkster/realworld

My Quarkus app has an uber jar of 43Mb and the native Linux binary is 82.5Mb! The similar Go app is just 16Mb.

5 times smaller!

Is it because the native build does not remove all unused classes and methods, so every new jar dependency just adds its own size to the final binary? Even if that's true, I can't understand why the resulting native binary is 2 times bigger than the fat jar, which contains all classes.

Maybe that is because some testing/debug/diagnostic/non-prod option is turned on by default?

Are there any ways or plans to do some analysis and exclude the unused code or any other redundant stuff? Thanks

oleksandr-ilin avatar Sep 20 '20 21:09 oleksandr-ilin

Is it because the native build does not remove all unused classes and methods, so every new jar dependency just adds its own size to the final binary? Even if that's true, I can't understand why the resulting native binary is 2 times bigger than the fat jar, which contains all classes.

On this point specifically: consider that the native binary is including the whole of all JDK classes and Substrate, the "JVM" runtime. The "fat jar" only includes your application code and its dependencies, so you would need to add the size of the JDK for a fair comparison.

A good way to compare is via the (full) disk size of a Docker image: in the case of native-image you can wrap the executable in an empty base image, while the one with the JDK will need not only the JDK but also the shared libraries on which it depends.

That said, it's of course still interesting to try to get closer to what Go is able to do - just bear in mind that the code is possibly different: since the Java libraries are much more mature and feature-rich, they are likely to need more code to be included.

Sanne avatar Sep 21 '20 08:09 Sanne

@vjovanov in Quarkus we make sure many immutable structures that frameworks need are initialized as constants during compilation, so for example many such String and HashMap instances are "ready to go" and guaranteed immutable.

I also noticed these take quite some space; I even had the impression Strings are not de-duplicated - I didn't have time to dig further into detail, but if someone wanted to pursue this I suspect there could be some quick and easy wins via:

  • de-duplicating all String constants being included in the binary
  • converting all constant (immutable) instances of HashMap and similar into a compact, read-only struct?

I would expect this could also give some good performance boosts: much of our code will read those maps extremely often.
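
To make the second bullet concrete, here is one possible shape for such a compact, read-only structure: two parallel sorted arrays searched with binary search. This is only an illustrative sketch, not something GraalVM or Quarkus is known to do; the class name CompactStringMap is hypothetical, and it assumes non-null String keys and values.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public final class CompactStringMap {
    private final String[] keys;   // sorted in natural (ascending) order
    private final String[] values; // values[i] belongs to keys[i]

    private CompactStringMap(String[] keys, String[] values) {
        this.keys = keys;
        this.values = values;
    }

    /** Snapshots a map into two parallel arrays; assumes non-null keys. */
    static CompactStringMap copyOf(Map<String, String> source) {
        String[] keys = source.keySet().toArray(new String[0]);
        Arrays.sort(keys);
        String[] values = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            values[i] = source.get(keys[i]);
        }
        return new CompactStringMap(keys, values);
    }

    /** Returns the value for the key, or null if it is absent (binary search). */
    String get(String key) {
        int index = Arrays.binarySearch(keys, key);
        return index >= 0 ? values[index] : null;
    }

    public static void main(String[] args) {
        Map<String, String> source = new HashMap<>();
        source.put("hibernate.dialect", "postgresql");
        source.put("http.port", "8080");

        CompactStringMap compact = CompactStringMap.copyOf(source);
        System.out.println(compact.get("http.port"));
        System.out.println(compact.get("missing"));
    }
}
```

The layout drops the per-entry node objects and the hash table array, which is where the space saving would come from. Lookups become O(log n) instead of expected O(1), so whether it is also faster in practice would have to be measured.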

I did obtain a minor win by de-duplicating some String instances during bootstrap of the Hibernate ORM metadata; that's why I think de-duplication isn't happening in GraalVM's constant pool - but I might be wrong.

Sanne avatar Sep 21 '20 09:09 Sanne

Just a quick note on de-duplication: one would need to be sure that objects subject to de-duplication/converting are never synchronized or have their identity used.
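
A small illustration of the identity concern, as a sketch: two Strings with equal contents are still distinct objects, and code can observe that through ==, IdentityHashMap or synchronized. String.intern() stands in here for whatever merging an image builder might perform; that mapping is an assumption made for the example.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class StringIdentity {
    /** Counts distinct objects (by identity, not by equals) among the given strings. */
    static int distinctIdentities(String... values) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String value : values) {
            seen.put(value, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Two objects with the same contents but different identities.
        String a = new String("config-key");
        String b = new String("config-key");

        System.out.println(a.equals(b));              // contents are equal
        System.out.println(distinctIdentities(a, b)); // identities before merging

        // After merging both into one canonical object, identity-based code
        // sees a different result, and synchronized (a) and synchronized (b)
        // would now contend on the same monitor.
        System.out.println(distinctIdentities(a.intern(), b.intern()));
    }
}
```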

dougxc avatar Sep 21 '20 09:09 dougxc

@dougxc great point, I hadn't thought of that. Regarding - specifically - Strings, I think we can all agree that people should never do this, but I agree it could still be a thing to consider. Perhaps the safe option would be to de-duplicate the underlying byte array? Some GC implementations do this at runtime, so one could expect to trigger the same process before "casting it all in stone" in the binary.

Sanne avatar Sep 21 '20 10:09 Sanne

@Sanne

consider that the native binary is including the whole of all JDK classes and Substrate, the "JVM" runtime.

That is not clear to me. I thought one of the purposes of having a new, separate VM like Substrate was actually to be able to avoid bringing ALL JDK classes and unused stuff into the native binary. So basically, having AOT, we can do static analysis and remove everything unused, and that is why the native build process takes so long, I thought. Similarly, the C linker links an exe and picks up only the used functions from the libs. Closer to the Java world is the well-known ProGuard (https://www.guardsquare.com/en/products/proguard). So I thought it was completely feasible.

BTW, after all, Go requires a similar runtime and GC to do the job...

A good way to compare is via the (full) disk size of a Docker image: in the case of native-image you can wrap the executable in an empty base image, while the one with the JDK will need not only the JDK but also the shared libraries on which it depends.

Yes, that is exactly what I did.

REPOSITORY                                    TAG                 IMAGE ID            CREATED             SIZE                SHARED SIZE         UNIQUE SIZE         CONTAINERS
quarkus/real-world-app                        latest              822d99fce996        13 hours ago        105.8MB             17.86MB             87.93MB             1
go/real-world-app                             latest              356e06f919fa        15 hours ago        21.89MB             5.575MB             16.31MB             0

As you can see, the SHARED SIZE is the base image, something like Alpine or ubi-minimal, and here we can play a little bit. The base image for Go was smaller than the ubi-minimal used for Quarkus, but similar alternatives could be found for Quarkus. However, the UNIQUE SIZE is exactly the binary artifact size: 16M for the Go and 88M for the Quarkus build artifact. And that is the major part of the full container size.

they are likely to need more code to be included.

That's actually what scares me, and why I'm asking :-)

oilin-clgx avatar Sep 21 '20 10:09 oilin-clgx

The thing that has not been mentioned yet is that much of the image size is contributed by the OpenJDK static libraries that are now linked into every native image. These cannot be pruned during Java code analysis to remove unwanted code or data because they are not Java code.

In earlier versions of GraalVM Native the behaviour provided by the OpenJDK static libs was reimplemented as pure Java code and most of it was subsequently optimized out of the generated binary, giving sizes much closer to that of equivalent Go programs. However, maintaining all that re-implemented functionality across multiple JDK versions was determined to be pointless effort for little gain so the OpenJDK libs are now used instead.

Note carefully that last qualification. The redundant code and data which are linked into these libraries will not be referenced at runtime. So, it will make very little contribution to text or data segment pages in the running image i.e. the overhead you are so concerned about is essentially going to manifest as little more than some extra storage on disk. I know that's a cost but disk is very, very cheap.

If you really care about saving some few 10s of megabytes of disk space in your deployed container well then write your app in Go (including writing a great deal of the standard Java lib functionality you are going to need to implement and test and train your programmers to use). If not then stop comparing disk image sizes and start measuring the resident memory costs that will actually affect your bottom line.

adinn avatar Sep 21 '20 10:09 adinn

@adinn If the problem is the OpenJDK static libraries that are now linked into every native image, should it be some constant value for any size of app? According to the 1st post the Hello World app is ~6.1 MiB so logically it should add no more than 6+Mb. The rest should be your own stuff? How does 43Mb in a jar become almost 90Mb in the native executable?

Also disk space in your deployed container is not the only issue. The costs of traffic, time to download, install, startup time etc. could matter as well. Especially for the niche where this technology is expected to be used most, like microservice horizontal scaling on thousands of VMs.

oilin-clgx avatar Sep 21 '20 11:09 oilin-clgx

The thing that has not been mentioned yet is that much of the image size is contributed by the OpenJDK static libraries that are now linked into every native image. These cannot be pruned during Java code analysis to remove unwanted code or data because they are not Java code

@adinn for Linux we compile the static libs with -ffunction-sections -fdata-sections. If the image is built with -H:+RemoveUnusedSymbols (default on Linux) the native linker command makes use of -Wl,--gc-sections. While this is not as effective as having the code available as Java code, it can still remove bits of the static libs that are not referenced anywhere at image link-time.

olpaw avatar Sep 21 '20 12:09 olpaw

@adinn If the problem is the OpenJDK static libraries that are now linked into every native image, should it be some constant value for any size of app?

It would be if all the libs were always linked in. I'm not sure if that is the case.

According to the 1st post the Hello World app is ~6.1 MiB so logically it should be added not more than 6+Mb. Rest should be your stuff?

The libs provide code needed for various native methods e.g. io, maths functions etc. So, selective inclusion of libs according to which JDK classes get linked in may account for the disparity.

How 43Mb in jar becomes almost 90Mb in the native executable?

Jar sizes are a completely specious metric against which to compare executable size.

Firstly, the sizes are only very loosely coupled. Most of the content of classes in jar files is Symbols, Strings and numeric Constants (it's usually > 90%). Many of these are repeated across a large number of classes so they end up occupying a much tinier amount of space when they are deduplicated to a single Symbol, String or Constant. How much deduplication arises will depend on how much replication there is. So, there is no fixed divisor to apply. So, if you are seeing 90Mb of executable then that may possibly represent a large amount of Java String data in your heap but that would only be because many different Strings occur in that 43Mb of jar code. Other 43Mb jars might contain only a handful of unique Strings.
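
As a rough illustration of why there is no fixed divisor, the sketch below compares the bytes taken by constants stored once per class with the bytes left after deduplication. The per-class pools are made-up sample data and the class name ConstantDedup is hypothetical; real class-file constant pools carry more structure than plain strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ConstantDedup {
    /** Bytes used when every class keeps its own copy of each constant. */
    static long rawBytes(List<List<String>> pools) {
        long total = 0;
        for (List<String> pool : pools) {
            for (String constant : pool) {
                total += constant.getBytes(StandardCharsets.UTF_8).length;
            }
        }
        return total;
    }

    /** Bytes used when each distinct constant is stored exactly once. */
    static long dedupedBytes(List<List<String>> pools) {
        Set<String> unique = new HashSet<>();
        for (List<String> pool : pools) {
            unique.addAll(pool);
        }
        long total = 0;
        for (String constant : unique) {
            total += constant.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two sample "classes" that share one symbol and differ in another.
        List<List<String>> pools = List.of(
                List.of("java/lang/Object", "toString"),
                List.of("java/lang/Object", "hashCode"));

        System.out.println(rawBytes(pools) + " bytes raw, "
                + dedupedBytes(pools) + " bytes deduplicated");
    }
}
```

The ratio between the two numbers depends entirely on how much the pools overlap, which is the point being made: two jars of the same size can deduplicate to very different amounts.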

Secondly, most Symbols and many Strings and Constants can be omitted from the image because the analysis shows they are not needed. Symbols are rarely needed anyway so it is mostly Strings and numeric constants that will add to image size. How much they add, after deduplication, really depends on how many of the classes in the jars are actually referenced by the app. If classes, methods or fields are not used then GraalVM does not include them in the image. Once again that depends entirely on how the code in the jar is written in the first place plus what use client code makes of those classes. A 43Mb jar might end up contributing one class and a few methods or hundreds of classes and methods. So, I am sorry but the numbers you are quoting really don't corroborate your story about GraalVM being inefficient. It's more complicated than that.

Also disk space in your deployed container is not the only issue. The costs of traffic, time to download, install, startup time etc. could matter as well. Especially for the niche where this technology is expected to be used most, like microservice horizontal scaling on thousands of VMs.

Startup time is another red herring. If OpenJDK library code is not invoked then it won't slow you down having it in your disk image (you might possibly see slightly worse paging of the text section but that's going to be a micro effect).

Perhaps download time and costs are significant for you relative to development and maintenance costs. I find that unlikely but I cannot rule it out. As I said, do switch to Go if it suits your needs better. I am just pointing out that 1) this is not a one-way street but a trade-off and 2) your assumptions about where the costs and opportunities/need for improvement lie were incomplete and missing important elements.

adinn avatar Sep 21 '20 12:09 adinn

Kotlin Native produces about a 500K binary for a Hello World without debug info. How do they do it?

Edit: Upon further inspection, in this Reddit thread you will see a comment from the Kotlin team saying that the two are not competing with each other, but serve two different types of use cases.

Perhaps you might want to go with Kotlin Native? It can also use Java JARs, right?

cyraid avatar Mar 28 '21 07:03 cyraid

upx has been mentioned in other threads as well, and I don't mind the large file size.

native-image is pretty nice, works well (at least as far as I have used it) and is fast. Storage is almost never a bottleneck IMO on modern computer systems. Perhaps on embedded, but ... I have a cheap 3TB hard disk here and that's already several years old. I think storage-size wise all is fine.

Still, small is beautiful, and perhaps the GraalVM team could consider integrating either upx, or something similar to upx, with that specific goal (reducing file size), and perhaps make it available via a commandline option too, such as --small or something like that. That way we could skip an extra step. Right now I have to go to the upx homepage, download it, install it and hope that it works. A commandline flag built into native-image would be more convenient.

I'll explore upx, but hopefully the GraalVM team considers this, even if the issue is 3+ years old. I still think, even if it is not hugely important, that small file size CAN be useful (for instance for downloads, in any area of the world where you can only download slowly; that would be one use case, and I am sure you can think of many more, even if on modern systems file size is very rarely a bottleneck).

cyraid mentioned kotlin, and that's a fine comment, but I would like to add that one big selling point of GraalVM is kinda the "use any programming language". Ok ok not every language works, I get it ;) but if you go from this point of view then I think no individual language should necessarily be put "above" the other languages, usage-wise. I get that kotlin is closer to java than the others, but I have a ruby background, I am sure others have a python background, others a javascript background etc ... - so ideally the "polyglot" focus should put these languages on the "same" level whenever possible. I agree with him in regards to the hello world example - as said, it's not an issue for me, but the "helloworld" binary native-image generated here is 15MB. I'll try to chop off stuff via upx soon, but the GraalVM team should take that into consideration and see how much they could also omit, if that is possible. 15MB seems a bit much - is all of it really necessary? I understand the size is not about the text output "hello world" but about the associated tooling, but even then it's kind of much, in my opinion. But it's not such a big deal anyway, just something to keep in mind for the future, IMO.

rubyFeedback avatar Dec 29 '21 23:12 rubyFeedback

UPX is not a solution. Not only is it an external compressor that has nothing to do with the JVM, but executable compression always adds measurable decompression time, which means the java natives will take longer to start up - greatly diminishing one of the main use cases for native compiled java applets.

UPX is widely known; anybody who knows anything about compression will be familiar with it. It's not necessary to pollute the GraalVM build system with another dependency that users can easily find and plug in themselves. Size is important, but not at the expense of any performance. UPX is a band-aid, not a solution.

One of the main attractions of native executables is embedded systems, where space AND performance are at a premium. If you want to design a kiosk system, for example, you have always needed to bundle a full JRE with it, which makes deployment more complicated and adds another layer of vulnerability. So it's important that any executable size improvements have zero cost to performance, otherwise what's the point - just use a JRE and get all that advanced JIT and GC goodness tuned up.

We all need to remember that this is a pretty crazy project - it can take practically any existing Java code since forever and remove the VM from it, making it run natively. In my opinion, it's pretty amazing that the executables are already this small!

Hopefully someone figures out something, but I honestly wouldn't be surprised if this is the best we can get without leaving Java behind. I don't mind the executable size, personally - I've worked around it by using one executable with many entry points rather than compiling many individual executables.
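A rough sketch of what "one executable with many entry points" could look like is below. This is an assumption about the general pattern (busybox-style dispatch on the first argument), not the commenter's actual code; the class name `MultiTool` and the `hello`/`count` sub-commands are hypothetical.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch: a single binary hosting several small tools,
// selected by the first command-line argument.
class MultiTool {

    // Each sub-command takes its own arguments and returns the text to print.
    static final Map<String, Function<String[], String>> COMMANDS = new TreeMap<>();
    static {
        COMMANDS.put("hello", args -> "Hello, " + (args.length > 0 ? args[0] : "world"));
        COMMANDS.put("count", args -> String.valueOf(args.length));
    }

    // Dispatches to the named sub-command, or returns a usage line.
    static String run(String[] argv) {
        if (argv.length == 0 || !COMMANDS.containsKey(argv[0])) {
            return "usage: multitool <" + String.join("|", COMMANDS.keySet()) + "> [args...]";
        }
        String[] rest = Arrays.copyOfRange(argv, 1, argv.length);
        return COMMANDS.get(argv[0]).apply(rest);
    }

    public static void main(String[] argv) {
        System.out.println(run(argv));
    }
}
```

Built as one native image, the fixed runtime overhead discussed in this thread is paid once rather than once per utility; each additional sub-command only adds whatever code it actually pulls in.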

EDIT: If you are using Java and don't need polyglot in Graal native exe's, consider Red Hat's Quarkus/Mandrel for smaller exe's (Mandrel is a downstream distribution of GraalVM): https://quarkus.io/guides/building-native-image - though it is container based so yeah, not as simple.

cosmicdan avatar Jan 01 '22 00:01 cosmicdan

Info on UPX: I tried UPX on a GraalVM native-image Hello World app (64-bit Windows), and the UPX-compressed EXE does not work (it does not print Hello World).

gocursor avatar Aug 06 '22 21:08 gocursor