Reduce size of output executable
Compiling hello world with substrate vm on ubuntu results in a 6.1 MiB executable. Is it possible to reduce this? The equivalent in golang is 1.6 MiB or < 1 MiB without debug information.
True, I have evaluated the image size and even for an empty main program we get ~5MB of an image. There are a few reasons for that:
- In our features we use JDK code that has a non-negligible footprint. To see everything that gets pulled in, you can add -H:+PrintUniverse to the image build.
- Some of our features are included in the image although they are never used in the code.
- The points-to analysis is imprecise and sometimes catches elements that are never used.
On the bright side, if you include much of your code the 5MB overhead will remain the same. So this is an issue only for very small images. This is a great issue. If you have a need for small images in your use-case, please mention it here and we will raise the priority.
unused-pkgs-hw.txt unused-classes-hw.txt unused-methods-hw.txt
These are the packages, classes, and methods that are never invoked. They can be used as an indicator of elements that should not be in the image. Some things, like the heap package, must be included in the image even though this particular program never uses them.
@pejovica thanks for the data.
The general use-case is to remove a common argument for people to use Go-lang. One specific use-case that this would severely impact is something like implementing many small command line utilities as in Linux.
Does that 5 MiB include the GC? At least for simple things like helloworld you can prove you don't need a GC.
It does, but by looking at the list of included elements, I would not say that GC is the biggest problem. I would rather invest that time to remove things that should not be there by any means. For example, org.graalvm.compiler.truffle, java.util.zip, java.util.regex, java.util.Calendar.
By removing these I am confident that we can reach the size of Go's "Hello, World!". At one point we removed all methods that were never executed and the image size was 400 KB. This is the lower bound, of course, but it could be used as a guideline for what we should reach.
Thanks @vjovanov, that would be amazing!
You can also use https://upx.github.io/ as a temporary solution to make compressed binaries. Reduces the size by a lot in my experience.
Any thoughts on this, guys?
I'm targeting GraalVM as (probably/hopefully) the solution for the long cold starts of AWS Lambda functions written in Java. Smaller binaries would make our deployments faster. Also, AWS has limits on deployment size, and I'm afraid the binaries would become too big if we have multiple dependencies in our project, which is usually the case when using the AWS SDK.
I think that's a game changer functionality that would make JVM more attractive to the community, especially those who have been flirting with Go and Rust as an alternative.
(not issue relevant) @miere , already discovered https://quad.team/blog/Micronaut-to-AWS-Lamda-guide ?
Now that we have GraalVM building against JDK 11, it's only a matter of time until the native compiler can work with the new modularity. I doubt file sizes will ever be improved on JDK 8 though since the class library was very.... let's say "monolithic" before the Project Jigsaw refactor.
So until those native compiler improvements, I suggest updating JDK 8 projects to JDK 11 and making them modular in preparation for that :D
Also, see what CremboC said - UPX is pretty good. ~11MB exe down to ~3MB.
@thomaswue @vjovanov I would like to share an approach I took five years ago (2015) to make custom JVMs that were extremely small (a JavaFX UI application with its runtime totalling only 5 MB after zipping).
I used the following to achieve this result
- javafx native packing tool
- spyfs
- Xbootclasspath flag
Steps
- So what I did was, I extracted all runtime/bootstrap classes/jars in a single folder. Not just rt.jar, anything which is used. This was my custom bootstrap classes folder.
- I packaged my application using javafx native packing tool
- I replaced some setting in this, using Xbootclasspath flag so that it picked up classes from the custom bootstrap classes folder.
- I made a virtual clone of this using spyfs.
- I ran the application on this virtual clone.
- SpyFS detected which classes were actually loaded and saved this information.
- SpyFS copied only the classes which were actually loaded into a third folder - application output folder.
- The logic I used was: Case 1: if a class file was only visited (touched) and not read, the class file is copied but its size is zero. Case 2: if a class file was read, even one byte, the entire class is copied. Case 3: if a class was neither touched nor read, it is not copied. Case 4: for native libraries, anything that was loaded is copied to the destination.
- This application folder had only the JavaFX UI app itself and only those bootstrap classes which were actually used. I tested it, and it ran successfully. I zipped it and found the size was as small as 5 MB.
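The four copy cases in the steps above can be sketched as a small decision function. This is only an illustration and not SpyFS code: the class `StripPlanner`, the enum `Action`, and the parameters of `decide` are hypothetical names, and the access data is assumed to come from whatever recorded the filesystem activity.

```java
// Hypothetical sketch of the copy rules described above; not actual SpyFS code.
final class StripPlanner {

    /** What to do with one file from the custom bootstrap classes folder. */
    enum Action { COPY_FULL, COPY_EMPTY, SKIP }

    /**
     * @param opened          whether the file was opened (touched) during the recorded run
     * @param bytesRead       total number of bytes read from the file during the run
     * @param nativeLibLoaded whether the file is a native library that was loaded
     */
    static Action decide(boolean opened, long bytesRead, boolean nativeLibLoaded) {
        if (nativeLibLoaded) {
            return Action.COPY_FULL;   // Case 4: loaded native libraries are copied
        }
        if (!opened) {
            return Action.SKIP;        // Case 3: neither touched nor read
        }
        if (bytesRead > 0) {
            return Action.COPY_FULL;   // Case 2: even one byte read copies the whole file
        }
        return Action.COPY_EMPTY;      // Case 1: touched but not read, zero-size placeholder
    }
}
```

As the quoted RoboVM reply points out, the result is only as complete as the recorded run: a code path that was never exercised leaves its classes in the `SKIP` bucket.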
Back in 2015 I shared this idea with RoboVM guys. Here is the link to the discussion https://groups.google.com/d/msg/robovm/-LEeLkGJodA/qFGwVfKQm3QJ Niklas Therning (founder of robovm) had found this interesting and had said,
Interesting approach! :-) We're working on improving the stripping done by RoboVM to reduce file sizes. Recording which classes are actually used at runtime is something we could do easily by patching RoboVM slightly. We're currently looking into an approach which is much less aggressive, using static analyses. One nice advantage with the dynamic approach is no special handling is required for classes loaded via reflection. Maybe we could use this for generating forceLinkClasses patterns automatically for users. The drawback is of course that you have to make sure you touch all codepaths of your app when recording.
Thanks for the info and links. We'll see where we end up eventually...
However, soon after the company was sold and then came Xamarin.
Much later, GluonVM picked it up. Gluon then dropped its own VM and started using GraalVM, and only very recently has it started providing tools to create GraalVM-powered binaries that even an average developer like me can use to build and run JavaFX applications on mobile (Android, iPhone) and desktop.
So I felt it was time to raise this matter again. As already pointed out, such approaches would make GraalVM extremely competitive with Go and others.
To be honest, I don't know how much optimization has already been implemented in GraalVM. GraalVM is amazing, no doubt, and the performance difference is clearly felt from the end user's point of view.
I might be over expecting, but I feel, if this size issue/feature is cracked, GraalVM can replace every language/platform/runtime in the world, as the first default choice.
So to give a summary, the idea/suggestion is
- Apart from static analysis, (optionally) recording which classes are actually used at runtime, both for the runtime (jvm) bootstrap and the application.
- Keep only the classes which are actually used, remove classes which were never used both from bootstrap and the end application.
Please let me know your thoughts.
Thank you
BTW to additionally mention, I had packaged youtube-dl a python app, with a full python runtime environment (stripped) not more than 3MB (after compression).
That's an interesting comment. I never used SpyFS but maybe it can help here. Does that work on class level, or method level?
Getting a JavaFX app under 5 MB sounds very challenging. Did that include the native libraries (e.g. libglass, libprism_es2 etc?).
It is possible to make a 400 kB "Hello, World!" (@pejovica did this). But this code is completely unsafe and insecure and can lead to segfaults. This could be made an experimental feature, with a strong emphasis on experimental (use at your own risk). To make it a feature, we would need a very strong use case.
Hey sorry, my apologies, I didn't notice your question. So SpyFS neither works at the class level nor at the method level. It works at the filesystem level. All the runtime and bootstrap classes are extracted in a folder and this custom bootstrap class bundle is used instead of the default java runtime classes using the Xbootclasspath flag. This folder is spied by SpyFS and it knows exactly which classes were actually read (opened and >0 bytes read), which were accessed (opened but zero bytes read) and which class files were not opened at all.
Then the SpyFS data is used to make a duplicate of this custom bootstrap class bundle in another folder. All class files which were read (> 0 bytes) are copied completely, all class files which were opened but not read (total read bytes = 0) are copied as dummy class files of zero size, and all class files which were neither read nor opened are not copied. This forms the stripped-down runtime bootstrap class bundle for that particular application. I tried it about five years ago and haven't had the opportunity to replicate it since. I am not able to run the old 2015 example anymore, so I am guessing it was pulling some native libraries from somewhere.
Now to answer the question regarding the native libraries (e.g. libglass, libprism_es2 etc?), yes it included all of them. During the runtime which libraries are actually loaded and used was separately analyzed and all those libraries were copied and used.
I hope I was able to explain the approach. It was a very raw method I can say. Because I had made my own kernel filesystem library (binding) in java, I was able to get this done easily.
Hey, same problem here. I'm working on a small CLI app whose only dependency is Jline3, but the final executable weighs 14 MB. How could I decrease the size? (The same app in Golang takes 3 MB.) I use Java 11:
openjdk version "11.0.6" 2020-01-14
OpenJDK Runtime Environment GraalVM CE 20.0.0 (build 11.0.6+9-jvmci-20.0-b02)
OpenJDK 64-Bit Server VM GraalVM CE 20.0.0 (build 11.0.6+9-jvmci-20.0-b02, mixed mode, sharing)
I had a look into the size of the generated binary for a hello world main with objdump -x:
`objdump -x` output
Sections:
Idx Name Size VMA LMA File off Algn
0 .interp 0000001c 00000000000002a8 00000000000002a8 000002a8 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .note.gnu.build-id 00000024 00000000000002c4 00000000000002c4 000002c4 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .note.ABI-tag 00000020 00000000000002e8 00000000000002e8 000002e8 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .gnu.hash 000001c0 0000000000000308 0000000000000308 00000308 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .dynsym 00000de0 00000000000004c8 00000000000004c8 000004c8 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .dynstr 00000e8a 00000000000012a8 00000000000012a8 000012a8 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
6 .gnu.version 00000128 0000000000002132 0000000000002132 00002132 2**1
CONTENTS, ALLOC, LOAD, READONLY, DATA
7 .gnu.version_r 000000e0 0000000000002260 0000000000002260 00002260 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
8 .rela.dyn 0001ebd0 0000000000002340 0000000000002340 00002340 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
9 .rela.plt 000005a0 0000000000020f10 0000000000020f10 00020f10 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
10 .init 0000001b 0000000000022000 0000000000022000 00022000 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
11 .plt 000003d0 0000000000022020 0000000000022020 00022020 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
12 .plt.got 00000008 00000000000223f0 00000000000223f0 000223f0 2**3
CONTENTS, ALLOC, LOAD, READONLY, CODE
13 .text 002c06a3 0000000000023000 0000000000023000 00023000 2**12
CONTENTS, ALLOC, LOAD, READONLY, CODE
14 .fini 0000000d 00000000002e36a4 00000000002e36a4 002e36a4 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
15 .rodata 000095a3 00000000002e4000 00000000002e4000 002e4000 2**12
CONTENTS, ALLOC, LOAD, READONLY, DATA
16 .svm_heap 0038c9c0 00000000002ee000 00000000002ee000 002ee000 2**12
CONTENTS, ALLOC, LOAD, READONLY, DATA
17 .eh_frame_hdr 0000027c 000000000067a9c0 000000000067a9c0 0067a9c0 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
18 .eh_frame 00000c50 000000000067ac40 000000000067ac40 0067ac40 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
19 .init_array 00000010 000000000067cb88 000000000067cb88 0067bb88 2**3
CONTENTS, ALLOC, LOAD, DATA
20 .fini_array 00000008 000000000067cb98 000000000067cb98 0067bb98 2**3
CONTENTS, ALLOC, LOAD, DATA
21 .dynamic 00000230 000000000067cba0 000000000067cba0 0067bba0 2**3
CONTENTS, ALLOC, LOAD, DATA
22 .got 00000230 000000000067cdd0 000000000067cdd0 0067bdd0 2**3
CONTENTS, ALLOC, LOAD, DATA
23 .data 000019ec 000000000067d000 000000000067d000 0067c000 2**12
CONTENTS, ALLOC, LOAD, DATA
24 .bss 00000188 000000000067e9f0 000000000067e9f0 0067d9ec 2**3
ALLOC
25 .comment 00000046 0000000000000000 0000000000000000 0067d9ec 2**0
CONTENTS, READONLY
26 .debug_aranges 000002b0 0000000000000000 0000000000000000 0067da32 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
27 .debug_info 0000501a 0000000000000000 0000000000000000 0067dce2 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
28 .debug_abbrev 00000647 0000000000000000 0000000000000000 00682cfc 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
29 .debug_line 000008ad 0000000000000000 0000000000000000 00683343 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
30 .debug_str 00002c1d 0000000000000000 0000000000000000 00683bf0 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
31 .debug_loc 00001eb6 0000000000000000 0000000000000000 0068680d 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
32 .debug_ranges 00000350 0000000000000000 0000000000000000 006886c3 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
The full binary has 6866528 bytes. The biggest contributors to that size are the .text section with the compiled code of 2885283 bytes (42%) and the .svm_heap section with 3721664 bytes (54%).
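As a quick check of the arithmetic above, the hex values from the Size column can be converted and compared against the total file size. The class and method names below are made up for this sketch; the three numbers come from the objdump listing and the text.

```java
// Recomputes the section sizes and shares quoted above from the objdump hex values.
final class SectionShare {

    /** Parses a hexadecimal size as printed in the objdump "Size" column. */
    static long parseHexSize(String hex) {
        return Long.parseLong(hex, 16);
    }

    /** Share of the whole file taken by one section, rounded to a whole percent. */
    static long percentOf(long sectionBytes, long totalBytes) {
        return Math.round(100.0 * sectionBytes / totalBytes);
    }

    public static void main(String[] args) {
        long total = 6866528L;                  // full binary size from the text
        long text = parseHexSize("002c06a3");   // .text
        long heap = parseHexSize("0038c9c0");   // .svm_heap
        System.out.println(".text     " + text + " bytes, " + percentOf(text, total) + "%");
        System.out.println(".svm_heap " + heap + " bytes, " + percentOf(heap, total) + "%");
    }
}
```

Together the two sections account for roughly 96% of the file, so the remaining ELF sections, including the debug sections, are comparatively negligible.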
@vjovanov already commented about the size of the unused code that was included. However, since the initial native heap seems to be even quite a bit bigger than that, it would be interesting to understand why that is the case and what's in there.
-H:+PrintHeapHistogram will print a histogram of the data in the heap:
abridged `-H:+PrintHeapHistogram` output
=== Summary ===
DynamicHub; 5821; 487376
ImageCodeInfo; 10; 868104
Other; 47113; 2362952
Total; 52944; 3718432
[switched sections around]
=== DynamicHub ===
Count Size Size% Cum% Class
1455 314968 64.63% 64.63% java.lang.Class
1456 87312 17.91% 82.54% byte[]
1455 46560 9.55% 92.09% java.lang.String
1455 38536 7.91% 100.00% int[]
=== ImageCodeInfo ===
Count Size Size% Cum% Class
5 837632 96.49% 96.49% byte[]
1 22064 2.54% 99.03% java.lang.String[]
1 8240 0.95% 99.98% java.lang.Class[]
1 112 0.01% 99.99% com.oracle.svm.core.code.ImageCodeInfo
2 56 0.01% 100.00% java.lang.Object[]
=== Other ===
Count Size Size% Cum% Class
13210 643456 27.23% 27.23% byte[]
12855 411360 17.41% 44.64% java.lang.String
5488 219520 9.29% 53.93% java.util.HashMap$Node
270 148368 6.28% 60.21% char[]
355 109744 4.64% 64.85% java.lang.String[]
96 95376 4.04% 68.89% java.util.HashMap$Node[]
1474 94336 3.99% 72.88% sun.util.locale.LocaleObjectCache$CacheEntry
1516 84896 3.59% 76.47% java.util.concurrent.ConcurrentHashMap$Node
1325 84800 3.59% 80.06% java.util.LinkedHashMap$Entry
468 55248 2.34% 82.40% int[]
[snip]
- Can parts of those heap parts be stripped?
- Is there a way to create a heap dump for those to analyze roots? (-H:DumpHeap seems to dump the heap of the native-image process but not the native heap)
@jrudolph this is an interesting analysis. 3721664 seems indeed big and we should investigate what takes that much. By looking at the output I would say:
- byte[] takes the most space. We should really see where this data originates and whether we can shrink it before building an image.
- What are the 17% of the strings in the image heap?
- Data structures seem to take quite some space (e.g., HashMaps). We should see if we minimized those data structures before building an image.
- DynamicHub is significant in the image heap. We could maybe use a bitset for the boolean flags there. Potentially, we could also encode the class name in a more efficient form.
-H:DumpHeap is the best I see. I think you can quickly identify what comes from the image builder. For anything better, we would have to implement our own version of hosted heap dumping that accounts only for the image heap.
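The bitset idea for the DynamicHub boolean flags mentioned above could look roughly like the following. This is a generic sketch of flag packing, not the actual DynamicHub layout: the class `PackedFlags` and the three flag names are invented for illustration.

```java
// Hypothetical sketch: packing several boolean flags into one byte.
// The flag names are invented and are not DynamicHub's real fields.
final class PackedFlags {
    static final int IS_INTERFACE = 1 << 0;
    static final int IS_ARRAY     = 1 << 1;
    static final int IS_PRIMITIVE = 1 << 2;

    private byte bits;  // one byte holds up to eight flags

    /** Sets or clears the bit that represents the given flag. */
    void set(int flag, boolean value) {
        if (value) {
            bits |= flag;
        } else {
            bits &= ~flag;
        }
    }

    /** Returns whether the bit for the given flag is set. */
    boolean get(int flag) {
        return (bits & flag) != 0;
    }
}
```

Whether this saves space in practice depends on object alignment and on how many flags there are; with only a handful of boolean fields per object, padding may absorb the difference.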
I refactored one of the Real World apps from Spring Boot to Quarkus/Panache. These apps are typical microservices, in my case with a PostgreSQL DB, JWT security and a RESTful API. You can check different Real World apps here: https://github.com/gothinkster/realworld
My Quarkus app has an uber jar of 43 MB and the native Linux binary is 82.5 MB! The similar Go app is just 16 MB,
5 times smaller!
Is it because the native build does not remove all unused classes and methods, so every new jar dependency just adds its own size to the final binary? Even if that's true, I can't understand why the resulting native binary is 2 times bigger than the fat jar, which contains all classes.
Maybe that is because some testing/debug/diagnostic/non-prod option is turned on by default?
Are there any ways or plans to do some analysis and not include unused code or other redundant stuff? Thanks
Is it because the native build does not remove all unused classes and methods, so every new jar dependency just adds its own size to the final binary? Even if that's true, I can't understand why the resulting native binary is 2 times bigger than the fat jar, which contains all classes.
On this point specifically: consider that the native binary is including the whole of all JDK classes and Substrate, the "JVM" runtime. The "fat jar" only includes your application code and its dependencies, so you would need to add the size of the JDK for a fair comparison.
A good way to compare is via the (full) disk size of a Docker image: in the case of native-image you can wrap an empty image, while the one with the JDK will need not only the JDK but also the shared libraries it depends on.
That said, it's of course still interesting to try to get closer to what Go is able to - Just bear in mind that the code is possibly different, such as the Java libraries being much more mature and feature rich, they are likely to need more code to be included.
@vjovanov in Quarkus we make sure many immutable structures that frameworks need are initialized as constants during compilation, so, for example, many such String and HashMap instances are "ready to go" and guaranteed immutable.
I also noticed these take quite some space; I even had the impression Strings are not de-duplicated - I didn't have time to dig further into detail, but if someone wanted to pursue this I suspect there could be some quick and easy wins via:
- de-duplicating all String constants being included in the binary
- converting all constant (immutable) instances of HashMap and similar into a compact, read-only struct?
I would expect this could also give some good performance boosts: much of our code will read those maps extremely often.
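To make the "compact, read-only struct" suggestion concrete, here is one possible shape for it: two parallel arrays with sorted keys and binary-search lookup. This is a sketch under my own assumptions, not something GraalVM or Quarkus does; the class name `CompactStringMap` is hypothetical and the source map is assumed to contain no null keys.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a compact, read-only replacement for an immutable Map<String, String>.
final class CompactStringMap {
    private final String[] keys;    // sorted, so lookups can use binary search
    private final String[] values;  // values[i] belongs to keys[i]

    CompactStringMap(Map<String, String> source) {
        // TreeMap iterates in natural key order, which yields the sorted key array.
        TreeMap<String, String> sorted = new TreeMap<>(source);
        keys = sorted.keySet().toArray(new String[0]);
        values = sorted.values().toArray(new String[0]);
    }

    /** Returns the value for key, or null when the key is absent. */
    String get(String key) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 ? values[i] : null;
    }

    int size() {
        return keys.length;
    }
}
```

This layout avoids one node object per entry and the partly empty bucket table, but lookups become O(log n) instead of the expected O(1) of a hash map, so a size win here would not automatically be a speed win.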
I did obtain a minor win by de-duplicating some String instances during bootstrap of the Hibernate ORM metadata; that's why I think de-duplication isn't happening in GraalVM's constant pool - but I might be wrong.
Just a quick note on de-duplication: one would need to be sure that objects subject to de-duplication/converting are never synchronized or have their identity used.
@dougxc great point, I hadn't thought of that. Regarding - specifically - Strings, I think we can all agree that people should never do this, but I agree it could still be a thing to consider. Perhaps the safe option would be to de-duplicate the underlying byte array? Some GC implementations do this at runtime, so one could expect to trigger the same process before "casting it all in stone" in the binary.
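The identity concern can be shown with plain Java. The sketch below only demonstrates that equal strings can be distinct objects and that merging them is observable through `==` and through locking; the class and helper names are made up for the example.

```java
// Shows why de-duplicating String objects is observable: equal strings can be distinct objects.
final class StringIdentityDemo {

    /** True when both references point at the same object. */
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String a = new String("metadata");
        String b = new String("metadata");

        System.out.println(a.equals(b));                         // compares contents
        System.out.println(sameObject(a, b));                    // compares identity
        System.out.println(sameObject(a.intern(), b.intern()));  // identity after interning

        // Code that locks on a and b uses two independent monitors today.
        // If a build step merged a and b into one object, both blocks would share one lock.
        synchronized (a) { /* critical section guarded by a */ }
        synchronized (b) { /* critical section guarded by b */ }
    }
}
```

Sharing only the backing array, as suggested above, would leave the two `String` objects and their identities intact, which is why it is the safer variant.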
@Sanne
consider that the native binary is including the whole of all JDK classes and Substrate, the "JVM" runtime.
That is not clear to me. I thought one of the purposes of having a new, separate VM like Substrate was precisely to avoid bringing ALL JDK classes and unused stuff into the native binary. With AOT compilation we can do static analysis and remove everything unused, and I thought that was why the native build process takes so long.
It is similar to how a C linker links an executable and picks up only the used functions from the libs.
Closer to the Java world is the well-known ProGuard (https://www.guardsquare.com/en/products/proguard). So I thought it was completely feasible.
BTW, after all, Go requires a similar runtime and GC to do the job...
A good way to compare is via the (full) disk size of a Docker image: in the case of native-image you can wrap an empty image, while the one with the JDK will need not only the JDK but also the shared libraries it depends on.
Yes, that is exactly what I did.
REPOSITORY TAG IMAGE ID CREATED SIZE SHARED SIZE UNIQUE SIZE CONTAINERS
quarkus/real-world-app latest 822d99fce996 13 hours ago 105.8MB 17.86MB 87.93MB 1
go/real-world-app latest 356e06f919fa 15 hours ago 21.89MB 5.575MB 16.31MB 0
As you can see, the SHARED SIZE is something like Alpine or ubi-minimal, and here we can play a little bit. The base image for Go was smaller than the ubi-minimal used for Quarkus, but similar alternatives could be found for Quarkus.
However, the UNIQUE SIZE is exactly the binary artifact size: 16 MB for the Go and 88 MB for the Quarkus build artifact.
And that is the major part of the full container size.
they are likely to need more code to be included.
That actually scares me, and it's why I'm asking :-)
The thing that has not been mentioned yet is that much of the image size is contributed by the static OpenJDK libraries that are now linked into every native image. These cannot be pruned during Java code analysis to remove unwanted code or data because they are not Java code.
In earlier versions of GraalVM Native the behaviour provided by the OpenJDK static libs was reimplemented as pure Java code and most of it was subsequently optimized out of the generated binary, giving sizes much closer to that of equivalent Go programs. However, maintaining all that re-implemented functionality across multiple JDK versions was determined to be pointless effort for little gain so the OpenJDK libs are now used instead.
Note carefully that last qualification. The redundant code and data which are linked into these libraries will not be referenced at runtime. So, it will make very little contribution to text or data segment pages in the running image i.e. the overhead you are so concerned about is essentially going to manifest as little more than some extra storage on disk. I know that's a cost but disk is very, very cheap.
If you really care about saving a few tens of megabytes of disk space in your deployed container, well, then write your app in Go (including writing a great deal of the standard Java lib functionality you are going to need to implement and test and train your programmers to use). If not, then stop comparing disk image sizes and start measuring the resident memory costs that will actually affect your bottom line.
@adinn If the problem is the static OpenJDK libraries that are now linked into every native image, shouldn't that be a constant value for any size of app?
According to the first post, the Hello World app is ~6.1 MiB, so logically no more than about 6 MB should be added. The rest should be your own stuff?
How does 43 MB in a jar become almost 90 MB in the native executable?
Also, disk space in your deployed container is not the only issue. The price of traffic, download time, install time, startup time etc. could matter as well.
Especially for the niche where this technology is expected to be used most, like horizontally scaling microservices on thousands of VMs.
The thing that has not been mentioned yet is that much of the image size is contributed by the static OpenJDK libraries that are now linked into every native image. These cannot be pruned during Java code analysis to remove unwanted code or data because they are not Java code.
@adinn for Linux we compile the static libs with -ffunction-sections -fdata-sections. If the image is built with -H:+RemoveUnusedSymbols (default on Linux) the native linker command makes use of -Wl,--gc-sections. While this is not as effective as having the code available as Java code, it can still remove bits of the static libs that are not referenced anywhere at image link time.
@adinn If the problem is the static OpenJDK libraries that are now linked into every native image, shouldn't that be a constant value for any size of app?
It would be if all the libs were always linked in. I'm not sure if that is the case.
According to the first post, the Hello World app is ~6.1 MiB, so logically no more than about 6 MB should be added. The rest should be your own stuff?
The libs provide code needed for various native methods e.g. io, maths functions etc. So, selective inclusion of libs according to which JDK classes get linked in may account for the disparity.
How does 43 MB in a jar become almost 90 MB in the native executable?
Jar sizes are a completely specious metric against which to compare executable size.
Firstly, the sizes are only very loosely coupled. Most of the content of classes in jar files is Symbols, Strings and numeric Constants (it's usually > 90%). Many of these are repeated across a large number of classes so they end up occupying a much tinier amount of space when they are deduplicated to a single Symbol, String or Constant. How much deduplication arises will depend on how much replication there is. So, there is no fixed divisor to apply. So, if you are seeing 90Mb of executable then that may possibly represent a large amount of Java String data in your heap but that would only be because many different Strings occur in that 43Mb of jar code. Other 43Mb jars might contain only a handful of unique Strings.
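A toy calculation shows why the divisor is not fixed. The sketch below counts characters for a set of per-class string lists, first with every class keeping its own copy and then with each distinct string stored once. The class name and the sample strings used in the usage note are illustrative assumptions, not measurements of any real jar.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: the same strings stored once per class versus once overall.
final class ConstantDedup {

    /** Total characters when every class keeps its own copy of each string. */
    static int totalChars(List<List<String>> perClassConstants) {
        int total = 0;
        for (List<String> constants : perClassConstants) {
            for (String s : constants) {
                total += s.length();
            }
        }
        return total;
    }

    /** Total characters when each distinct string is stored only once. */
    static int dedupedChars(List<List<String>> perClassConstants) {
        Set<String> unique = new HashSet<>();
        for (List<String> constants : perClassConstants) {
            unique.addAll(constants);
        }
        int total = 0;
        for (String s : unique) {
            total += s.length();
        }
        return total;
    }
}
```

For two classes that each reference "java/lang/Object" and "toString" plus one string of their own, most of the duplicated volume disappears; for classes with no strings in common, nothing does. The ratio depends entirely on the input.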
Secondly, most Symbols and many Strings and Constants can be omitted from the image because the analysis shows they are not needed. Symbols are rarely needed anyway, so it is mostly Strings and numeric constants that will add to image size. How much they add, after deduplication, really depends on how many of the classes in the jars are actually referenced by the app. If classes, methods or fields are not used then GraalVM does not include them in the image. Once again, that depends entirely on how the code in the jar is written in the first place, plus what use client code makes of those classes. A 43Mb jar might end up contributing one class and a few methods, or hundreds of classes and methods. So, I am sorry, but the numbers you are quoting really don't corroborate your story about GraalVM being inefficient. It's more complicated than that.
Also, disk space in your deployed container is not the only issue. The price of traffic, download time, install time, startup time etc. could matter as well. Especially for the niche where this technology is expected to be used most, like horizontally scaling microservices on thousands of VMs.
Startup time is another red herring. If OpenJDK library code is not invoked then having it in your disk image won't slow you down (you might possibly see slightly worse paging of the text section, but that's going to be a micro effect).
Perhaps download time and costs are significant for you relative to development and maintenance costs. I find that unlikely but I cannot rule it out. As I said, do switch to Go if it suits your needs better. I am just pointing out that 1) this is not a one-way street but a trade-off and 2) your assumptions about where the costs and opportunities/need for improvement lie were incomplete and missing important elements.
Kotlin Native produces about a 500K Hello World without debug info. How do they do it?
Edit: Upon further inspection, in this reddit thread you will see a comment from the Kotlin team saying that the two are not competing with each other but serve two different types of use cases.
Perhaps you might want to go with Kotlin Native? It can also use Java jars, right?
upx has been mentioned in other threads as well, and I don't mind the large file size.
native-image is pretty nice, works well (at the least so far that I have used) and is fast. Storage is almost never a bottleneck IMO on modern computer systems. Perhaps on embedded, but ... I here have a cheap 3TB harddisc and that's already several years old. I think storage-size wise all is fine.
Still, small is beautiful, and perhaps the GraalVM team could consider integrating either upx, or something similar to upx, with that specific goal (reduce file size) and perhaps make it available via some commandline variant too such as --small or something like that. That way we could skip another extra step. Right now I have to go to the upx homepage, download this, install it and hope that it works. A commandline flag by default in native-image would be more convenient though.
I'll explore upx but hopefully the GraalVM team considers this here, even if the issue is +3 years old - I still think, even if not hugely important, small file size CAN be useful (for instance, for downloads too, on any area of the world where you can only download slowly, so that would be one use case; I am sure you can think of many more use cases where that may seem useful, even if on modern systems file size really very rarely is any bottleneck as such).
cyraid mentioned kotlin, and that's a fine comment, but I would like to add that one big sell of GraalVM is kinda the "use any programming language". Ok ok not every language works, I get it ;) but if you go from this point of view then I think no individual language should necessarily be put "above" the other languages, usage-wise. I get that kotlin is closer to java than the others, but I have a ruby background, I am sure others have a python background, others a javascript background etc ... - so ideally the "polyglot" focus should put these languages on the "same" level whenever possible. I agree with him in regards to the hello world example - as said, it's not any issue for me, but the "helloworld" binary native-image generated here has 15MB. I'll see to chop off stuff via upx soon, but the GraalVM team should take that into consideration and see how much they could also omit, if that is possible too. 15MB seems a bit much - is that all really necessary 1:1? I understand the issue is not about the text output "hello world", but the associated tooling, but even then it's kind of much, in my opinion. But, it's not such a big deal anyway, just something to keep in mind for the future, IMO.
UPX is not a solution. Not only is it an external compressor that has nothing to do with the JVM, but executable compression always adds measurable decompression time, which means the Java natives will take longer to start up, greatly diminishing one of the main use cases for native compiled Java applets.
UPX is widely known; anybody who knows anything about compression will be familiar with it. It's not necessary to pollute the GraalVM build system with another dependency that users can easily find and plug in themselves. Size is important, but not at the expense of any performance. UPX is a band-aid, not a solution.
One of the main attractions for native executables is embedded systems, where space AND performance are a premium. If you want to design a KIOSK system for example, you always needed to bundle a full JRE with them, which makes deployment more complicated and adds another layer of vulnerability. So it's important that any executable size improvements have zero cost to performance, otherwise what's the point - just use a JRE and get all that advanced JIT and GC goodness tuned up.
We all need to remember that this is a pretty crazy project - it can take practically any existing Java code since forever and remove the VM from it, making it run natively. In my opinion, it's pretty amazing that the executables are already this small!
Hopefully someone figures out something, but I honestly wouldn't be surprised if this is the best we can get without leaving Java behind. I don't mind the executable size, personally - I've worked around it by using one executable with many entry points rather than compiling many individual executables.
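The "one executable with many entry points" workaround mentioned above can be sketched as a simple dispatcher on the first argument. The tool names `echo` and `count` and the class `MultiTool` are invented for illustration; a real setup would route to the actual utilities.

```java
import java.util.Arrays;

// Hypothetical sketch of one native executable serving several tools.
// The tool names and their behavior are invented for illustration.
final class MultiTool {

    /** Runs the tool named by args[0] with the remaining arguments and returns its output. */
    static String dispatch(String[] args) {
        if (args.length == 0) {
            return "usage: multitool <echo|count> [args...]";
        }
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        switch (args[0]) {
            case "echo":
                return String.join(" ", rest);          // prints its arguments back
            case "count":
                return Integer.toString(rest.length);   // prints the number of arguments
            default:
                return "unknown tool: " + args[0];
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatch(args));
    }
}
```

Built once as a native image, the fixed per-image overhead discussed in this thread is paid a single time rather than once per utility.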
EDIT: If you are using Java and don't need polyglot support in Graal native executables, consider Red Hat's Quarkus/Mandrel for smaller executables (Mandrel is a downstream distribution of GraalVM): https://quarkus.io/guides/building-native-image - though it is container based so yeah, not as simple.
Info on UPX: tried UPX on native image GraalVM Hello World app (64-bit Windows) and the UPX compressed EXE does not work (does not print Hello World).