endless-ships
boot build fails with OutOfMemoryError: Java heap space
Hi,
Thanks for this, it's a pretty useful tool for me when trying to figure out which ship I should fly and how to outfit it.
I have a problem: I am trying to build this using the instructions in the README and failing. The failure point is not entirely consistent, but the error is always the same, namely lack of Java heap space. Pasting below the last 3 runs, edited for brevity.
1st try
$ docker build .
Sending build context to Docker daemon 785.9kB
Step 1/13 : FROM clojure:boot-alpine AS boot
[snip]
Retrieving bcprov-jdk15on-1.62.jar from https://repo1.maven.org/maven2/ (4451k)
endless-ships.parser/fn/fn parser.clj: 51
endless-ships.parser/parse parser.clj: 43
/instaparse.core.Parser core.clj: 141
...
instaparse.core/parse core.clj: 47
instaparse.core/parse core.clj: 94
instaparse.gll/parse gll.clj: 979
instaparse.gll/run gll.clj: 449
instaparse.gll/run gll.clj: 463
instaparse.gll/step gll.clj: 445
instaparse.gll/push-message/f gll.clj: 272
instaparse.gll/NodeListener/fn gll.clj: 501
instaparse.gll/push-result gll.clj: 362
instaparse.gll/push-message gll.clj: 280
clojure.core/swap! core.clj: 2262
...
clojure.core/assoc core.clj: 191
...
java.lang.OutOfMemoryError: Java heap space
clojure.lang.ExceptionInfo: Java heap space
line: 47
The command '/bin/sh -c boot build' returned a non-zero code: 1
2nd try
$ docker build .
Sending build context to Docker daemon 785.9kB
Step 1/13 : FROM clojure:boot-alpine AS boot
[snip]
Retrieving bcprov-jdk15on-1.62.jar from https://repo1.maven.org/maven2/ (4451k)
/instaparse.core.Parser core.clj: 141
...
instaparse.core/parse core.clj: 47
instaparse.core/parse core.clj: 94
instaparse.gll/parse gll.clj: 979
instaparse.gll/run gll.clj: 449
instaparse.gll/run gll.clj: 463
instaparse.gll/step gll.clj: 445
instaparse.gll/push-listener/fn gll.clj: 383
instaparse.gll/-parse gll.clj: 136
instaparse.gll/alt-parse gll.clj: 852
instaparse.gll/push-listener gll.clj: 374
instaparse.gll/node-get gll.clj: 320
clojure.core/swap! core.clj: 2262
...
clojure.core/assoc core.clj: 191
...
java.lang.OutOfMemoryError: Java heap space
clojure.lang.ExceptionInfo: Java heap space
line: 47
3rd try
$ docker build .
Sending build context to Docker daemon 785.9kB
Step 1/13 : FROM clojure:boot-alpine AS boot
[snip]
Retrieving bcpkix-jdk15on-1.62.jar from https://repo1.maven.org/maven2/ (850k)
endless-ships.parser/parse parser.clj: 43
/instaparse.core.Parser core.clj: 141
...
instaparse.core/parse core.clj: 47
instaparse.core/parse core.clj: 94
instaparse.gll/parse gll.clj: 979
instaparse.gll/run gll.clj: 449
instaparse.gll/run gll.clj: 463
instaparse.gll/step gll.clj: 445
instaparse.gll/push-message/f gll.clj: 272
instaparse.gll/NodeListener/fn gll.clj: 501
instaparse.gll/push-result gll.clj: 362
instaparse.gll/push-message gll.clj: 280
clojure.core/swap! core.clj: 2262
...
clojure.core/assoc core.clj: 191
...
java.lang.OutOfMemoryError: Java heap space
clojure.lang.ExceptionInfo: Java heap space
line: 47
The stack trace differs a little bit each time but overall the error remains the same. The machine I try to build on has plenty of RAM (16GB) and never appears to be under any memory stress (and the OOM killer has never shown up).
I've already tried setting BOOT_JVM_OPTIONS=-Xmx8g (8g is the last value I tried before giving up and filing this issue) but I saw no difference. I am a bit out of my depth here, my Clojure knowledge is pretty limited, so I'd appreciate any help. Thanks!
I would check Docker memory limits as well, they might be lower than 8GB. docker stats shows both the current memory usage and the limit.
Indeed, they could be, but they are not in this case. For a limit to be set, I would need to pass -m, and I don't. Nor do I have any kind of configuration in /etc/docker/daemon.json that would trigger such a thing (in fact, no such setting exists). But most importantly:
- Trying to exceed the max limit (memory.max) triggers the kernel OOM killer, which would leave a trace in the logs/dmesg.
- Trying to go above the high limit (memory.high, which by default is set to max, meaning the above is what actually happens) would heavily throttle the process and force very heavy memory reclaim, and would leave evidence in the corresponding memory.events file(s) under /sys/fs/cgroup. For the entirety of the run the only thing I see is:
low 0
high 0
max 0
oom 0
oom_kill 0
- Memory usage actually never goes above 2.2GB in Docker stats.
I did already suspect my machine and tried it at a cloud provider. Same behavior, for what it's worth.
I could try and do this outside of Docker entirely to see if Docker is the issue here, but I kinda doubt it at this point.
As an update, I reproduced this outside of Docker too. The process was pretty simple:
- Get a new Debian VM (bullseye or bookworm, doesn't matter)
- Install openjdk-17-jdk, git and yarn from apt
- Install boot from their website following their instructions
- Clone the repo and prep the git submodule
- boot build
I get the same error and behavior.
I've dug a bit more into what might be causing this. I've tried randomly checking out various commits in this repo to see if they would build successfully, to no avail.
Then, I tried checking out various tags in the git submodule (the actual game, that is). And well, v0.9.12 built just fine. What's utterly confusing to me is that after a git bisect I ended up with the following (all of this is done on the master branch of this repo).
git bisect start master v0.9.12
Bisecting: 1565 revisions left to test after this (roughly 11 steps)
[79ae2450290a9cb18293e340ef20817895846914] feat(content): The player is contacted by Republic Intel after making contact with the Hai (#7001)
[snip]
16e7a8060a9414a992def0c6c20958ef7ed91848 is the first bad commit
commit 16e7a8060a9414a992def0c6c20958ef7ed91848
Author: Zitchas <[email protected]>
Date: Fri Sep 27 07:00:33 2019 -0600
Adding the Quantum Tear
Which is this: https://github.com/endless-sky/endless-sky/commit/16e7a8060a9414a992def0c6c20958ef7ed91848. The first version it is included in is v0.9.13, which is dated Apr 17, 2021.
I don't find it very plausible that for the last almost 3 years this was failing to build, but maybe it's a useful clue.
It isn't possible to cleanly revert this commit as the game's data/ folder has changed since. It is still doable manually to remove the outfit from the data files and remove it from the Heron ship, but it doesn't make a difference: I've removed it from the master branch of the game git submodule and still get the same error.
A more complete stack trace than the ones posted above (I am a Clojure/boot novice, bear with me), as found via boot -vvv build, appears to be:
clojure.lang.ExceptionInfo: Java heap space {:line 47}
at clojure.core$ex_info.invokeStatic(core.clj:4617)
at clojure.core$ex_info.invoke(core.clj:4617)
at boot.main$_main$fn__1317.invoke(main.clj:222)
at boot.main$_main.invokeStatic(main.clj:216)
at boot.main$_main.invoke(main.clj:123)
at clojure.lang.Var.invoke(Var.java:394)
at org.projectodd.shimdandy.impl.ClojureRuntimeShimImpl.invoke(ClojureRuntimeShimImpl.java:154)
at org.projectodd.shimdandy.impl.ClojureRuntimeShimImpl.invoke(ClojureRuntimeShimImpl.java:145)
at boot.App.runBoot(App.java:407)
at boot.App.main(App.java:500)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at Boot.main(Boot.java:257)
Caused by: java.lang.OutOfMemoryError: Java heap space
at clojure.lang.PersistentHashMap.cloneAndSet(PersistentHashMap.java:1169)
at clojure.lang.PersistentHashMap.access$000(PersistentHashMap.java:28)
at clojure.lang.PersistentHashMap$ArrayNode.assoc(PersistentHashMap.java:418)
at clojure.lang.PersistentHashMap$ArrayNode.assoc(PersistentHashMap.java:415)
at clojure.lang.PersistentHashMap.assoc(PersistentHashMap.java:142)
at clojure.lang.PersistentHashMap.assoc(PersistentHashMap.java:28)
at clojure.lang.RT.assoc(RT.java:792)
at clojure.core$assoc__4371.invokeStatic(core.clj:191)
at clojure.core$assoc__4371.invoke(core.clj:190)
at clojure.lang.Atom.swap(Atom.java:65)
at clojure.core$swap_BANG_.invokeStatic(core.clj:2262)
at clojure.core$swap_BANG_.invoke(core.clj:2253)
at instaparse.gll$node_get.invokeStatic(gll.clj:320)
at instaparse.gll$node_get.invoke(gll.clj:312)
at instaparse.gll$push_listener.invokeStatic(gll.clj:374)
at instaparse.gll$push_listener.invoke(gll.clj:367)
at instaparse.gll$ordered_alt_parse.invokeStatic(gll.clj:867)
at instaparse.gll$ordered_alt_parse.invoke(gll.clj:860)
at instaparse.gll$_parse.invokeStatic(gll.clj:149)
at instaparse.gll$_parse.invoke(gll.clj:130)
at instaparse.gll$push_listener$fn__4146.invoke(gll.clj:383)
at instaparse.gll$step.invokeStatic(gll.clj:445)
at instaparse.gll$step.invoke(gll.clj:439)
at instaparse.gll$run.invokeStatic(gll.clj:463)
at instaparse.gll$run.invoke(gll.clj:447)
at instaparse.gll$run.invokeStatic(gll.clj:449)
at instaparse.gll$run.invoke(gll.clj:447)
at instaparse.gll$parse.invokeStatic(gll.clj:979)
at instaparse.gll$parse.invoke(gll.clj:974)
at instaparse.core$parse.invokeStatic(core.clj:94)
at instaparse.core$parse.doInvoke(core.clj:47)
at clojure.lang.RestFn.invoke(RestFn.java:425)
Which is this endless-sky/endless-sky@16e7a80. The first version it is included in is v0.9.13, which is dated Apr 17, 2021.
The problem now reproduces on my machine, and since I've added some logging, I see that it gets stuck on the remnant outfits.txt file which was changed in the commit you mentioned. From what I see here, there's nothing that should break it, so I may need some time to investigate.
Hi,
Thanks for following up.
I've been doing some more debugging on this issue every now and then. I think it has multiple manifestations, but I also bear some good news: I found the source of the manifestation you are currently witnessing.
First off, my git bisect above is misleading. I had a mistake in my git bisect script, which ended up identifying multiple issues, not just OutOfMemory ones. It did identify an actual issue, namely that "velocity scale" .2.5 in effect superweapon residue had a typo (note the two dots), but that was fixed in endless-sky/endless-sky@2a9ab8e4d387a5. You can completely disregard the findings of my git bisect; I hope you didn't waste too much time on that rabbit hole (I know I did). However, let me point out that this offered me a hint for the current issue.
Tangentially, I test on Linux; my understanding per #20 is that at least some tries have happened on Windows. Docker Desktop on Windows (and macOS) starts a Linux VM (there are some differences regarding Hyper-V vs WSL 2, apparently) and runs Docker in it. There is some more information at https://docs.docker.com/desktop/settings/windows/#resources; my understanding is that the default is 2GB, which most definitely isn't going to be enough (I'll explain later why) and will need tweaking.
To the point: using your latest commits to master, I can reproduce the issue you see on my machine. RSS memory usage for the JVM goes up to 8400MB (virtual is at 10GB, but virtual isn't really useful here) before croaking with:
Execution error (OutOfMemoryError) at instaparse.auto_flatten_seq.AutoFlattenSeq/conj_flat (auto_flatten_seq.clj:145).
Java heap space
Full report at:
/tmp/clojure-8038666787650120666.edn
Error: building at STEP "RUN clojure -J-Xmx8g -X:clj:build": while running runtime: exit status 1
I assume this is the same error you see. I also assume that my earlier efforts to pass -Xmx8g were unsuccessful, for reasons I don't know but which are probably not important any more, since you ditched boot for tools.deps (thanks for that!).
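(As an aside, with tools.deps the heap limit can be passed either on the command line, as in the -J-Xmx8g of the failing RUN step above, or baked into a deps.edn alias via :jvm-opts. A rough sketch, with a hypothetical alias name and exec function rather than this repo's actual setup:)
;; deps.edn sketch; the :build alias name and build/build exec-fn are hypothetical.
{:aliases
 {:build {:exec-fn  build/build        ; whatever function drives the build
          :jvm-opts ["-Xmx8g"]}}}      ; applied whenever the alias is used, e.g. clojure -X:build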
Chasing down the remnant outfits.txt issue, I was able to build v0.10.6 just fine. From there on, there is only one commit in v0.10.8 that affects that file, namely endless-sky/endless-sky@a92acf949. While that is a big change, the file in question has few changes, and there is a suspicious one at https://github.com/endless-sky/endless-sky/commit/a92acf949#diff-07da20eabfd0ddcd0a1f9e79d6f3ecf1d225279d058e1f721340afec9b05138dL408
Note how "hit-force"135 does not have the single space that "hit force" 90 has. And sure enough, locally messing with that single line and adding a single space character before the 135 makes the build succeed.
So good news! There's a fix (and it should probably make it upstream anyway)! However, this shows a difference between the in-game parser (I am assuming it doesn't croak due to that missing space character) and this project's parser. I can try to submit a fix to parser.bnf if you want, but do let me know if that would be the wrong way to solve it.
Finally, I left for last explaining why I think a default Docker Desktop on Windows will fail anyway. While debugging this, I've crafted the following patch to parser.clj: https://gist.github.com/akosiaris/a643e4cdd389fbf0bb81468547ade07e. One thing stood out:
parsing: map systems.txt. Size: 696566. Used memory before parse: 1472551.6
Used memory after parse: 3708792.2
Parsing the 681KB map systems.txt file moved the memory usage from ~1.4GB to >3.6GB. I did notice this in RSS usage as well, so while my code is crude and quite possibly somewhat wrong (I am not sure whether I should be accounting for the JVM's heap, native, direct and stack memory types), it's not entirely off. Why the parser would require >2GB of RAM to parse a 681KB file is still a question whose answer eludes me, but it makes me think that a default Windows/macOS Docker Desktop setup just isn't gonna cut it.
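The gist boils down to something like the following (a simplified sketch of the idea, not the exact patch; values are printed in KB and the size here is a character count rather than bytes):
;; Simplified sketch of the instrumentation; the real patch is in the gist above.
;; Prints used JVM heap in KB before and after each instaparse call. The numbers
;; are only indicative, since GC timing affects them.
(require '[instaparse.core :as insta])

(defn- used-kb []
  (let [rt (Runtime/getRuntime)]
    (/ (- (.totalMemory rt) (.freeMemory rt)) 1024.0)))

(defn parse-with-logging [parser file-name text]
  (println (format "parsing: %s. Size: %d. Used memory before parse: %.1f"
                   file-name (count text) (used-kb)))
  (let [result (insta/parse parser text)]
    (println (format "Used memory after parse: %.1f" (used-kb)))
    result))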
The missing space was fixed in https://github.com/endless-sky/endless-sky/commit/d365d4ac91e62fe6d66d264be42d3f918a8d0887 fwiw
Hi.
Thank you for helping to debug the issue.
I also found the problem with that "hit-force"135 without a space inside remnant outfits.txt. It was already fixed in the game repository, but unfortunately the v0.10.8 version remains unparseable (and always will be). I'm not sure how to fix that in parser.bnf because it heavily relies on the fact that tokens (things like "hit-force", 135, licenses etc. - the smallest bits of data processed by the parser) are always separated by whitespace characters. If you know how to change the parser grammar (parser.bnf) to support these cases, you're welcome to send a PR.
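To make the assumption concrete, here is a toy grammar (not the actual parser.bnf, just something with the same whitespace-separated-tokens property):
;; Toy illustration only; not this project's parser.bnf.
(require '[instaparse.core :as insta])

(def toy-parser
  (insta/parser
    "line   = token (<#'\\s+'> token)*
     token  = string | number | word
     string = <'\"'> #'[^\"]*' <'\"'>
     number = #'[0-9]+(\\.[0-9]+)?'
     word   = #'[a-zA-Z-]+'"))

(toy-parser "\"hit force\" 90")   ;; parses: string token, whitespace, number token
(toy-parser "\"hit-force\"135")   ;; returns a parse failure: no whitespace between the two tokens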
Speaking of the Windows problem: the conversation in #20 moved to the point where the parsing part began to work, and the current problem is to build the frontend app; so it seems the problem is solved by fixing the data files.
Overall it seems like the instaparse library (which handles all the parsing in this project) may go into some infinite loop or recursion when something in the parsed files does not match the parser grammar; it consumes all the memory it can get and finally crashes the JVM with that Java heap space exception. It's just my assumption, I don't really have any other ideas.
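Just to illustrate how quickly instaparse can pile up parse state (a toy example, not necessarily the mechanism with this project's grammar): even a tiny ambiguous rule produces exponentially many parse trees, all of which the parser has to keep track of.
;; Toy illustration only; unrelated to the project's grammar.
(require '[instaparse.core :as insta])

(def ambiguous
  (insta/parser "S = S S | 'a'"))

;; insta/parses enumerates every distinct parse tree; for a string of just
;; ten 'a's there are already thousands, and the count grows exponentially.
(count (insta/parses ambiguous "aaaaaaaaaa"))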
I decided to switch the Docker image to using the latest master version of the game for now (until the next release comes out), so currently both the Docker and manual installation ways should work (they will use the current version of the game). Can you please check if it works on your machine so we can close the issue?
Hi,
Can you please check if it works on your machine so we can close the issue?
It does. As far as this issue goes, I think we can close it. The situation has changed enough to warrant this: boot is no longer in use, the clj aliases pass a sufficiently large Java heap setting for the build to work, and usage by others is reportedly unblocked (per #20).
In my humble opinion, there is also an underlying issue, namely that parsing some files requires Java heap space that is orders of magnitude larger than the files themselves. This might (or might not, I just have a gut feeling that it might) be related to the issue you describe of input that is unexpected for the grammar sending instaparse into a death spiral ending with a Java heap space exception. If we ever try to tackle this, it should be in its own dedicated issue.
My thanks for handling this!