jsonschema2pojo
java.lang.OutOfMemoryError: Java heap space
file: https://developer.walmart.com/image/asdp/us/mp/fulfillment/WFS_Convert_Schema_v4.5.json
cmd: java -jar -Xms8m -Xmx8G -XX:PermSize=8M -XX:MaxPermSize=8G "%~dp0/../lib/jsonschema2pojo-cli-1.1.2.jar" %*
Enum name ColorCategory already used; trying to replace it with ColorCategory________
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.fasterxml.jackson.databind.node.ObjectNode.
This is a very interesting use case. At 373KB this is probably the largest schema I have ever seen. I don't see any reason it should exhaust an 8GB heap though, so there is likely some problem with us creating and storing an excessive number of objects somewhere.
It would be good to reduce the heap size (say 1GB) and create a heap dump. You can do this by setting:
-Xmx1g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp
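If attaching the flag is awkward, a heap dump can also be captured programmatically via the JDK's HotSpot diagnostic MXBean. A minimal sketch (the HeapDump class and dump method here are illustrative helpers; HotSpotDiagnosticMXBean.dumpHeap itself is a real JDK API):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDump {

    // Write an .hprof snapshot of the current heap; live=true keeps only
    // reachable objects, which is what MAT-style histograms analyze.
    public static void dump(String path, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, live);
    }

    public static void main(String[] args) throws IOException {
        java.io.File out = java.io.File.createTempFile("heap", ".hprof");
        out.delete(); // dumpHeap refuses to overwrite an existing file
        dump(out.getAbsolutePath(), true);
        System.out.println("wrote " + out.length() + " bytes to " + out);
    }
}
```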
I think the problem is that for each object defined in the JSON schema, SchemaStore caches the whole JSON schema definition (as baseSchema) as a new instance, which significantly increases heap utilization. The ContentResolver also creates a new JsonNode instance on every call, even when the same URI is passed multiple times. Introducing a cache in the ContentResolver would let the instances cached in the SchemaStore be reused and therefore reduce heap utilization. I think it should not have side effects, but I am not the expert here.
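The caching idea above can be sketched generically, modelling the resolver as a function from URI to parsed content. The CachingResolver name and the Function stand-in are illustrative, not jsonschema2pojo's actual ContentResolver API:

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Resolve each URI only once and hand back the same instance on repeated
// lookups, instead of allocating a fresh parse tree per call.
public class CachingResolver<T> {

    private final Map<URI, T> cache = new HashMap<>();
    private final Function<URI, T> delegate;
    private int misses = 0; // how many times the underlying resolver really ran

    public CachingResolver(Function<URI, T> delegate) {
        this.delegate = delegate;
    }

    public T resolve(URI uri) {
        return cache.computeIfAbsent(uri, u -> {
            misses++;
            return delegate.apply(u);
        });
    }

    public int misses() {
        return misses;
    }

    public static void main(String[] args) {
        // Simulate an expensive parse that would otherwise allocate per call.
        CachingResolver<String> resolver = new CachingResolver<>(uri -> "parsed:" + uri);
        URI schema = URI.create("https://example.com/schema.json");
        String first = resolver.resolve(schema);
        String second = resolver.resolve(schema); // served from cache
        System.out.println(first == second);      // identical reference
        System.out.println(resolver.misses());    // underlying resolver ran once
    }
}
```

With this shape, a schema referenced from thousands of definitions contributes one JsonNode tree to the heap rather than one per reference.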
Running on Java 17, -Xmx had to be set to ~426m to reproduce.
Heap class histogram:
Class Name | Objects | Shallow Heap | Retained Heap
-------------------------------------------------------------------------------------------------------------------------------------
java.util.LinkedHashMap$Entry | 3,100,216 | 124,008,640 | >= 438,750,232
byte[] | 2,331,985 | 81,777,168 | >= 81,777,168
java.util.HashMap$Node[] | 707,996 | 57,110,816 | >= 437,324,728
java.lang.String | 2,331,486 | 55,955,664 | >= 137,231,568
java.util.LinkedHashMap | 707,815 | 39,637,640 | >= 439,328,256
com.fasterxml.jackson.databind.node.TextNode | 2,309,079 | 36,945,264 | >= 172,478,832
com.fasterxml.jackson.databind.node.ObjectNode | 705,071 | 16,921,704 | >= 438,585,072
java.lang.Object[] | 220,387 | 13,968,536 | >= 114,421,624
java.util.ArrayList | 219,781 | 5,274,744 | >= 117,785,896
com.fasterxml.jackson.databind.node.IntNode | 321,997 | 5,151,952 | >= 5,152,032
com.fasterxml.jackson.databind.node.ArrayNode | 205,115 | 4,922,760 | >= 118,099,168
java.math.BigDecimal | 16,631 | 665,240 | >= 671,552
com.fasterxml.jackson.databind.node.DecimalNode | 16,595 | 265,520 | >= 929,464
java.util.HashMap | 5,479 | 262,992 | >= 436,983,552
com.sun.codemodel.JInvocation | 5,801 | 232,040 | >= 853,640
java.util.HashMap$Node | 5,308 | 169,856 | >= 436,682,056
com.sun.codemodel.JOp$BinaryOp | 6,251 | 150,024 | >= 490,408
java.util.concurrent.ConcurrentHashMap$Node | 4,659 | 149,088 | >= 328,456
com.sun.codemodel.JFieldRef | 4,186 | 133,952 | >= 134,056
com.sun.codemodel.JMethod | 1,993 | 127,552 | >= 2,704,416
java.net.URI | 1,573 | 125,840 | >= 677,064
int[] | 765 | 103,744 | >= 103,744
com.sun.codemodel.JDocComment | 1,728 | 82,944 | >= 547,280
sun.util.locale.LocaleObjectCache$CacheEntry | 1,986 | 79,440 | >= 79,440
com.sun.codemodel.JStringLiteral | 4,792 | 76,672 | >= 76,672
com.sun.codemodel.JMods | 4,501 | 72,016 | >= 72,280
com.sun.codemodel.JAnnotationUse | 2,939 | 70,536 | >= 581,432
com.sun.codemodel.JBlock | 2,809 | 67,416 | >= 1,560,504
char[] | 248 | 66,408 | >= 66,408
com.sun.codemodel.JAtom | 3,321 | 53,136 | >= 212,304
org.jsonschema2pojo.Schema | 1,508 | 48,256 | >= 438,301,584
...
org.jsonschema2pojo.SchemaStore | 1 | 24 | >= 436,404,448
org.jsonschema2pojo.SchemaMapper | 1 | 24 | >= 4,448
...
This confirms @krystianekb's hypothesis:
SchemaStore caches the whole JSON schema definition (as baseSchema) as a new instance, which significantly increases heap utilization
However, rather than introducing a cache in the content resolver, it should be possible to check whether schemas contains the resolved baseId (the URI without its fragment) and, if it does, take the references to baseContent and baseSchema from the cached schema instead of attempting to resolve them again, e.g. replace:
https://github.com/joelittlejohn/jsonschema2pojo/blob/9315b7b69417899f0addb1d795938ac802763566/jsonschema2pojo-core/src/main/java/org/jsonschema2pojo/SchemaStore.java#L60-L65
with
URI baseId = removeFragment(id).normalize();
final JsonNode baseContent;
final Schema baseSchema;
if (schemas.containsKey(baseId)) {
    baseContent = schemas.get(baseId).getContent();
    baseSchema = schemas.get(baseId).getParent();
} else {
    baseContent = contentResolver.resolve(baseId);
    baseSchema = new Schema(baseId, baseContent, null);
}
if (normalizedId.toString().contains("#")) {
With the solution above, the output could be generated with -Xmx9m (47x less memory) without getting an OOM.
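The fix hinges on the cache key: stripping the fragment before normalizing means every #/definitions/... pointer into the same document maps to a single entry in the schemas map. A small sketch of that keying (removeFragment here is a local re-implementation for illustration, not the actual SchemaStore method):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class BaseIdDemo {

    // Drop the fragment part of a URI, keeping scheme and scheme-specific part.
    public static URI removeFragment(URI id) throws URISyntaxException {
        return new URI(id.getScheme(), id.getSchemeSpecificPart(), null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Two JSON Pointers into the same schema document...
        URI a = URI.create("https://example.com/schema.json#/definitions/ColorCategory");
        URI b = URI.create("https://example.com/schema.json#/definitions/Dimensions");

        URI baseA = removeFragment(a).normalize();
        URI baseB = removeFragment(b).normalize();

        // ...collapse to one base URI, so the schemas map holds one entry,
        // not one freshly resolved copy per pointer.
        System.out.println(baseA);               // https://example.com/schema.json
        System.out.println(baseA.equals(baseB)); // true
    }
}
```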