convex icon indicating copy to clipboard operation
convex copied to clipboard

Calling persistPeerData in Server close, causes out of memory error

Open billbsing opened this issue 4 years ago • 6 comments

On slower peers, calling the Server methodpersistPeerData during a shutdown, causes errors since the storage close method is also being called by the shutdown hook. This leaves the etch database unreadable.

^CException in thread "Thread-1" java.lang.OutOfMemoryError: Java heap space at java.base/java.util.jar.Manifest$FastInputStream.(Manifest.java:362) at java.base/java.util.jar.Manifest$FastInputStream.(Manifest.java:357) at java.base/java.util.jar.Manifest.read(Manifest.java:228) at java.base/java.util.jar.Manifest.(Manifest.java:82) at java.base/java.util.jar.Manifest.(Manifest.java:74) at java.base/java.util.jar.JarFile.getManifestFromReference(JarFile.java:427) at java.base/java.util.jar.JarFile.getManifest(JarFile.java:408) at java.base/jdk.internal.loader.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:871) at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:786) at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698) at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621) at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:419) at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:383) at ch.qos.logback.classic.Logger.warn(Logger.java:692) at convex.peer.Server.persistPeerData(Server.java:1122) at convex.peer.Server.close(Server.java:1132) at convex.peer.Server$2.run(Server.java:411) at convex.core.util.Shutdown$Group.lambda$runHooks$0(Shutdown.java:43) at convex.core.util.Shutdown$Group$$Lambda$79/0x00000008401ee840.accept(Unknown Source) at java.base/java.util.IdentityHashMap$KeySpliterator.forEachRemaining(IdentityHashMap.java:1443) at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658) at convex.core.util.Shutdown$Group.runHooks(Shutdown.java:42) at convex.core.util.Shutdown.runHooks(Shutdown.java:73) at convex.core.util.Shutdown$1.run(Shutdown.java:25) at java.base/java.lang.Thread.run(Thread.java:829) Exception in thread "Connection Manager thread at 8181" java.lang.OutOfMemoryError: Java heap space

billbsing avatar Sep 20 '21 08:09 billbsing

It is a bit strange that the error occurs in the logging code. I will adjust the logging to see if we can get better messages.

mikera avatar Sep 20 '21 08:09 mikera

the new log is a follows:

^C09:10:58.585 [Thread-1] WARN convex.peer.Server - Failed to persist peer state when closing server: Java heap space
java.lang.OutOfMemoryError: Java heap space
        at convex.core.data.ACell.createEncoding(ACell.java:172)
        at convex.core.data.ACell.getEncoding(ACell.java:119)
        at convex.core.data.ACell.getHash(ACell.java:69)
        at convex.core.data.RefDirect.getHash(RefDirect.java:76)
        at convex.core.data.Ref.encode(Ref.java:591)
        at convex.core.data.VectorTree.encodeRaw(VectorTree.java:168)
        at convex.core.data.VectorTree.encode(VectorTree.java:159)
        at convex.core.data.ACell.createEncoding(ACell.java:177)
        at convex.core.data.ACell.getEncoding(ACell.java:119)
        at convex.core.data.ACell.getHash(ACell.java:69)
        at convex.core.data.RefDirect.getHash(RefDirect.java:76)
        at convex.core.data.Ref.encode(Ref.java:591)
        at convex.core.data.VectorTree.encodeRaw(VectorTree.java:168)
        at convex.core.data.VectorTree.encode(VectorTree.java:159)
        at convex.core.data.ACell.createEncoding(ACell.java:177)
        at convex.core.data.ACell.getEncoding(ACell.java:119)
        at convex.core.data.ACell.getHash(ACell.java:69)
        at convex.core.data.RefDirect.getHash(RefDirect.java:76)
        at convex.core.data.Ref.encode(Ref.java:591)
        at convex.core.data.VectorTree.encodeRaw(VectorTree.java:168)
        at convex.core.data.VectorTree.encode(VectorTree.java:159)
        at convex.core.data.Ref.encode(Ref.java:588)
        at convex.core.data.VectorLeaf.encodeRaw(VectorLeaf.java:287)
        at convex.core.data.VectorLeaf.encode(VectorLeaf.java:271)
        at convex.core.State.encodeRaw(State.java:185)
        at convex.core.State.encode(State.java:180)
        at convex.core.BlockResult.encodeRaw(BlockResult.java:160)
        at convex.core.BlockResult.encode(BlockResult.java:155)
        at convex.core.data.ACell.createEncoding(ACell.java:177)
        at convex.core.data.ACell.getEncoding(ACell.java:119)
        at convex.core.data.ACell.getEncodingLength(ACell.java:246)
        at convex.core.data.ACell.isEmbedded(ACell.java:280)
Exception in thread "Update Loop on port: 8181" java.lang.OutOfMemoryError: Java heap space
09:11:07.228 [NIO Server selector loop on port: 8181] INFO convex.net.NIOServer - Selector loop ended on port: 0

billbsing avatar Sep 20 '21 09:09 billbsing

Interesting. I shall investigate!

mikera avatar Sep 20 '21 13:09 mikera

How does the data length cause this problem?

mikera avatar Sep 23 '21 10:09 mikera

How does the data length cause this problem?

when the peer crashes, the etch.close is not called. So the dataLength value stored in the etch file header is incorrect, so the etch db is corrupted on the next open

billbsing avatar Sep 23 '21 10:09 billbsing

Sounds like we have two underlying problems:

  • Etch.close() not getting called when it should be
  • Something causing too much memory usage when encoding

Would be good to focus in on those, independent of the Etch corruption problem. I will look at this more.

mikera avatar Sep 23 '21 10:09 mikera

Think this is fixed for now

mikera avatar Oct 04 '22 11:10 mikera