convergence-project
convergence-project copied to clipboard
Occasional "Connection reset" and 500 Internal Server Error
Hi!
We occasionally have "Connection reset" errors in Convergence server logs and 500 Internal Server Error in response to a REST API call.
GET domains/convergence/default/models/ffreport-3605: 500 Server Error
b'{"body":{"error_code":"internal_server_error"},"ok":false}'
Convergence server logs:
INFO: Caught Network I/O errors on odb.pl-grt-orientdb:2424/7048553260737367023, trying an automatic reconnection... (error: Connection reset)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:186) ~[?:?]
at java.net.SocketInputStream.read(SocketInputStream.java:140) ~[?:?]
at java.io.BufferedInputStream.fill(BufferedInputStream.java:252) ~[?:?]
at java.io.BufferedInputStream.read(BufferedInputStream.java:271) ~[?:?]
at java.io.DataInputStream.readByte(DataInputStream.java:270) ~[?:?]
at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinary.readByte(OChannelBinary.java:75) ~[com.orientechnologies.orientdb-core-3.0.32.jar:3.0.32]
at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:176) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:160) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
at com.orientechnologies.orient.client.remote.OStorageRemote.beginResponse(OStorageRemote.java:1891) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
at com.orientechnologies.orient.client.remote.OStorageRemote.lambda$networkOperationRetryTimeout$2(OStorageRemote.java:248) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
at com.orientechnologies.orient.client.remote.OStorageRemote.baseNetworkOperation(OStorageRemote.java:307) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
... 42 more
16:34:35 ERROR RealtimeModelActor - Unexpected error getting model
com.orientechnologies.common.io.OIOException: Connection reset
at com.orientechnologies.orient.client.remote.OStorageRemote.baseNetworkOperation(OStorageRemote.java:351) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperationRetryTimeout(OStorageRemote.java:228) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperationNoRetry(OStorageRemote.java:261) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
at com.orientechnologies.orient.client.remote.OStorageRemote.command(OStorageRemote.java:947) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentRemote.indexQuery(ODatabaseDocumentRemote.java:397) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
at com.orientechnologies.orient.core.index.OIndexRemoteOneValue.get(OIndexRemoteOneValue.java:48) ~[com.orientechnologies.orientdb-core-3.0.32.jar:3.0.32]
at com.orientechnologies.orient.core.index.OIndexRemoteOneValue.get(OIndexRemoteOneValue.java:39) ~[com.orientechnologies.orientdb-core-3.0.32.jar:3.0.32]
at com.orientechnologies.orient.core.index.OIndexAbstractDelegate.get(OIndexAbstractDelegate.java:59) ~[com.orientechnologies.orientdb-core-3.0.32.jar:3.0.32]
at com.orientechnologies.orient.core.index.OIndexTxAwareOneValue.get(OIndexTxAwareOneValue.java:231) ~[com.orientechnologies.orientdb-core-3.0.32.jar:3.0.32]
at com.orientechnologies.orient.core.index.OIndexTxAwareOneValue.get(OIndexTxAwareOneValue.java:38) ~[com.orientechnologies.orientdb-core-3.0.32.jar:3.0.32]
at com.convergencelabs.convergence.server.backend.datastore.OrientDBUtil$.$anonfun$findDocumentFromSingleValueIndex$1(OrientDBUtil.scala:189) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
at scala.util.Try$.apply(Try.scala:210) ~[org.scala-lang.scala-library-2.13.2.jar:?]
at com.convergencelabs.convergence.server.backend.datastore.OrientDBUtil$.findDocumentFromSingleValueIndex(OrientDBUtil.scala:185) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
at com.convergencelabs.convergence.server.backend.datastore.domain.model.ModelStore$.com$convergencelabs$convergence$server$backend$datastore$domain$model$ModelStore$$findModelDocument(ModelStore.scala:333) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
at com.convergencelabs.convergence.server.backend.datastore.domain.model.ModelStore.$anonfun$getModel$1(ModelStore.scala:198) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
at com.convergencelabs.convergence.server.backend.db.PooledDatabaseProvider.$anonfun$withDatabase$3(PooledDatabaseProvider.scala:58) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
at com.convergencelabs.convergence.server.util.TryWithResource.tryWithResolvedResource(TryWithResource.scala:50) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
at com.convergencelabs.convergence.server.util.TryWithResource.com$convergencelabs$convergence$server$util$TryWithResource$$tryWithResource(TryWithResource.scala:36) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
at com.convergencelabs.convergence.server.util.TryWithResource$.apply(TryWithResource.scala:78) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
at com.convergencelabs.convergence.server.backend.db.PooledDatabaseProvider.$anonfun$withDatabase$1(PooledDatabaseProvider.scala:56) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
at scala.util.Success.flatMap(Try.scala:258) ~[org.scala-lang.scala-library-2.13.2.jar:?]
at com.convergencelabs.convergence.server.backend.db.PooledDatabaseProvider.withDatabase(PooledDatabaseProvider.scala:55) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
at com.convergencelabs.convergence.server.backend.datastore.AbstractDatabasePersistence.withDb(AbstractDatabasePersistence.scala:32) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
at com.convergencelabs.convergence.server.backend.datastore.domain.model.ModelStore.getModel(ModelStore.scala:196) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
at com.convergencelabs.convergence.server.backend.services.domain.model.RealtimeModelActor.$anonfun$retrieveModel$3(RealtimeModelActor.scala:325) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
at com.convergencelabs.convergence.server.backend.services.domain.model.RealtimeModelActor.$anonfun$retrieveModel$3$adapted(RealtimeModelActor.scala:323) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
at scala.util.Success.flatMap(Try.scala:258) ~[org.scala-lang.scala-library-2.13.2.jar:?]
at com.convergencelabs.convergence.server.backend.services.domain.model.RealtimeModelActor.$anonfun$retrieveModel$1(RealtimeModelActor.scala:323) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) ~[org.scala-lang.scala-library-2.13.2.jar:?]
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:671) ~[org.scala-lang.scala-library-2.13.2.jar:?]
at scala.concurrent.impl.Promise$Transformation.run(Promise.scala:430) ~[org.scala-lang.scala-library-2.13.2.jar:?]
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:56) ~[com.typesafe.akka.akka-actor_2.13-2.6.6.jar:2.6.6]
at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:93) ~[com.typesafe.akka.akka-actor_2.13-2.6.6.jar:2.6.6]
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) [org.scala-lang.scala-library-2.13.2.jar:?]
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:94) [org.scala-lang.scala-library-2.13.2.jar:?]
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:93) [com.typesafe.akka.akka-actor_2.13-2.6.6.jar:2.6.6]
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:48) [com.typesafe.akka.akka-actor_2.13-2.6.6.jar:2.6.6]
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48) [com.typesafe.akka.akka-actor_2.13-2.6.6.jar:2.6.6]
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?]
at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) [?:?]
at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?]
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) [?:?]
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177) [?:?]
More details:
- We observe no other connectivity issues with Convergence server or OrientDB.
- There are no errors or warnings in OrientDB logs.
- We have everything deployed to a Kubernetes cluster. Our installation has 1 instance of convergence-seed, 3 instances of convergence-server, and 1 instance of OrientDB.
Maybe you could give us some hints for what might go wrong and tips on what Convergence server settings to tweak. Since GET should be a safe method, should we implement retries on the side of our application that makes REST API calls?
Versions
- Convergence Version:
convergencelabs/convergence-server:1.0.0-rc.7 - OS: Running in Docker
Describe the Bug
REST API client sometimes gets 500 Internal Server Error from Convergence.
Expected Behavior
If Convergence server has some kind of an OrientDB connection pool which might contain stale connections, it should reconnect when hitting connection reset.
Alternatively, if retries are considered a caller's duty, Convergence server should probably return 503 instead of 500 to signify that the client should make a retry.
Thanks for reporting. This does seem like a bug. I do believe we are using the built in OrientDB connection pooling capability. In general, we would expect that the connection pool would be keeping the connections alive or fresh.
Finally getting to look at this error. I did confirm that the error is inside OrientDB client code. Here specifically:
https://github.com/orientechnologies/orientdb/blob/3c1a6e6a9a18ecc0312058ac9f2ba86cd2365594/client/src/main/java/com/orientechnologies/orient/client/remote/OStorageRemote.java#L348
We are using the OrientDB connection pooling, so this seems like there is an error in the connection pool logic on the OrientDB side. We are in the process of upgrading from 3.0.x to 3.2.x of OrientDB so I am hoping that this will resolve the situation.
I do agree on the 503 vs 500 error.
As a matter of fact, after we reduced the number of convergence-server instances we run in the Akka cluster from 3 to 1 we basically stopped receiving these "Connection reset" errors. They only happen occasionally (a couple of times in a month) and no longer bother us.
But as you were going to upgrade OrientDB client libraries to 3.2.x (and it seems it hasn't been done yet in 1.0.0-rc.12), I think we should keep this issue open.