ditto icon indicating copy to clipboard operation
ditto copied to clipboard

internal server error when fetching ditto connections via the api

Open JeffreyThijs opened this issue 1 year ago • 8 comments

We upgraded recently from ditto 3.3.7 to 3.4.1 and we encounter the following exception when fetching all full connections via the connection api:

curl http://ditto-nginx:8080/api/2/connections --user "devops:mypassword" -s | jq
{
  "status": 500,
  "error": "connectivity:internalerror",
  "message": "There was a rare case of an unexpected internal error.",
  "description": "Please contact the service team."
}

When i add the ids-only or fields=id query parameter it correctly returns a list of the ditto connections (with only the id of course), but any additional or other fields yield the same error message. In the logs of both ditto-gateway and ditto-connectivity is see nothing that indicates that something is going wrong. Maybe https://github.com/eclipse-ditto/ditto/commit/bedaaee80d093b1b577a84bce4511ce9bb096a17 introduced this problem?

If you need more info, please let me know.

JeffreyThijs avatar Nov 29 '23 12:11 JeffreyThijs

Hi @JeffreyThijs

Some logs would be nice, e.g. of gateway and connectivity service. Maybe there event is a stacktrace?

thjaeckle avatar Nov 29 '23 19:11 thjaeckle

Found following stacktrace in the gateway:

kb logs -l "app.kubernetes.io/name=ditto-gateway" -f | grep -i 'ConnectionNotAccessibleException' | jq                                                                                                                                       (dev/aloxy)
{
  "@timestamp": "2023-11-30T08:49:19.926+01:00",
  "@version": "1",
  "message": "Received response <{}> from connectivity.service.",
  "logger_name": "org.eclipse.ditto.gateway.service.endpoints.actors.DittoConnectionsRetrievalActor",
  "thread_name": "Thread-52011",
  "level": "DEBUG",
  "level_value": 10000,
  "stack_trace": "org.eclipse.ditto.connectivity.model.signals.commands.exceptions.ConnectionNotAccessibleException: The Connection with ID '327bfa0b-1984-4c50-8700-9adb1579e6e1' could not be found or requester had insufficient permissions to access it.\n\tat org.eclipse.ditto.connectivity.model.signals.commands.exceptions.ConnectionNotAccessibleException$Builder.doBuild(ConnectionNotAccessibleException.java:119)\n\tat org.eclipse.ditto.connectivity.model.signals.commands.exceptions.ConnectionNotAccessibleException$Builder.doBuild(ConnectionNotAccessibleException.java:101)\n\tat org.eclipse.ditto.base.model.exceptions.DittoRuntimeExceptionBuilder.build(DittoRuntimeExceptionBuilder.java:176)\n\tat org.eclipse.ditto.base.model.exceptions.DittoRuntimeException.fromJson(DittoRuntimeException.java:245)\n\tat org.eclipse.ditto.connectivity.model.signals.commands.exceptions.ConnectionNotAccessibleException.fromJson(ConnectionNotAccessibleException.java:84)\n\tat jdk.internal.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:568)\n\tat org.eclipse.ditto.base.model.signals.AnnotationBasedJsonParsable.parse(AnnotationBasedJsonParsable.java:94)\n\tat org.eclipse.ditto.base.model.signals.AbstractJsonParsableRegistry.fromJson(AbstractJsonParsableRegistry.java:97)\n\tat org.eclipse.ditto.base.model.signals.AbstractJsonParsableRegistry.lambda$parse$0(AbstractJsonParsableRegistry.java:79)\n\tat org.eclipse.ditto.base.model.exceptions.DittoJsonException.wrapJsonRuntimeException(DittoJsonException.java:161)\n\tat org.eclipse.ditto.base.model.signals.AbstractJsonParsableRegistry.parse(AbstractJsonParsableRegistry.java:78)\n\tat org.eclipse.ditto.base.model.signals.GlobalErrorRegistry.parse(GlobalErrorRegistry.java:91)\n\tat org.eclipse.ditto.base.model.signals.GlobalErrorRegistry.parse(GlobalErrorRegistry.java:36)\n\tat org.eclipse.ditto.internal.utils.cluster.AbstractJsonifiableWithDittoHeadersSerializer.deserializeJson(AbstractJsonifiableWithDittoHeadersSerializer.java:423)\n\tat org.eclipse.ditto.internal.utils.cluster.AbstractJsonifiableWithDittoHeadersSerializer.createJsonifiableFrom(AbstractJsonifiableWithDittoHeadersSerializer.java:325)\n\tat org.eclipse.ditto.internal.utils.cluster.AbstractJsonifiableWithDittoHeadersSerializer.tryToCreateKnownJsonifiableFrom(AbstractJsonifiableWithDittoHeadersSerializer.java:296)\n\tat org.eclipse.ditto.internal.utils.cluster.AbstractJsonifiableWithDittoHeadersSerializer.fromBinary(AbstractJsonifiableWithDittoHeadersSerializer.java:273)\n\tat org.apache.pekko.serialization.Serialization.deserializeByteBuffer(Serialization.scala:279)\n\tat org.apache.pekko.remote.MessageSerializer$.deserializeForArtery(MessageSerializer.scala:111)\n\tat org.apache.pekko.remote.artery.Deserializer$$anon$3.onPush(Codecs.scala:683)\n\tat org.apache.pekko.stream.impl.fusing.GraphInterpreter.processPush(GraphInterpreter.scala:555)\n\tat org.apache.pekko.stream.impl.fusing.GraphInterpreter.processEvent(GraphInterpreter.scala:506)\n\tat org.apache.pekko.stream.impl.fusing.GraphInterpreter.execute(GraphInterpreter.scala:400)\n\tat org.apache.pekko.stream.impl.fusing.GraphInterpreterShell.runBatch(ActorGraphInterpreter.scala:662)\n\tat org.apache.pekko.stream.impl.fusing.GraphInterpreterShell$AsyncInput.execute(ActorGraphInterpreter.scala:532)\n\tat org.apache.pekko.stream.impl.fusing.GraphInterpreterShell.processEvent(ActorGraphInterpreter.scala:637)\n\tat org.apache.pekko.stream.impl.fusing.ActorGraphInterpreter.org$apache$pekko$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:813)\n\tat org.apache.pekko.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:831)\n\tat org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547)\n\tat org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545)\n\tat org.apache.pekko.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:729)\n\tat org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590)\n\tat org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557)\n\tat org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280)\n\tat org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:241)\n\tat org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253)\n\tat java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)\n\tat java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)\n\tat java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)\n\tat java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)\n\tat java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)\n",
  "correlation-id": "d24b587e-8ec1-4581-8d17-38298b1def20"
}

JeffreyThijs avatar Nov 30 '23 07:11 JeffreyThijs

I fear with that little information I cannot be of any help.

thjaeckle avatar Nov 30 '23 12:11 thjaeckle

So today I was trying to figure out what's going on here since not being able to fetch the existing connections is quite blocking to us. As @JeffreyThijs stated, we were able to fetch the ids only. When I did that, I got back a list of approx. 20 connections being active in ditto. I manually tried fetching them because I wanted to check the contents of the connection. I stumbled upon one with id 327bfa0b-1984-4c50-8700-9adb1579e6e1 (the ID doesn't mind but if you look closely, it's the one that is mentioned in the log line Jeffrey posted before. I went looking for it in mongoDB and saw 46 records in the connection_journal table with pid connection:327bfa0b-1984-4c50-8700-9adb1579e6e1. it showed following records:

  {
    _id: ObjectId("646f79710d77134491007ae2"),
    pid: 'connection:327bfa0b-1984-4c50-8700-9adb1579e6e1',
    from: Long("60"),
    to: Long("60"),
    events: [
      {
        v: 1,
        pid: 'connection:327bfa0b-1984-4c50-8700-9adb1579e6e1',
        sn: Long("60"),
        _t: 'bson',
        p: {
          type: 'persistence-actor-internal:empty-event',
          effect: 'alwaysAlive',
          __hh: {}
        },
        _tg: [ 'always-alive', 'priority-0' ],
        manifest: 'persistence-actor-internal:empty-event',
        _w: '9393e517-419d-4ade-9338-500780c519eb'
      }
    ],
    v: 1,
    _tg: [ 'always-alive', 'priority-0' ]
  }

Since I don't know what this is for, I manually removed all of these records out of the collection and restarted the ditto-connectivity pods. After that, we were able to fetch the ditto connections again using the HTTP GET endpoint. Something to note: the connection count in ditto dropped from 20 to 7. Any idea what might be going on? We experienced this in our development environment now and would like to avoid this in our production environment. It's worthy to mention that we recently upgraded ditto from 3.3.7 to 3.4.3

BobClaerhout avatar Dec 07 '23 12:12 BobClaerhout

This record looks very strange, like it was exported with mongoExport end then restored in the database which would cause the numeric values too look like the "from:, to:, sn:". If this is what you did you should use mongoDump as it uses binary and preservs types not like mongoExport that works with json.

alstanchev avatar Dec 07 '23 15:12 alstanchev

I agree it looks strange. I double checked with my colleagues and we did not do any manual actions on the database. The origin of these are therefor unknown. On the other hand: it's not good this makes the "list connections" call fail, is it? Wouldn't it be better to just return the connections it is able to load and skip the ones which fail instead of failing the call?

BobClaerhout avatar Dec 11 '23 08:12 BobClaerhout

I would prefer to make the error more vissible with a more descriptive message (something like "Corrupted record or unable to deserialize") instead of returning what is possible only. Otherwise such corruptions will remain hidden. But still i doubt this was persisted by the service. Is it possible that by mistake maybe while examining the db with Mongo Compass or the atlass UI an unoticed edit was done?

alstanchev avatar Dec 12 '23 10:12 alstanchev

I doubt the edit. It's not that easy to change documents in the atlass UI, let alone multiple accidental edits. Providing a more descriptive error is also fine!

BobClaerhout avatar Jan 29 '24 08:01 BobClaerhout