Conseil icon indicating copy to clipboard operation
Conseil copied to clipboard

Conseil API: sudden "request timeout"

Open ghost opened this issue 4 years ago • 25 comments

As advised in #932 , I built Conseil from master ([commit-hash: 9f4eb4c]). Since this change I see configuration-related warnings like:

06:58:42.422 [akka.actor.default-dispatcher-48]  WARN  t.c.c.a.metadata.UnitTransformation  - There're missing metadata overrides for "tezos / carthagenet / blocks / fork_id"
06:58:42.425 [akka.actor.default-dispatcher-48]  WARN  t.c.c.a.metadata.UnitTransformation  - There're missing metadata overrides for "tezos / carthagenet / blocks / invalidated_asof"

And also sudden "blackouts" where the API stops answering, i.e. no response, no 503, nothing to a GET /v2/data/tezos/carthagenet/blocks/head call.

Logs around the blackout:

...
13:09:54.004 [akka.actor.default-dispatcher-44]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
13:09:54.178 [akka.actor.default-dispatcher-44]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
13:09:54.179 [akka.actor.default-dispatcher-23]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
13:09:54.185 [akka.actor.default-dispatcher-34]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
13:10:00.802 [akka.actor.default-dispatcher-25]  WARN  t.c.c.a.metadata.UnitTransformation  - There're missing metadata overrides for "tezos / carthagenet / operations / invalidated_asof"
13:10:00.802 [akka.actor.default-dispatcher-25]  WARN  t.c.c.a.metadata.UnitTransformation  - There're missing metadata overrides for "tezos / carthagenet / operations / fork_id"
13:10:01.125 [akka.actor.default-dispatcher-43]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
[INFO] [10/23/2020 13:15:03.812] [conseil-system-akka.actor.default-dispatcher-7418] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [POST /v2/data/tezos/carthagenet/operations Strict(370 bytes)]
[INFO] [10/23/2020 13:15:04.063] [conseil-system-akka.actor.default-dispatcher-7418] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [GET /v2/data/tezos/carthagenet/blocks/head Strict(0 bytes)]
[INFO] [10/23/2020 13:15:04.243] [conseil-system-akka.actor.default-dispatcher-7418] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [GET /v2/data/tezos/carthagenet/blocks/head Strict(0 bytes)]

...

It has happened several times since the update and the pattern is always the same:

HTTP request                            # Healthy as far as I understand
Request timeout to .../operations       # Broken
Request timeout to .../blocks/head      # Broken

(I have a process that checks the blocks/head periodically.) I get the metadata config warnings when I restart Conseil and sometimes before the blackout, too - but not always. I am not sure if the two problems are related.

If I restart the API it works again for some time.

ghost avatar Oct 23 '20 14:10 ghost

What is the CPU utilization of the Tezos node when you experience these Conseil issues? What about CPU utilization overall?

vishakh avatar Oct 23 '20 15:10 vishakh

Hello @vishakh , When it is in the "zombie" state it looks like this:

mpstat

12:35:17 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
12:35:17 PM  all    1.38    0.00    0.68    0.79    0.00    0.03    0.09    0.00    0.00   97.03

ps -p 2014,7823 -o %cpu,%mem,cmd
%CPU %MEM CMD
19.5  2.5 tezos-node run -v --history-mode=archive --data-dir=tezos --network=carthagenet --rpc-addr=0.0.0.0:8732 --config-file=carthagenet.json --connections=5
 3.9 15.4 java -Dconfig.file=/app/conseil/conf/api-carthagenet.conf -jar conseil-api.jar

When I restart conseil-api it gets busier - but only for a minute or two.

ghost avatar Oct 26 '20 11:10 ghost

OK, I did some more testing. The missing metadata overrides seems to be unrelated. I found that this problem occurs if I make very busy calls to /operations and /balance_updates.

What I find really interesting is that when Conseil API freezes then tezos-node (running on the same machine) freezes, too. Do you know why it could happen?

ghost avatar Oct 28 '20 15:10 ghost

Maybe the Postgres container is being hit hard by the Conseil queries. This reduces the available IO for the Tezos node.

Can you provide us with the queries you are running so we can profile them?

vishakh avatar Oct 28 '20 15:10 vishakh

Hi, here are the calls:

{"fields":[],"predicates":[{"field":"operation_group_hash","operation":"in","set":["********************"],"inverse":false},{"field":"kind","operation":"in","set":["reveal","transaction","origination","delegation"],"inverse":false}],"orderBy":[],"aggregation":[]}
2020-11-04 16:25:10,694 TRACE [              Test worker] [client.conseil.TezosConseilAPI] --> END POST (294-byte body)
2020-11-04 16:25:10,741 TRACE [              Test worker] [client.conseil.TezosConseilAPI] <-- 200 OK http://***********************/v2/data/tezos/carthagenet/operations (46ms)
{"fields":[],"predicates":[{"field":"operation_group_hash","operation":"in","set":["**********************"],"inverse":false}],"orderBy":[],"aggregation":[]}
2020-11-04 16:25:10,782 TRACE [              Test worker] [client.conseil.TezosConseilAPI] --> END POST (186-byte body)
2020-11-04 16:25:10,808 TRACE [              Test worker] [client.conseil.TezosConseilAPI] <-- 200 OK http://***********************/v2/data/tezos/carthagenet/balance_updates (25ms)

ghost avatar Nov 04 '20 16:11 ghost

@g574 In both cases the query is optimized since it hits a database index. Now I wonder whether your Postgres container or server is overwhelmed. Could you show us the CPU and memory utilization of the Postgres process when the problem queries are running?

vishakh avatar Nov 06 '20 02:11 vishakh

@vishakh sorry, I found a bug in my systems, the tezos-node is not affected, only Conseil API. Is the Postgres CPU/memory utilization still relevant?

ghost avatar Nov 06 '20 12:11 ghost

Can you clarify what you mean by finding a bug?

On Fri, Nov 6, 2020 at 7:49 AM g574 [email protected] wrote:

@vishakh https://github.com/vishakh sorry, I found a bug in my systems, the tezos-node is not affected, only Conseil API. Is the Postgres CPU/memory utilization still relevant?

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/Cryptonomic/Conseil/issues/941#issuecomment-723063654, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHDKW7IIWEBGWVMXGXWJNDSOPWEVANCNFSM4S4TJEOQ .

vishakh avatar Nov 06 '20 12:11 vishakh

The way I ran queries against tezos-node was wrong so I believed it stopped responding at the same time as Conseil.

ghost avatar Nov 06 '20 12:11 ghost

But you are still having issues with Conseil, yes? If yes, please check the Postgres performance when you are having Conseil problems.

On Fri, Nov 6, 2020 at 7:54 AM g574 [email protected] wrote:

The way I ran queries against tezos-node was wrong so I believed it stopped responding at the same time as Conseil.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Cryptonomic/Conseil/issues/941#issuecomment-723065724, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHDKW6YWMTT5TXWZRJD7S3SOPWX3ANCNFSM4S4TJEOQ .

vishakh avatar Nov 06 '20 12:11 vishakh

mpstat:

03:57:54 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:57:54 PM  all    1.44    0.00    0.69    0.51    0.00    0.03    0.10    0.00    0.00   97.24

ps -eo %cpu,%mem,cmd | grep postgres:

 0.0  0.6 /usr/bin/postgres -D ...
 0.0  0.0 postgres: logger process
 0.0 22.8 postgres: checkpointer process
 0.0  0.5 postgres: writer process
 0.0  0.2 postgres: wal writer process
 0.0  0.0 postgres: autovacuum launcher process
 0.0  0.0 postgres: stats collector process
 0.0  0.0 postgres: bgworker: logical replication launcher
 2.4  3.8 postgres: ... ... ...(42560) idle
 1.4  3.7 postgres: ... ... ...(42566) idle
 0.0  0.6 postgres: ... ... ...(42568) idle
 0.0  0.0 postgres: ... ... ...(42570) idle
 0.0  0.0 postgres: ... ... ...(42588) idle
 0.0  0.0 postgres: ... ... ...(42592) idle
 0.0  0.0 postgres: ... ... ...(42596) idle

I don't find it too heavy. Can it be related to the high number of "idle" processes?

ghost avatar Nov 06 '20 15:11 ghost

Very puzzling. So none of the processes seems to be under load while you experience Conseil issues? The host overall is not under any special load?

BTW, would using the Conseil node on https://nautilus.cloud/ work for you?

On Fri, Nov 6, 2020 at 10:08 AM g574 [email protected] wrote:

mpstat:

03:57:54 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 03:57:54 PM all 1.44 0.00 0.69 0.51 0.00 0.03 0.10 0.00 0.00 97.24

ps -eo %cpu,%mem,cmd | grep postgres:

0.0 0.6 /usr/bin/postgres -D ... 0.0 0.0 postgres: logger process 0.0 22.8 postgres: checkpointer process 0.0 0.5 postgres: writer process 0.0 0.2 postgres: wal writer process 0.0 0.0 postgres: autovacuum launcher process 0.0 0.0 postgres: stats collector process 0.0 0.0 postgres: bgworker: logical replication launcher 2.4 3.8 postgres: ... ... ...(42560) idle 1.4 3.7 postgres: ... ... ...(42566) idle 0.0 0.6 postgres: ... ... ...(42568) idle 0.0 0.0 postgres: ... ... ...(42570) idle 0.0 0.0 postgres: ... ... ...(42588) idle 0.0 0.0 postgres: ... ... ...(42592) idle 0.0 0.0 postgres: ... ... ...(42596) idle

I don't find it too heavy. Can it be related to the high number of "idle" processes?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Cryptonomic/Conseil/issues/941#issuecomment-723129947, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHDKWZANQPUAIGZBBAGVNDSOQGO7ANCNFSM4S4TJEOQ .

vishakh avatar Nov 06 '20 15:11 vishakh

Hi @vishakh , sorry for the delay. I found that the tezos-node makes some extreme disk utilization and I/O. Among other problems it may also be the cause of these blackouts. Is it possible to do anything about it on the Conseil end? Does lowering the number of PostgreSQL threads make any sense?

Thanks for the question but unfortunately nautilus.cloud is not an option for me right now.

ghost avatar Nov 12 '20 08:11 ghost

It sounds like you are just maxing out your IO on the host. The simplest solution is to simply scale up your host. Is that an option for you?

On Thu, Nov 12, 2020 at 3:46 AM g574 [email protected] wrote:

Hi @vishakh https://github.com/vishakh , sorry for the delay. I found that the tezos-node makes some extreme disk utilization and I/O. Among other problems it may also be the cause of these blackouts. Is it possible to do anything about it on the Conseil end? Does lowering the number of PostgreSQL threads make any sense?

Thanks for the question but unfortunately nautilus.cloud is not an option for me right now.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Cryptonomic/Conseil/issues/941#issuecomment-725933769, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHDKW4TROLIEC7HVTXLV43SPOOHDANCNFSM4S4TJEOQ .

vishakh avatar Nov 12 '20 18:11 vishakh

Hi @vishakh , I managed to solve the I/O problem but I still experience these blackouts. I started Conseil API with the -verbose switch and caught this log:

08:07:46.829 [akka.actor.default-dispatcher-30]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
08:07:47.825 [-akka.actor.default-dispatcher-3]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
08:07:56.888 [akka.actor.default-dispatcher-27]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
08:07:57.886 [akka.actor.default-dispatcher-27]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
[1894.088s][info][class,load  ] akka.http.impl.engine.parsing.HttpHeaderParser$RawHeaderValueParser source: file:/app/conseil/bin/conseil-api.jar
08:07:59.275 [akka.actor.default-dispatcher-27]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
[1894.184s][info][class,load  ] akka.http.javadsl.model.HttpEntity$Default source: file:/app/conseil/bin/conseil-api.jar
[1894.185s][info][class,load  ] akka.http.scaladsl.model.HttpEntity$Default source: file:/app/conseil/bin/conseil-api.jar
[1894.185s][info][class,load  ] java.lang.invoke.LambdaForm$DMH/0x000000084058d840 source: java.lang.invoke.LambdaForm
[1894.186s][info][class,load  ] akka.http.impl.engine.parsing.HttpMessageParser$$Lambda$2609/0x00000008406f0040 source: akka.http.impl.engine.parsing.HttpMessageParser
[1894.186s][info][class,load  ] java.lang.invoke.LambdaForm$DMH/0x000000084058a840 source: java.lang.invoke.LambdaForm
[1894.187s][info][class,load  ] java.lang.invoke.LambdaForm$DMH/0x0000000840589c40 source: java.lang.invoke.LambdaForm
[1894.187s][info][class,load  ] java.lang.invoke.LambdaForm$MH/0x0000000840588040 source: java.lang.invoke.LambdaForm
[1894.188s][info][class,load  ] java.lang.invoke.LambdaForm$DMH/0x0000000840579c40 source: java.lang.invoke.LambdaForm
[1894.188s][info][class,load  ] akka.http.impl.engine.parsing.HttpMessageParser$$Lambda$2610/0x00000008406cb840 source: akka.http.impl.engine.parsing.HttpMessageParser
[1894.189s][info][class,load  ] java.lang.invoke.LambdaForm$DMH/0x0000000840560840 source: java.lang.invoke.LambdaForm
[1894.189s][info][class,load  ] java.lang.invoke.LambdaForm$DMH/0x00000008404f2c40 source: java.lang.invoke.LambdaForm
[1894.189s][info][class,load  ] java.lang.invoke.LambdaForm$MH/0x00000008404f3040 source: java.lang.invoke.LambdaForm
[1894.190s][info][class,load  ] akka.http.impl.engine.parsing.HttpMessageParser$$Lambda$2611/0x0000000840589040 source: akka.http.impl.engine.parsing.HttpMessageParser
[1894.193s][info][class,load  ] akka.stream.stage.GraphStageLogic$SubSourceOutlet source: file:/app/conseil/bin/conseil-api.jar
[1894.194s][info][class,load  ] akka.stream.impl.fusing.SubSink$Command source: file:/app/conseil/bin/conseil-api.jar
[1894.195s][info][class,load  ] akka.stream.stage.GraphStageLogic$SubSourceOutlet$$Lambda$2612/0x0000000840564840 source: akka.stream.stage.GraphStageLogic$SubSourceOutlet
[1894.196s][info][class,load  ] akka.stream.impl.fusing.SubSource source: file:/app/conseil/bin/conseil-api.jar
[1894.198s][info][class,load  ] akka.stream.impl.SubscriptionTimeoutException source: file:/app/conseil/bin/conseil-api.jar
[1894.198s][info][class,load  ] akka.stream.impl.fusing.SubSource$$anon$13 source: file:/app/conseil/bin/conseil-api.jar
[1894.200s][info][class,load  ] akka.http.impl.engine.server.HttpServerBluePrint$PrepareRequests$$anon$1$$anon$2 source: file:/app/conseil/bin/conseil-api.jar
[1894.202s][info][class,load  ] akka.http.impl.engine.parsing.HttpMessageParser$$anonfun$1 source: file:/app/conseil/bin/conseil-api.jar
[1894.204s][info][class,load  ] akka.http.scaladsl.model.EntityStreamException source: file:/app/conseil/bin/conseil-api.jar
[1894.206s][info][class,load  ] akka.http.scaladsl.model.HttpEntity$Default$$Lambda$2613/0x0000000840374040 source: akka.http.scaladsl.model.HttpEntity$Default
[1894.207s][info][class,load  ] akka.http.scaladsl.model.HttpEntity$Limitable$ source: file:/app/conseil/bin/conseil-api.jar
[1894.208s][info][class,load  ] akka.http.scaladsl.model.HttpEntity$SizeLimit source: file:/app/conseil/bin/conseil-api.jar
[1894.208s][info][class,load  ] akka.http.scaladsl.model.HttpEntity$Limitable$$$Lambda$2614/0x0000000840564040 source: akka.http.scaladsl.model.HttpEntity$Limitable$
[1894.209s][info][class,load  ] akka.http.scaladsl.model.HttpEntity$Limitable source: file:/app/conseil/bin/conseil-api.jar
[1894.209s][info][class,load  ] akka.http.scaladsl.model.HttpEntity$Limitable$$anon$1 source: file:/app/conseil/bin/conseil-api.jar
[1894.212s][info][class,load  ] akka.http.impl.util.StreamUtils$$anon$3 source: file:/app/conseil/bin/conseil-api.jar
[1894.212s][info][class,load  ] akka.http.impl.util.StreamUtils$$anon$3$$anon$4 source: file:/app/conseil/bin/conseil-api.jar
[1894.215s][info][class,load  ] akka.http.impl.util.ToStrict source: file:/app/conseil/bin/conseil-api.jar
[1894.216s][info][class,load  ] akka.http.impl.util.ToStrict$$anon$1 source: file:/app/conseil/bin/conseil-api.jar
[1894.217s][info][class,load  ] akka.stream.impl.HeadOptionStage source: file:/app/conseil/bin/conseil-api.jar
[1894.218s][info][class,load  ] akka.stream.SinkShape$ source: file:/app/conseil/bin/conseil-api.jar
[1894.219s][info][class,load  ] akka.stream.scaladsl.Sink$$$Lambda$2615/0x0000000840563840 source: akka.stream.scaladsl.Sink$
[1894.220s][info][class,load  ] akka.stream.impl.HeadOptionStage$$anon$2 source: file:/app/conseil/bin/conseil-api.jar
[1894.221s][info][class,load  ] akka.stream.scaladsl.Sink$$$Lambda$2616/0x0000000840563040 source: akka.stream.scaladsl.Sink$
[1894.222s][info][class,load  ] akka.http.impl.util.ToStrict$$anon$1$$anon$2 source: file:/app/conseil/bin/conseil-api.jar
[1894.223s][info][class,load  ] akka.http.impl.util.ToStrict$$anon$1$$anon$3 source: file:/app/conseil/bin/conseil-api.jar
[1894.225s][info][class,load  ] akka.stream.stage.TimerGraphStageLogic$$anon$2 source: file:/app/conseil/bin/conseil-api.jar
[1894.225s][info][class,load  ] akka.http.scaladsl.server.directives.ExecutionDirectives$$Lambda$2617/0x0000000840560040 source: akka.http.scaladsl.server.directives.ExecutionDirectives
[1894.226s][info][class,load  ] tech.cryptonomic.conseil.api.directives.RecordingDirectives$$anonfun$loggingExceptionHandler$1$$Lambda$2618/0x0000000840556040 source: tech.cryptonomic.conseil.api.directives.RecordingDirectives$$anonfun$loggingExceptionHandler$1
[1894.226s][info][class,load  ] akka.stream.Attributes$$anonfun$getFirst$1 source: file:/app/conseil/bin/conseil-api.jar
[1894.227s][info][class,load  ] tech.cryptonomic.conseil.api.directives.RecordingDirectives$$anonfun$loggingExceptionHandler$1$$Lambda$2619/0x0000000840556840 source: tech.cryptonomic.conseil.api.directives.RecordingDirectives$$anonfun$loggingExceptionHandler$1
[1894.227s][info][class,load  ] tech.cryptonomic.conseil.api.directives.RecordingDirectives$$anonfun$loggingExceptionHandler$1$$Lambda$2620/0x0000000840bf1840 source: tech.cryptonomic.conseil.api.directives.RecordingDirectives$$anonfun$loggingExceptionHandler$1
[1894.228s][info][class,load  ] akka.stream.actor.ActorSubscriberMessage source: file:/app/conseil/bin/conseil-api.jar
[1894.228s][info][class,load  ] akka.stream.impl.fusing.SubSource$$anon$13$$Lambda$2621/0x0000000840557040 source: akka.stream.impl.fusing.SubSource$$anon$13
[1894.228s][info][class,load  ] akka.http.scaladsl.marshalling.Marshaller$$$Lambda$2622/0x0000000840557840 source: akka.http.scaladsl.marshalling.Marshaller$
[1894.229s][info][class,load  ] akka.stream.impl.fusing.SubSink$RequestOne$ source: file:/app/conseil/bin/conseil-api.jar
[1894.229s][info][class,load  ] akka.http.scaladsl.marshalling.Marshaller$$$Lambda$2623/0x0000000840beb440 source: akka.http.scaladsl.marshalling.Marshaller$
[1894.230s][info][class,load  ] akka.http.scaladsl.marshalling.Marshalling$WithOpenCharset source: file:/app/conseil/bin/conseil-api.jar
[1894.231s][info][class,load  ] akka.http.scaladsl.marshalling.Marshal$$anonfun$$nestedInanonfun$toResponseFor$3$1 source: file:/app/conseil/bin/conseil-api.jar
[1894.233s][info][class,load  ] akka.http.impl.engine.rendering.ResponseRenderingContext$CloseRequested$ForceClose$ source: file:/app/conseil/bin/conseil-api.jar
[1894.235s][info][class,load  ] akka.stream.actor.ActorSubscriberMessage$OnComplete$ source: file:/app/conseil/bin/conseil-api.jar
[1894.235s][info][class,load  ] akka.stream.scaladsl.Sink$$$Lambda$2624/0x0000000840be9040 source: akka.stream.scaladsl.Sink$
[2194.264s][info][class,load  ] akka.http.impl.util.package$RichHttpRequest$ source: file:/app/conseil/bin/conseil-api.jar
[2194.267s][info][class,load  ] akka.event.Logging$Info$ source: file:/app/conseil/bin/conseil-api.jar
[2194.268s][info][class,load  ] akka.event.Logging$Info2 source: file:/app/conseil/bin/conseil-api.jar
[2194.271s][info][class,load  ] akka.event.Logging$LogEventWithMarker source: file:/app/conseil/bin/conseil-api.jar
[2194.271s][info][class,load  ] akka.event.Logging$Info3 source: file:/app/conseil/bin/conseil-api.jar
[2194.272s][info][class,load  ] java.time.format.DateTimePrintContext source: shared objects file
[2194.272s][info][class,load  ] java.time.LocalDate$1 source: shared objects file
[2194.272s][info][class,load  ] java.time.format.DateTimeFormatterBuilder$3 source: shared objects file
[2194.279s][info][class,load  ] scala.io.AnsiColor source: file:/app/conseil/bin/conseil-api.jar
[2194.279s][info][class,load  ] scala.DeprecatedConsole source: file:/app/conseil/bin/conseil-api.jar
[2194.279s][info][class,load  ] scala.Console$ source: file:/app/conseil/bin/conseil-api.jar
[2194.317s][info][class,load  ] sun.nio.cs.US_ASCII$Decoder source: jrt:/java.base
[INFO] [11/23/2020 08:12:59.446] [conseil-system-akka.actor.default-dispatcher-927] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [POST /v2/data/tezos/carthagenet/operations Strict(434 bytes)]
[INFO] [11/23/2020 08:13:06.962] [conseil-system-akka.actor.default-dispatcher-927] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [GET /v2/data/tezos/carthagenet/blocks/head Strict(0 bytes)]
[INFO] [11/23/2020 08:13:07.951] [conseil-system-akka.actor.default-dispatcher-927] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [GET /v2/data/tezos/carthagenet/blocks/head Strict(0 bytes)]
[INFO] [11/23/2020 08:13:19.012] [conseil-system-akka.actor.default-dispatcher-929] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [GET /v2/data/tezos/carthagenet/blocks/head Strict(0 bytes)]

My current config looks like this:

include "metadata.conf"

platforms: [ {
  name: tezos
  network: carthagenet
  enabled: true
  node: {
    protocol: "http"
    hostname: "***********"
    port: 8732
    pathPrefix: ""
    }
  }
]

conseil {
  hostname: "0.0.0.0"
  port: *******

  cache-ttl: 15 minutes
  max-query-result-size: 100000
  high-cardinality-limit: 100
  startup-deadline: 5 minutes

  security.api-keys {
    keys: []
    allow-blank: true
  }

  db {
    dataSourceClass: "org.postgresql.ds.PGSimpleDataSource"
    properties {
      user: "************"
      password: "**************"
      url: "jdbc:postgresql://***************/****"
      reWriteBatchedInserts: true
    }
  }

}

akka {
  http {
    dispatcher {
      type: "Dispatcher"
      executor: "thread-pool-executor"
      throughput: 1
      thread-pool-executor {
        fixed-pool-size: 16
      }
    }

    server {
      request-timeout: 5 minutes
      idle-timeout: 5 minutes
    }
  }
}

Does it look correct?

ghost avatar Nov 23 '20 09:11 ghost

hi @vishakh , I suppose there is some misconfiguration. Do you see any red flag in my config?

ghost avatar Dec 08 '20 12:12 ghost

@ivanopagano @piotrkosecki Could you please review @g574's configuration above?

vishakh avatar Dec 08 '20 19:12 vishakh

I think you could try changing this cache-ttl: 15 minutes to cache-ttl: 24 hours. Cache is not needed to be refreshed so frequently - the current value was left there for development purposes.

Other lines look fine for me.

piotrkosecki avatar Dec 09 '20 16:12 piotrkosecki

@g574 Please try what @piotrkosecki mentioned and let us know if you still have issues.

vishakh avatar Dec 09 '20 16:12 vishakh

Thank you. I have updated the configuration. I will observe my Conseil and let you know the result.

ghost avatar Dec 15 '20 14:12 ghost

Hello @piotrkosecki , @vishakh Unfortunately I still get those blackouts. My current config is as follows:

include "metadata.conf"

platforms {
  tezos {
    carthagenet {
      node {
        protocol: "http",
        hostname: "***********"
        port: 8732
        pathPrefix: ""
      }
    }
  }
}

conseil {
  hostname: "0.0.0.0"
  port: *****

  cache-ttl: 24 hours
  max-query-result-size: 100000
  high-cardinality-limit: 100
  startup-deadline: 5 minutes

  security.apiKeys {
    keys: []
    allow-blank: true
  }

  db {
    dataSourceClass: "org.postgresql.ds.PGSimpleDataSource"
    properties {
      user: "**********"
      password: "**************"
      url: "jdbc:postgresql://************"
      reWriteBatchedInserts: true
    }
  }

}

akka {
  http {
    dispatcher {
      type: "Dispatcher"
      executor: "thread-pool-executor"
      throughput: 1
      thread-pool-executor {
        fixed-pool-size: 16
      }
    }

    server {
      request-timeout: 5 minutes
      idle-timeout: 5 minutes
    }
  }
}

It would be OK to have a timeout now and then but I wonder why it does not recover without a restart.

ghost avatar Dec 16 '20 08:12 ghost

I have switched to delphinet and the problem persists. I have not experienced it on mainnet yet.

ghost avatar Jan 13 '21 10:01 ghost

A new release with enhanced logging is almost ready.

On Wed, Jan 13, 2021 at 5:10 AM g574 [email protected] wrote:

I have switched to delphinet and the problem persists. I have not experienced it on mainnet yet.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Cryptonomic/Conseil/issues/941#issuecomment-759346642, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHDKW5NPMLRUHHR3QKHUDDSZVWRRANCNFSM4S4TJEOQ .

vishakh avatar Jan 13 '21 14:01 vishakh

Hi, the problem persists on delphinet - currently on 2021-january-release-35:

[INFO] [02/24/2021 10:59:19.907] [conseil-system-akka.actor.default-dispatcher-573] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [GET /v2/data/tezos/delphinet/blocks/head Strict(0 bytes)]
[INFO] [02/24/2021 10:59:19.288] [conseil-system-akka.actor.default-dispatcher-572] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [GET /v2/data/tezos/delphinet/blocks/head Strict(0 bytes)]
[INFO] [02/24/2021 10:59:19.327] [conseil-system-akka.actor.default-dispatcher-573] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [GET /v2/data/tezos/delphinet/blocks/head Strict(0 bytes)]
[INFO] [02/24/2021 10:59:19.907] [conseil-system-akka.actor.default-dispatcher-573] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [GET /v2/data/tezos/delphinet/blocks/head Strict(0 bytes)]

If I try time curl --noproxy '*' http://<address>/v2/data/tezos/delphinet/blocks/head in hangs forever. Once I restart the api I get a response again after 0.7 seconds. Iotat, free, top don't show any high load. The network traffic is less than 100 kB per sec.

ghost avatar Feb 24 '21 13:02 ghost

Can you describe the hardware you are running on? What are the load averages when this issue happens?

vishakh avatar Feb 24 '21 18:02 vishakh