erigon

Stress test on external rpcdaemon makes erigon stop syncing

Open leowkong opened this issue 2 years ago • 4 comments

System information

Erigon version: 2022.99.99-dev-c1f84874

OS & Version: Linux

Commit hash : c1f848746dafe06f75c9bc7c5647a2ed4e429a53

Behaviour

I have three nodes for stress testing. The external rpcdaemon and erigon processes are deployed on different nodes, and I run heavy load tests (including debug_traceBlockByNumber, eth_call, debug_traceTransaction and so on) from the third node. The erigon node is an archive node, and the RPC requests target blocks around height 19,000,000 (within a range of 50,000 blocks).

The abnormal behaviour is that the erigon process stops syncing blocks for a long time. After I restart erigon, syncing recovers.

image

I am not sure whether it is related to the external rpcdaemon. I ran many load tests against the internal RPC and no syncing error happened. The error seems to occur after the log line showing the connection info of the external rpcdaemon.
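For anyone who wants to catch this kind of stall without watching the logs, here is a minimal sketch of a head-progress check. It polls the standard eth_blockNumber JSON-RPC method and reports when the returned height has stopped advancing. The endpoint URL, the 15-second poll interval, the 5-minute threshold, and the helper names block_number and is_stalled are all assumptions made for illustration; none of them come from erigon itself.

```python
import json
import time
import urllib.request


def block_number(url: str) -> int:
    """Ask a JSON-RPC endpoint for its current head via eth_blockNumber."""
    body = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": "eth_blockNumber", "params": []}
    ).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        # The result is a hex quantity such as "0x10".
        return int(json.load(resp)["result"], 16)


def is_stalled(samples, max_idle_s):
    """Return True if the newest head has been unchanged for >= max_idle_s.

    samples is a list of (unix_time, block_number) pairs, oldest first.
    """
    if not samples:
        return False
    latest_time, latest_block = samples[-1]
    # Walk backwards over the trailing run of samples that report the same
    # head, remembering the earliest time that head was observed.
    first_seen = latest_time
    for ts, block in reversed(samples):
        if block != latest_block:
            break
        first_seen = ts
    return latest_time - first_seen >= max_idle_s


if __name__ == "__main__":
    URL = "http://localhost:8545"  # assumption: HTTP endpoint of the rpcdaemon
    history = []
    while True:
        history.append((time.time(), block_number(URL)))
        history = history[-1000:]  # keep memory bounded
        if is_stalled(history, max_idle_s=300):  # assumption: 5 min threshold
            print("head has not advanced for 5 minutes:", history[-1][1])
        time.sleep(15)  # assumption: poll interval
```

The threshold should be well above the chain's normal block time, otherwise ordinary gaps between blocks would be flagged as stalls.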

leowkong avatar Aug 03 '22 13:08 leowkong

I have the same issue on several nodes on different blockchains, though I am not convinced it has to do with the external rpcdaemon; other nodes I run never have this.

PeaStew avatar Aug 03 '22 13:08 PeaStew

I can repro this issue very easily. I run a massive number of parallel requests that call contracts at different block heights.
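A rough sketch of that kind of load, for anyone else trying to reproduce: it builds eth_call requests pinned to random historical block heights and fires them from a thread pool. This is not the exact script used here. The endpoint URL, contract address, calldata, block range, request count, and worker count are placeholder assumptions, and the helper names (eth_call_payload, random_payloads, post, try_post) are made up for the example.

```python
import json
import random
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def eth_call_payload(req_id, to, data, block):
    """Build one eth_call JSON-RPC request pinned to a historical block."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "eth_call",
        # The block parameter is a hex quantity without leading zeros.
        "params": [{"to": to, "data": data}, hex(block)],
    }


def random_payloads(n, to, data, lo, hi, seed=None):
    """Return n eth_call payloads at uniformly random heights in [lo, hi]."""
    rng = random.Random(seed)
    return [eth_call_payload(i, to, data, rng.randint(lo, hi)) for i in range(n)]


def post(url, payload):
    """Send one JSON-RPC request over HTTP and return the decoded response."""
    body = json.dumps(payload).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def try_post(url, payload):
    """Like post, but report network failures as an error entry."""
    try:
        return post(url, payload)
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return {"error": {"message": str(exc)}}


if __name__ == "__main__":
    URL = "http://localhost:8545"  # assumption: rpcdaemon HTTP endpoint
    TO = "0x0000000000000000000000000000000000000000"  # placeholder contract
    DATA = "0x"  # placeholder calldata
    LO, HI = 1_000_000, 1_050_000  # placeholder block range
    payloads = random_payloads(10_000, TO, DATA, LO, HI)
    with ThreadPoolExecutor(max_workers=200) as pool:
        results = list(pool.map(lambda p: try_post(URL, p), payloads))
    errors = sum(1 for r in results if "error" in r)
    print(f"{len(results)} responses, {errors} errors")
```

Threads are enough here because the work is network-bound; the worker count is the knob that controls how many requests are in flight at once.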

Pprof output of erigon: erigon-mutext.txt, erigon-goroutine.txt

Pprof output of rpcdaemon: rpc-mutext.txt, rpc-goroutine.txt

Args passed to erigon node

            - --datadir=/var/erigon2/data
            - --private.api.addr=localhost:9090
            - --http=false
            - --maxpeers=300
            - --metrics
            - --metrics.addr=0.0.0.0
            - --metrics.port=6062
            - --authrpc.jwtsecret=/etc/jwt-secret/jwt.secret
            - --pprof
            - --pprof.addr=0.0.0.0
            - --pprof.port=6059

Args passed to rpcdaemon node

            - --db.read.concurrency=1000
            - --private.api.addr=localhost:9090
            - --http.api=eth,erigon,web3,net,debug,trace,txpool,admin,engine
            - --ws
            - --ws.compression
            - --rpc.gascap=300000000
            - --http.addr=0.0.0.0
            - --txpool.api.addr=0.0.0.0:9091
            - --http.vhosts=*
            - --metrics
            - --metrics.addr=0.0.0.0
            - --metrics.port=6061
            - --rpc.batch.concurrency=1000
            - --pprof
            - --pprof.addr=0.0.0.0

The image I am using is thorax/erigon@sha256:ad994b141631580e4d8505e4bba606dc4e9d02aee41e2ad4a0bd9f1d39eecb43.

zimbabao avatar Aug 14 '22 14:08 zimbabao

@revittm I think that one is to you :)

mandrigin avatar Aug 15 '22 07:08 mandrigin

Update from another issue spawned from this one: https://github.com/ledgerwatch/erigon/issues/5125#issuecomment-1221883278.

Temporary workaround: use --datadir flag (edit: this workaround does not work)

revitteth avatar Aug 22 '22 11:08 revitteth

Hopefully this might help https://github.com/ledgerwatch/erigon-lib/pull/639

hexoscott avatar Sep 14 '22 13:09 hexoscott

@kaikash and @zimbabao please would you be able to help try and repro with the proposed fix?

revitteth avatar Sep 14 '22 17:09 revitteth

I can give it a try over the weekend. I have deadlines. @hexoscott, @revitteth: does either of you have a prebuilt image uploaded anywhere publicly? Fine if not, I'll set up the build.

zimbabao avatar Sep 14 '22 21:09 zimbabao

Hi @zimbabao, sadly no public image/build for this one.

hexoscott avatar Sep 15 '22 07:09 hexoscott

@zimbabao you can build a local image fairly easily (hopefully useful):

  1. Redirect the erigon-lib dependency to a local copy of the branch behind @hexoscott's pull request by adding this to go.mod: replace github.com/ledgerwatch/erigon-lib => ../erigon-lib (assuming erigon and erigon-lib sit side by side in the same directory)
  2. Build the container image: DOCKER_TAG=thorax/erigon:ci-local DOCKER_UID=$(id -u) DOCKER_GID=$(id -g) make docker

revitteth avatar Sep 15 '22 10:09 revitteth

As @mandrigin just pointed out to me, we could also push this to a branch, say docker_foo, and it will build a container and publish it to thorax/erigon with the tag thorax/erigon:docker_foo.

revitteth avatar Sep 15 '22 12:09 revitteth

@revitteth @hexoscott thanks for the fix. My tests confirmed that this is fixed (ran for 40 minutes and still chugging). I was able to run at 2000 RPS on archive nodes at random blocks.

zimbabao avatar Sep 18 '22 21:09 zimbabao

Great news, let's close and we can reopen if we find any other related problem!

revitteth avatar Sep 19 '22 04:09 revitteth