erigon
Stress test on external rpcdaemon makes erigon stop syncing
System information
Erigon version: 2022.99.99-dev-c1f84874
OS & Version: Linux
Commit hash: c1f848746dafe06f75c9bc7c5647a2ed4e429a53
Behaviour
I have three nodes for the stress test. The external rpcdaemon and the erigon process are deployed on different nodes, and I run a lot of stress tests (including debug_traceBlockByNumber, eth_call, debug_traceTransaction and so on) from the third node. The erigon node is an archive node, and the RPC requests target blocks around height 19,000,000 (within a range of 50,000 blocks).
The abnormal thing is that the erigon process stops syncing blocks for a long time. After I restart erigon, syncing recovers.

I am not sure it is related to the external rpcdaemon. I ran a lot of stress tests against the internal RPC and no syncing error happened, and the error seems to start after the log line showing the external RPC daemon's connection info.
I have the same issue on several nodes on different blockchains, though I am not convinced it has to do with the external rpcdaemon; other nodes I run never have this.
I can reproduce this issue very easily: I run massive numbers of parallel requests that call contracts at different block heights.
pprof output of erigon: erigon-mutext.txt, erigon-goroutine.txt
pprof output of rpcdaemon: rpc-mutext.txt, rpc-goroutine.txt
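Dumps like these can be collected from the standard Go net/http/pprof endpoints that the `--pprof` flag exposes on `--pprof.addr`/`--pprof.port`. Below is a minimal sketch, assuming the usual `/debug/pprof/<profile>` paths. The helper names `profileURL` and `saveProfile` are hypothetical, and the host is a placeholder. Note that the Go runtime only records mutex samples if the process enabled mutex profiling.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// profileURL (hypothetical helper) builds a net/http/pprof URL.
// debug=2 gives full goroutine stacks; debug=1 gives text for other profiles.
func profileURL(host string, port int, profile string) string {
	debug := 1
	if profile == "goroutine" {
		debug = 2
	}
	return fmt.Sprintf("http://%s:%d/debug/pprof/%s?debug=%d", host, port, profile, debug)
}

// saveProfile (hypothetical helper) downloads one profile into a local file.
func saveProfile(url, path string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("GET %s: status %d", url, resp.StatusCode)
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	_, copyErr := io.Copy(f, resp.Body)
	closeErr := f.Close()
	if copyErr != nil {
		return copyErr
	}
	return closeErr
}
```

With the erigon args below (`--pprof.port=6059`), `saveProfile(profileURL("erigon-host", 6059, "goroutine"), "erigon-goroutine.txt")` would fetch the goroutine dump.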
Args passed to the erigon node:
- --datadir=/var/erigon2/data
- --private.api.addr=localhost:9090
- --http=false
- --maxpeers=300
- --metrics
- --metrics.addr=0.0.0.0
- --metrics.port=6062
- --authrpc.jwtsecret=/etc/jwt-secret/jwt.secret
- --pprof
- --pprof.addr=0.0.0.0
- --pprof.port=6059
Args passed to the rpcdaemon node:
- --db.read.concurrency=1000
- --private.api.addr=localhost:9090
- --http.api=eth,erigon,web3,net,debug,trace,txpool,admin,engine
- --ws
- --ws.compression
- --rpc.gascap=300000000
- --http.addr=0.0.0.0
- --txpool.api.addr=0.0.0.0:9091
- --http.vhosts=*
- --metrics
- --metrics.addr=0.0.0.0
- --metrics.port=6061
- --rpc.batch.concurrency=1000
- --pprof
- --pprof.addr=0.0.0.0
The image I am using is thorax/erigon@sha256:ad994b141631580e4d8505e4bba606dc4e9d02aee41e2ad4a0bd9f1d39eecb43.
@revittm I think this one is for you :)
Update from another issue spawned from this one: https://github.com/ledgerwatch/erigon/issues/5125#issuecomment-1221883278.
Temporary workaround: use the --datadir flag (edit: this workaround does not work).
Hopefully this helps: https://github.com/ledgerwatch/erigon-lib/pull/639
@kaikash and @zimbabao, would you be able to try to reproduce this with the proposed fix?
I can give it a try over the weekend; I have deadlines. @hexoscott, @revitteth: does either of you have a prebuilt image uploaded anywhere public? Fine if not, I'll set up the build.
Hi @zimbabao, sadly no public image/build for this one.
@zimbabao you can build a local image fairly easily (hopefully this is useful):
- Redirect the erigon-lib Go dependency to a local copy of the branch @hexoscott opened the pull request from, in go.mod (assuming erigon and erigon-lib sit side by side in the same directory):
    replace github.com/ledgerwatch/erigon-lib => ../erigon-lib
- Make the container image:
    DOCKER_TAG=thorax/erigon:ci-local DOCKER_UID=$(id -u) DOCKER_GID=$(id -g) make docker
As @mandrigin just pointed out to me, we could also push this to a branch named docker_foo, and it will build a container and publish it to thorax/erigon with the tag thorax/erigon:docker_foo.
@revitteth @hexoscott thanks for the fix. My tests confirm that this is fixed (ran for 40 minutes and still chugging). I was able to run at 2000 RPS on archive nodes at random blocks.
Great news, let's close and we can reopen if we find any other related problem!