High number of Timeouts and EMFILE errors
Expected behavior
smooth sailing
Actual behavior
Several things make two of my nodes behave very differently from the others:
- High number of "Timed out waiting for response" messages: there can be 1000-3000 in a couple of hours, and sometimes a dozen within the same millisecond.
- High number of "warn - connect emfile 66.94.99.223:5306 - local (undefined:undefined)" warnings, which I've never seen on normally functioning nodes. After these warnings I see EMFILE errors like this one: "error - Caught exception: Error: spawn /usr/bin/node EMFILE."
- At some point, after hundreds of timeouts and EMFILE errors, the node process terminates with exit code 1 and restarts. But the restart doesn't really fix the issue.
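To put a number on the "dozen within the same millisecond" observation, the log lines can be grouped by timestamp. The following is only a rough sketch: the timestamp pattern and the sample lines are assumptions for illustration, not the real OT-node log format, and `count_bursts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed line shape: "<timestamp with milliseconds> - <level> - <message>".
# The real node log format may differ; adjust TIMESTAMP_RE to match it.
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}[.,]\d{3})")


def count_bursts(lines, needle="Timed out waiting for response"):
    """Count lines containing `needle`, grouped by millisecond timestamp."""
    per_ms = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = TIMESTAMP_RE.match(line)
        if match:  # lines without a recognisable timestamp are skipped
            per_ms[match.group(1)] += 1
    return per_ms


if __name__ == "__main__":
    # Synthetic sample lines, not real node output.
    sample = [
        "2020-01-01T00:00:00.123Z - warn - Timed out waiting for response",
        "2020-01-01T00:00:00.123Z - warn - Timed out waiting for response",
        "2020-01-01T00:00:00.456Z - info - something else",
        "2020-01-01T00:00:01.000Z - warn - Timed out waiting for response",
    ]
    # Show the busiest milliseconds first.
    for stamp, count in count_bursts(sample).most_common(3):
        print(stamp, count)
```

Passing an open log file instead of `sample` should work the same way, since the function only iterates over lines.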
So it seems the node tries to open a lot of TCP connections / sockets and at some point the OS limits it. Netdata also reports TCP queue overflows and drops.
Even so, the node continues to bid on jobs and receives "I haven't been chosen".
I tried removing some data (import_cache, kadence.dht, replication_cache, bootstraps.json, peercache, router.json) to emulate a restore from backup, but that didn't really help.
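EMFILE is the "too many open files" error, which a process gets when it reaches its file descriptor limit; sockets count toward that limit. One way to check whether the node process is actually near the limit is to compare its open descriptors against its soft limit. This is a minimal sketch, assuming Linux with `/proc` available and permission to inspect the target process; `fd_usage` is a hypothetical helper name, not part of OT-node.

```python
import os


def fd_usage(pid="self"):
    """Return (open_fds, soft_limit) for a process on Linux.

    Reads /proc/<pid>/fd and /proc/<pid>/limits, so it is Linux-only and
    needs permission to inspect the target process.
    """
    # Each entry in /proc/<pid>/fd is one open descriptor (files, sockets, pipes).
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft_limit = None
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                # Columns: "Max open files", soft limit, hard limit, units.
                soft = line.split()[3]
                soft_limit = None if soft == "unlimited" else int(soft)
                break
    return open_fds, soft_limit


if __name__ == "__main__":
    # Pass the node process PID instead of "self" to inspect the node.
    used, limit = fd_usage()
    print(f"open fds: {used}, soft limit: {limit}")
```

If the open count sits close to the soft limit while the warnings appear, that would support the theory that the process is exhausting its descriptors rather than hitting a network-level problem.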
Steps to reproduce the problem
Not sure. These 2 nodes are no different from the others, but only they show the issue. I'm not clear on what triggers this behaviour.
Specifications
- Node version: 5.1.1
- Platform: Ubuntu 18.04
- Node wallet:
- ERC725 identity: 0xB9712dbeD9769ED25500Eb2e123472a86f45e6F7 and 0x9bc66a5e01fbfcb3e804cc60ad80ddc84ee17024
Error logs
Example of timeout within the same millisecond

Example of warn emfile

Example of EMFILE error

Example of when node exited with return code 1

TCP drops

I'm seeing this as well...
Hey @calr0x and @botnumberseven, thanks for this submission. We've also seen this error occasionally and have traced it to the Kadence library. In the tests we've performed, this error doesn't affect node functionality (other than looking bad in the logs), since the library is able to handle it; however, it is in the scrum pipeline. Also, as we are going to replace Kadence with a different Kademlia implementation in v6 (due to this, but also other issues discovered), this will soon be a thing of the past.
@branarakic agreed. Since the Kadence library will be completely replaced in v6, it's not worth the effort to fix in v5.
This issue is no longer relevant: it was reported against v5, and the current version of OT-node is v6.