
V6 Performance Documentation and potential mem-leak

Open · UniMa007 opened this issue 3 years ago · 2 comments

Issue description

This is intended as performance feedback for the developers, as well as a report of a potential bug I have found.

Documentation

I'm running the following setup on my Amazon cloud: (screenshot)

My AWS Lightsail instance running the 4 nodes uses Ubuntu 20.04 LTS and has the following size: (screenshot)

Directly after starting my OT-Nodes, this is the memory consumption of the four nodes running on the machine:

The comma-separated attributes are:

Process ID, user, %mem, command (screenshot)

Each node takes roughly 153 MB of RAM (let's take 7.5% of 2048 MB).
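As a side note, the conversion from the %mem column reported by ps to an absolute figure is just a percentage of total RAM; a minimal Python sketch, assuming the 2048 MB instance size shown above:

    # Minimal sketch: convert the %mem column from ps into megabytes,
    # assuming the 2048 MB Lightsail instance described above.
    TOTAL_RAM_MB = 2048
    MEM_PERCENT = 7.5  # %mem observed per ot-node process right after startup

    mem_mb = TOTAL_RAM_MB * MEM_PERCENT / 100
    print(f"{mem_mb:.1f} MB per node")  # 153.6 MB, i.e. roughly 153 MB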

Problem

I've been hammering my 4 nodes in 2 waves for roughly two hours. I don't have memory monitoring activated, so I cannot show a diagram of the RAM usage yet. After 2 hours of hammering jobs at those four nodes, my VM crashed.

I've analyzed the behaviour and saw the following:

CPU

(screenshot of the CPU utilization graph)

It looks like the CPU usage rose to 80% within 10 minutes, reached 100% within the next 10 minutes, and then the EC2 instance crashed.

RAM

As I said, I don't have a memory usage graph yet, as Lightsail is not as well integrated into AWS, but I can see that the memory consumption of each node rises from the initial 6-7% up to 20% within a timespan of a few hours (see the sampling sketch below).

This is a screenshot of the four nodes after roughly 2 hours of constant publishing.

Process ID, user, %mem, command (screenshot)
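Since Lightsail does not expose per-process memory graphs, a small sampler could log each node's RSS over time to document the suspected leak. This is a minimal sketch, assuming psutil is installed and that the node processes can be identified by the string "ot-node" in their command line; the filter string, output file, and 60-second interval are assumptions, not part of the original setup:

    #!/usr/bin/env python3
    # Minimal sketch of a per-process memory sampler (assumes `pip install psutil`).
    # The "ot-node" filter, CSV file name, and 60-second interval are assumptions;
    # adjust them to the actual process names on the machine.
    import time
    import psutil

    def sample_ot_nodes():
        """Return (pid, rss_mb, mem_percent) for every matching process."""
        rows = []
        for proc in psutil.process_iter(["pid", "cmdline", "memory_info", "memory_percent"]):
            try:
                cmdline = " ".join(proc.info["cmdline"] or [])
                if "ot-node" in cmdline:
                    rss_mb = proc.info["memory_info"].rss / (1024 * 1024)
                    rows.append((proc.info["pid"], rss_mb, proc.info["memory_percent"]))
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                continue
        return rows

    if __name__ == "__main__":
        # Append one CSV line per process every 60 seconds; plot the file later.
        with open("ot-node-mem.csv", "a") as log:
            while True:
                ts = int(time.time())
                for pid, rss_mb, mem_pct in sample_ot_nodes():
                    log.write(f"{ts},{pid},{rss_mb:.1f},{mem_pct:.1f}\n")
                log.flush()
                time.sleep(60)

Plotted over time, a steadily rising RSS with no plateau would support the leak hypothesis, while a curve that flattens out would point to intended caching behaviour.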

Since neither the database nor Blazegraph is running on the node, it looks like there is some memory problem within the OT-Node at the moment. Am I analyzing that correctly? Do you have another idea of what it could be? If this is just regular behaviour, because the consumption is indeed intended to double during usage, then never mind this ticket and just take it as performance documentation :)

Expected behavior

The OT-Node RAM usage does not triple or quadruple within 2 hours but stays constant.

Actual behavior

RAM usage triples until my tiny machine runs out of memory/CPU.

Steps to reproduce the problem

  1. Set up the architecture described above
  2. Run 4 nodes
  3. Publish jobs every few seconds for a limited amount of time (a sketch of such a publish loop follows below)
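For step 3, the load pattern was simply a timed loop that fires a publish request every few seconds. The following is a minimal sketch of that pattern; the endpoint URL, payload, and timings are hypothetical placeholders rather than the actual ot-node publish API, so substitute whatever publish call your own client or script uses:

    # Minimal sketch of the load pattern: publish a job every few seconds
    # for a limited amount of time. NODE_URL and the JSON payload are
    # hypothetical placeholders, not the actual ot-node publish API.
    import time
    import requests

    NODE_URL = "http://localhost:8900/publish"   # hypothetical endpoint
    DURATION_S = 2 * 60 * 60                     # hammer for roughly 2 hours
    INTERVAL_S = 5                               # one job every few seconds

    deadline = time.time() + DURATION_S
    while time.time() < deadline:
        try:
            resp = requests.post(NODE_URL, json={"assertion": "..."}, timeout=30)
            print(resp.status_code)
        except requests.RequestException as exc:
            print(f"publish failed: {exc}")
        time.sleep(INTERVAL_S)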

Specifications

  • Node version: Latest v6.0.0-beta.1.23
  • Platform: Ubuntu 20.04 LTS
  • Node wallet: 0x03405Ce6eD71642EA50b0F6073c113f6Ea7149B6
  • Node libp2p identity: Many different ones; do you need them?

Contact details

  • Email: hansi1337 at gmail dot com
  • Discord: angrymob

Error logs

Disclaimer

Please be aware that the issue reported on a public repository allows everyone to see your node logs, node details, and contact details. If you have any sensitive information, feel free to share it by sending an email to [email protected].

UniMa007 · Feb 18 '22 15:02

The issue still persists with the latest 1.29 testnet version, so I will migrate to the $20 instance on AWS and check whether the memory leak persists or the memory consumption reaches a plateau.

UniMa007 · Mar 10 '22 09:03

I have run my script three times today; each time, the VM crashed after a few hours:

(screenshot)

=> As mentioned in the comment above, I will switch to the $20 machine with double the RAM and CPU and report back.

UniMa007 · Mar 10 '22 15:03

This issue is being closed as inactive due to the date of the last activity on it. However, we would love to see this test executed on the latest code.

Thank you, OriginTrail Team

NZT48 · Dec 26 '22 16:12