neo-node
The number of files in the Chain_xxx folder keeps growing rapidly when the RpcNep5Tracker plugin is installed.
When I restart neo-cli, the chain folder shrinks to about 12G and the number of files drops to about 7k, for example:

But afterwards, both the number of files and the total size keep growing rapidly.
Here is the screenshot, taken about 24 hours after my last restart.
And as time goes on, growing......growing~
It seems that if the disk were large enough, the number of files would keep increasing indefinitely.
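To record this growth over time instead of relying on screenshots, a minimal sketch like the following could be run periodically. It assumes a Linux host with GNU find and du; `ldb_stats` is a hypothetical helper, not part of neo-cli. It counts only the `*.ldb` table files (LevelDB's sorted tables), not every file in the folder.

```shell
#!/bin/bash
# Print a timestamped line with the number of *.ldb table files and the
# total on-disk size of a LevelDB directory (e.g. the node's Chain_xxx folder).
ldb_stats() {
  local dir=$1 count size
  # Only top-level *.ldb files; LevelDB keeps its tables directly in the DB dir.
  count=$(find "$dir" -maxdepth 1 -type f -name '*.ldb' | wc -l | tr -d ' ')
  # Human-readable total size of the whole directory.
  size=$(du -sh "$dir" | cut -f1)
  echo "$(date '+%F %T') files=$count size=$size"
}
```

For example, `while true; do ldb_stats /path/to/Chain_xxx; sleep 3600; done >> ldb-growth.txt` would append one sample per hour (the path is a placeholder).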
Here is the screenshot of the LOG file; it seems nothing is deleted after compaction.
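The same check can be done on the text of the LOG file. The sketch below assumes the message formats of upstream LevelDB's info log, where finished compactions are logged as `Compacted ...` and removed obsolete files as `Delete type=N #...`; other LevelDB builds may word these differently. `log_summary` is a hypothetical helper name.

```shell
#!/bin/bash
# Count compaction and file-deletion messages in a LevelDB info log (LOG).
# Assumes upstream LevelDB message prefixes: "Compacted " and "Delete type=".
log_summary() {
  local log=$1 compacted deleted
  # grep -c prints 0 (and exits non-zero) when nothing matches; we only need the count.
  compacted=$(grep -c 'Compacted ' "$log")
  deleted=$(grep -c 'Delete type=' "$log")
  echo "compactions=$compacted deletions=$deleted"
  # The symptom described in this issue: compactions happen, deletions never do.
  if (( compacted > 0 && deleted == 0 )); then
    echo "warning: compactions ran but no obsolete files were deleted" >&2
  fi
}
```

Running `log_summary /path/to/Chain_xxx/LOG` on an affected node would be expected to show a non-zero compaction count with zero deletions.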

Then, we found that when the plugin RpcNep5Tracker was not installed, this issue didn't appear.
This is expected: we index more information, and it is stored there.
But the size keeps growing; someone saw 158G today.
The disk will fill up someday because its size is limited. For example, my disk is only 100G.
I think we should fix the issue; otherwise we will have to keep restarting the node indefinitely. TwT
Based on observation and experiments, the direct cause is probably that unreleased snapshots prevent LevelDB compaction from deleting outdated ldb files.
After looking into the LevelDB log file on the affected client, I observed that compactions failed to delete any outdated ldb files. I guessed that something, such as snapshots, was preventing file deletion. My experiments and the corresponding results are as follows:
- I built a local LevelDB environment and kept inserting records while creating snapshots without releasing them. During this period I triggered compaction, but, as expected, it failed to delete outdated files. The number of ldb files kept rising even when I was only inserting duplicate records.
- The problem in this issue never occurs after removing ALL uses of the function leveldb_create_snapshot from the code.
So the cause of this issue is most likely a conflict between snapshots and compaction.
I also observed that there are multiple places in the code where snapshots are created but never released. We are testing whether the problem recurs after correcting them.
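A crude way to shortlist such places is to compare, per source file, how many lines create a snapshot versus how many release one. This is only a heuristic sketch: the default names are the LevelDB C API functions, the `*.cs` filter is an assumption about the code base, and `audit_snapshots` is a hypothetical helper. It cannot see releases done elsewhere (for example in a wrapper's Dispose), so every hit still needs manual review.

```shell
#!/bin/bash
# List source files where the "create" call appears on more lines than the
# matching "release" call.
# Usage: audit_snapshots DIR [CREATE_NAME] [RELEASE_NAME] [FILE_GLOB]
audit_snapshots() {
  local dir=$1
  local create="${2:-leveldb_create_snapshot}"
  local release="${3:-leveldb_release_snapshot}"
  local glob="${4:-*.cs}"
  local file creates releases
  # Files (recursively, matching the glob) that mention the create call at all.
  grep -rlF --include="$glob" -- "$create" "$dir" | sort |
    while IFS= read -r file; do
      # grep -c counts matching lines, not individual occurrences.
      creates=$(grep -cF -- "$create" "$file")
      releases=$(grep -cF -- "$release" "$file")
      if (( creates > releases )); then
        echo "$file: $creates create line(s), $releases release line(s)"
      fi
    done
}
```

Passing different names as the second and third arguments lets the same check target a higher-level wrapper (e.g. a snapshot getter versus its disposal) if the raw C API calls are hidden behind one.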
@Qiao-Jin So is it a plugin bug?
The problem might be caused indirectly by something in the plugin, say, an exception, but the direct cause should be that some DB snapshots are never released. I'm looking for such snapshots in the code.
For now, RpcNep5Tracker exposes this bug: neo-cli works well without RpcNep5Tracker.
So you believe the bug is in RpcNep5Tracker? I simply checked the RpcNep5Tracker code and found no problems.
Yes, although there seems to be no obvious relationship between this plugin and this issue. But I've run tests repeatedly over three days. You could try syncing two neo-cli nodes on two different servers, one with RpcNep5Tracker and one without, up to the latest height, and then waiting four or five hours. You will see very different amounts of available disk space.
We ran the build with ALL uses of leveldb_create_snapshot removed for a whole day, and this problem never occurred.
FWIW I'm seeing the exact same issue occur on our node pool as well.
This is a horribly hacky way of working around this, but we need the RpcNep5Tracker plugin on our nodes.
Until somebody resolves the issue this is the (again admittedly horribly hacky) way I've worked around this problem.
It's simply fired with a cron job every 15 minutes.
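#!/bin/bash
# Restart the neo service once the root partition reaches THRESHOLD percent used.
THRESHOLD=90
# POSIX df output keeps each filesystem on one line; field 5 is Capacity (e.g. "91%").
PERCENT_USED=$(df -P / | awk 'NR==2 { sub(/%$/, "", $5); print $5 }')
if (( PERCENT_USED >= THRESHOLD )); then
  echo "$(/bin/date) - TIME TO RESTART NEO, PRIMARY PARTITION ${PERCENT_USED}% FULL"
  /usr/sbin/service neo stop
  /bin/sleep 1
  /usr/sbin/service neo start
  echo "$(/bin/date) - NEO HAS BEEN RESTARTED"
fi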
Note: this script assumes the neo-cli rpcserver runs as a systemd service named neo.
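For the "every 15 minutes" cron part, a minimal sketch follows. It assumes the restart script is saved at a hypothetical path such as /usr/local/bin/restart-neo.sh and installed in root's crontab (restarting the service needs root); the helper names and the log path /var/log/restart-neo.log are illustrative, not from the original setup.

```shell
#!/bin/bash
# Build the crontab line: "*/15" in the minute field means every 15 minutes.
# Output is appended to a (hypothetical) log file so the echo lines are kept.
cron_entry() {
  echo "*/15 * * * * $1 >> /var/log/restart-neo.log 2>&1"
}

# Install the entry in the current user's crontab, replacing any existing
# line that already mentions the same script path.
install_cron_entry() {
  local entry
  entry=$(cron_entry "$1")
  { crontab -l 2>/dev/null | grep -vF -- "$1"; echo "$entry"; } | crontab -
}
```

After sourcing the snippet, `install_cron_entry /usr/local/bin/restart-neo.sh` run as root registers the job; rerunning it does not create duplicates.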
Some similar bugs reported by LevelDB users: https://github.com/Level/leveldown/issues/273 https://github.com/google/leveldb/issues/164
This is old; if the problem remains, please re-open.