neo-node The number of files in the Chain_xxx folder keeps growing rapidly when the RpcNep5Tracker plugin is installed.

When I restart neo-cli, the size of the chain folder drops to about 12G and the number of files to about 7k. For example:

But after that the number and size keep growing rapidly. Here is a screenshot taken about 24 hours after my last restart. As time goes on, they keep growing and growing. It seems that if the disk were large enough, the number of files would continue to increase.
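For anyone who wants to track this growth without screenshots, a small shell function along these lines could log the file count and folder size over time. This is only a sketch: the function name `chain_stats` and the `CHAIN_DIR` variable are made up for illustration, and the path must be replaced with the real Chain_xxx folder.

```shell
#!/bin/bash
# Print "<timestamp> files=<n> size_kb=<n>" for a chain (LevelDB) folder.
# chain_stats and CHAIN_DIR are hypothetical names, not part of neo-cli.
chain_stats() {
    local dir="$1"
    local files size_kb
    # Count all regular files in the folder (table files, LOG, MANIFEST, ...).
    files=$(find "$dir" -type f | wc -l | tr -d ' ')
    # Total on-disk size of the folder in kilobytes.
    size_kb=$(du -sk "$dir" | awk '{ print $1 }')
    echo "$(date '+%Y-%m-%d %H:%M:%S') files=${files} size_kb=${size_kb}"
}

# Example (run periodically, e.g. from cron):
# chain_stats "${CHAIN_DIR:-./Chain_xxx}" >> chain_growth.log
```

Comparing the logged lines from a node with the plugin and a node without it should show whether the two diverge.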

Here is a screenshot of the LOG file. It seems that no files are deleted after compaction.
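The same check can be done on the text of the LOG file rather than by eye. The sketch below assumes the LOG uses the stock LevelDB message wording ("Compacted ..." when a compaction finishes and "Delete type=..." when an obsolete file is removed); the function name `log_summary` is made up for illustration.

```shell
#!/bin/bash
# Count finished compactions and file deletions in a LevelDB LOG file.
# Assumption: the LOG uses the stock LevelDB messages "Compacted ..." and
# "Delete type=...". log_summary is a hypothetical helper name.
log_summary() {
    local log="$1"
    local compacted deleted
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    compacted=$(grep -c 'Compacted ' "$log" || true)
    deleted=$(grep -c 'Delete type=' "$log" || true)
    echo "compacted=${compacted} deleted=${deleted}"
}

# Example: log_summary ./Chain_xxx/LOG
```

Many compactions with few or no deletions would match what the screenshot suggests.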

Then we found that when the RpcNep5Tracker plugin is not installed, this issue does not appear.

Jul 30 '19 04:07 nicolegys

This is expected: we index more information, and it is stored there.

Jul 30 '19 07:07 shargon

This is expected: we index more information, and it is stored there.

But the size will keep growing; someone saw 158G today. The disk will fill up someday because its size is limited. For example, my disk is only 100G. I think we should fix the issue, otherwise we will have to keep restarting again and again. TwT

Jul 30 '19 08:07 nicolegys

This is expected: we index more information, and it is stored there.

Based on observation and experiments, the direct cause is probably that unreleased snapshots prevent leveldb compaction from deleting outdated ldb files.

After looking into the leveldb log file on the specific problematic client, I observed that compactions failed to delete any outdated ldb files. I guessed that something such as snapshots was preventing file deletion. My experiments and their results are as follows:

I set up a local leveldb environment and kept inserting notes while creating snapshots without releasing them. During this period I tried compaction, but it failed to delete outdated files, as I expected. The number of ldb files kept rising even when I was only inserting duplicate notes.
The specific problem in this issue no longer occurs after removing ALL usage of the function leveldb_create_snapshot in the code.

So the cause of this issue is most likely a conflict between snapshots and compaction.

I also observed that there are multiple places in the code where snapshots are created but never released. We are testing to see whether the problem recurs after correcting them.

Jul 30 '19 09:07 Qiao-Jin

@Qiao-Jin So is it a plugin bug?

Jul 31 '19 08:07 erikzhang

@Qiao-Jin So is it a plugin bug?

The problem might be indirectly caused by something in the plugin, say, some exceptions, but the direct cause should be that some db snapshots fail to be released. I'm looking for such snapshots in the code.

Jul 31 '19 08:07 Qiao-Jin

@Qiao-Jin So is it a plugin bug?

For now, RpcNep5Tracker exposes this bug. neo-cli works well without RpcNep5Tracker.

Jul 31 '19 08:07 superboyiii

So you believe the bug is in RpcNep5Tracker? I briefly checked the RpcNep5Tracker code and found no problems.

Jul 31 '19 10:07 erikzhang

So you believe the bug is in RpcNep5Tracker? I briefly checked the RpcNep5Tracker code and found no problems.

Yes, although there seems to be no obvious relationship between this plugin and this issue. But I've run tests many times over three days. You could try syncing two neo-cli instances on two different servers, one with RpcNep5Tracker and one without, syncing to the latest height and then waiting four or five hours. You will find completely different amounts of available disk space.

Jul 31 '19 10:07 superboyiii

We ran the version with ALL usage of the function leveldb_create_snapshot removed from the code for a whole day, and the problem never occurred.

Aug 02 '19 02:08 Qiao-Jin

FWIW I'm seeing the exact same issue occur on our node pool as well.

This is a horribly hacky way of working around it, but we need the RpcNep5Tracker plugin on our nodes.

Until somebody resolves the issue, this is the (again, admittedly horribly hacky) way I've worked around the problem.

It's simply fired by a cron job every 15 minutes.

#!/bin/bash

# Restart neo when the root filesystem is at least THRESHOLD percent full.
THRESHOLD=90
# Column 6 of `df -hT` is Use%; sed strips the trailing "%".
PERCENT_USED=$(df -hT / | grep / | awk '{ print $6 }' | sed 's/.$//')

if (( PERCENT_USED >= THRESHOLD )); then
        echo "$(/bin/date) - TIME TO RESTART NEO, PRIMARY PARTITION ${PERCENT_USED}% FULL"
        /usr/sbin/service neo stop
        /bin/sleep 1
        /usr/sbin/service neo start
        echo "$(/bin/date) - NEO HAS BEEN RESTARTED"
fi

Note: I should also mention that this syntax assumes the neo-cli rpcserver is maintained as a systemd daemon, as it is on our nodes.

Aug 03 '19 09:08 HayesData

Some similar bugs reported by leveldb users: https://github.com/Level/leveldown/issues/273 https://github.com/google/leveldb/issues/164

Aug 09 '19 02:08 Qiao-Jin

This is old; if the problem remains, please re-open.

Dec 05 '23 13:12 shargon
