
The number of files in Chain_xxx folder keeps growing rapidly when the plugin RpcNep5Tracker was installed.

Open nicolegys opened this issue 6 years ago • 11 comments

When I restart neo-cli, the size of the chain folder drops to 12G and the number of files to about 7k. For example: [screenshots]

But the number and size then keep growing rapidly. Here are the screenshots, taken about 24 hours after my last restart: [screenshots] And as time goes on, they keep growing. It seems that if the disk were large enough, the number of files would continue to increase.
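To track this growth over time without screenshots, a small helper along the following lines could be run periodically. This is only a sketch: the function name `chain_stats` is hypothetical, and the chain directory is passed as an argument because its actual name (`Chain_xxx`) is elided in this thread.

```bash
#!/bin/bash

# Hypothetical helper: print one timestamped line with the number of files
# and the total size (in KB) of the given chain directory.
chain_stats() {
    local dir="$1"
    local files size_kb
    # Count regular files under the directory (ldb files, LOG, MANIFEST, ...).
    files=$(find "$dir" -type f | wc -l | tr -d ' ')
    # du -sk reports the total size in 1024-byte units.
    size_kb=$(du -sk "$dir" | awk '{ print $1 }')
    echo "$(date '+%Y-%m-%d %H:%M:%S') files=${files} size_kb=${size_kb}"
}
```

Calling `chain_stats /path/to/chain/folder >> chain_stats.log` at a fixed interval gives a plain-text record of how fast the folder grows between restarts.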

Here is a screenshot of the LOG file; there seems to be no deletion after compaction. [screenshot]
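The same check can be done on the LOG file directly instead of by eye. The sketch below assumes the LOG uses LevelDB's usual `Compacted ...` and `Delete type=...` messages; the thread only shows screenshots, so that format is an assumption, and the function name `log_summary` is hypothetical.

```bash
#!/bin/bash

# Hypothetical helper: count finished compactions and file deletions
# recorded in a LevelDB LOG file.
# Assumption: the LOG contains "Compacted ..." and "Delete type=..." lines.
log_summary() {
    local log_file="$1"
    local compactions deletions
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    compactions=$(grep -c 'Compacted' "$log_file" || true)
    deletions=$(grep -c 'Delete type=' "$log_file" || true)
    echo "compactions=${compactions} deletions=${deletions}"
}
```

A LOG with many compactions but zero deletions would match the behaviour described here.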

Then, we found that when the plugin RpcNep5Tracker was not installed, this issue didn't appear.

nicolegys avatar Jul 30 '19 04:07 nicolegys

This is expected: we index more information, and it is stored there.

shargon avatar Jul 30 '19 07:07 shargon

> This is expected: we index more information, and it is stored there.

But the size will keep growing; someone saw 158G today. [screenshot] The disk will fill up someday because its size is limited. For example, my disk is only 100G. I think we should fix the issue, otherwise we will have to keep restarting indefinitely. TwT

nicolegys avatar Jul 30 '19 08:07 nicolegys

> This is expected: we index more information, and it is stored there.

Based on observation and experiments, the direct cause is probably that unreleased snapshots prevent leveldb compaction from deleting outdated ldb files.

After looking into the leveldb log file on the affected client, I observed that compactions failed to delete any outdated ldb files. I guessed that something such as snapshots was preventing file deletion. My experiments and their results are as follows:

  1. I set up a local leveldb environment and kept inserting notes while creating snapshots without releasing them. During this period I ran compaction, but it failed to delete outdated files, as I expected. The number of ldb files kept rising even when I was only inserting duplicate notes.

  2. The problem in this issue never occurs once ALL usages of the function leveldb_create_snapshot are removed from the code.

So the cause of this issue is most likely a conflict between snapshots and compaction.

I also observed that there are multiple places in the code where snapshots are created but never released. We are testing whether the problem recurs after correcting them.

Qiao-Jin avatar Jul 30 '19 09:07 Qiao-Jin

@Qiao-Jin So is it a plugin bug?

erikzhang avatar Jul 31 '19 08:07 erikzhang

> @Qiao-Jin So is it a plugin bug?

The problem might be indirectly caused by something in the plugin, say, some exceptions, but the direct cause should be that some db snapshots failed to be released. I'm looking for such snapshots in the code.

Qiao-Jin avatar Jul 31 '19 08:07 Qiao-Jin

> @Qiao-Jin So is it a plugin bug?

For now, it is RPCNep5Tracker that exposes this bug. neo-cli works well without RPCNep5Tracker.

superboyiii avatar Jul 31 '19 08:07 superboyiii

So you believe the bug is in RpcNep5Tracker? I simply checked the RpcNep5Tracker code and found no problems.

erikzhang avatar Jul 31 '19 10:07 erikzhang

> So you believe the bug is in RpcNep5Tracker? I simply checked the RpcNep5Tracker code and found no problems.

Yes, although there seems to be no obvious relationship between this plugin and this issue. I've run the test many times over three days. You could try syncing two neo-cli instances on two different servers, one with RPCNep5Tracker and one without, syncing to the latest height and then waiting four or five hours. You will find completely different amounts of available disk space.
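For anyone repeating this comparison, the available space on each server could be sampled with something like the sketch below. The function name `free_space_sample` is hypothetical, and `df -Pk` is used only because its POSIX output layout has fixed columns.

```bash
#!/bin/bash

# Hypothetical helper: print a timestamped sample of the available space
# (in KB) on the filesystem that holds the given path (default: /).
free_space_sample() {
    local path="${1:-/}"
    local avail_kb
    # In "df -P" output, row 2 is the data row and column 4 is "Available".
    avail_kb=$(df -Pk "$path" | awk 'NR==2 { print $4 }')
    echo "$(date '+%Y-%m-%d %H:%M:%S') avail_kb=${avail_kb}"
}
```

Running it on both servers at the same interval and comparing the two logs after a few hours should show whether the node with the plugin loses space faster.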

superboyiii avatar Jul 31 '19 10:07 superboyiii

We ran the version with ALL usages of the function leveldb_create_snapshot removed from the code again for a whole day, and the problem never occurred.

Qiao-Jin avatar Aug 02 '19 02:08 Qiao-Jin

FWIW I'm seeing the exact same issue occur on our node pool as well.

This is a horribly hacky workaround, but we need the RpcNep5Tracker plugin on our nodes.

Until somebody resolves the issue this is the (again admittedly horribly hacky) way I've worked around this problem.

It's simply fired with a cron job every 15 minutes.

#!/bin/bash

# Restart neo once the root partition is at least THRESHOLD percent full.
THRESHOLD=90
# With -T, column 6 of the df output is Use%; sed strips the trailing "%".
PERCENT_USED=$(df -hT / | grep / | awk '{ print $6 }' | sed 's/.$//')

if (( PERCENT_USED >= THRESHOLD )); then
        echo "$(/bin/date) - TIME TO RESTART NEO, PRIMARY PARTITION ${PERCENT_USED}% FULL"
        /usr/sbin/service neo stop
        /bin/sleep 1
        /usr/sbin/service neo start
        echo "$(/bin/date) - NEO HAS BEEN RESTARTED"
fi

Note: I should also mention this syntax is based around the fact we have the neo-cli rpcserver being maintained as a systemd daemon.
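The column picked from `df -hT` in the script above depends on which flags are passed. A variant that parses the fixed POSIX layout of `df -P` could look like the sketch below; the function names `percent_used` and `needs_restart` are hypothetical and not part of the original workaround.

```bash
#!/bin/bash

# Hypothetical helper: print the used percentage (digits only) of the
# filesystem that holds the given mount point (default: /).
percent_used() {
    local mount_point="${1:-/}"
    # In "df -P" output, column 5 of the data row is the capacity, e.g. "42%".
    df -P "$mount_point" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed (exit 0) when usage has reached the threshold.
needs_restart() {
    local used="$1" threshold="$2"
    (( used >= threshold ))
}
```

The restart block would then be guarded by `if needs_restart "$(percent_used /)" 90; then ... fi`, with the same stop and start commands as in the original script.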

HayesData avatar Aug 03 '19 09:08 HayesData

Some similar bugs reported by leveldb users: https://github.com/Level/leveldown/issues/273 https://github.com/google/leveldb/issues/164

Qiao-Jin avatar Aug 09 '19 02:08 Qiao-Jin

This issue is old. If the problem remains, please re-open.

shargon avatar Dec 05 '23 13:12 shargon