
Reduce disk space for vochain statedb

Open p4u opened this issue 3 years ago • 4 comments

For a vochain network with the following details:

  • height: 4021
  • processes: 550
  • votes: 4789145

The disk space used is 13G distributed as follows:

  • 3.1G blockstore.db
  • 160M state.db
  • 9.0G vcstate

While the whole list of transactions (stored by Tendermint) takes 3.1 GiB, the vochain state uses 9 GiB. This seems excessive, since the state should not store the full vote package but only its hash. Why is this happening, and how can we reduce the disk space required by statedb?

p4u avatar Oct 07 '21 11:10 p4u

Oh, I see now that the state stores the whole set of bytes of every cast vote https://github.com/vocdoni/vocdoni-node/blob/master/vochain/state.go#L576

In the previous version of statedb we were only saving the Hash of the vote, since the vote bytes can be recovered from the Tendermint blockstore.

Is there a reason for this change @ed255 ?

EDIT: Wrong comment, the current code stores only the hash :sweat_smile:

p4u avatar Oct 07 '21 11:10 p4u

The current AddVote method stores the following information:

	sdbVote := models.StateDBVote{
		VoteHash:  ethereum.HashRaw(voteBytes),
		ProcessId: vote.ProcessId,
		Nullifier: vote.Nullifier,
	}

These are three hashes of 32 bytes each, so 96 bytes per vote. Given the number of votes, the total raw vote data stored should be around 438 MiB, which is far from the 9 GiB currently used by statedb.
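As a quick sanity check, here is that arithmetic as a tiny Go snippet (the vote count is the one reported above; 32 bytes per hash as stated):

```go
package main

import "fmt"

func main() {
	const (
		votes   = 4789145 // votes reported for this network
		perVote = 3 * 32  // VoteHash + ProcessId + Nullifier, 32 bytes each
	)
	mib := float64(votes*perVote) / (1 << 20)
	fmt.Printf("raw vote data: ~%.0f MiB\n", mib) // prints ~438 MiB
}
```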

p4u avatar Oct 07 '21 11:10 p4u

Here are two sources of storage overhead that I can think of.

Storage of nodes in outdated paths in Arbo

For every Tree.Add / Tree.Set, some of the nodes in the merkle tree are updated. For each node update, Arbo creates a new key-value entry in the DB and doesn't delete the key-value entry of the node's previous version. This allows traversing the tree from old Roots, but also means that outdated nodes (nodes that can't be reached from the current tree) are kept in the DB.

So, for example, if we have a balanced tree with 1024 leaves (that's 10 levels) and we update each leaf, then every node has been rewritten and we are now using more than double the space we used before. Here are some rough numbers for this example:

L = Leaf Size
N = Intermediate node Size
# Original tree with 1024 leaves
size = (1024 * 2 - 1) * N + 1024 * L
# Tree after 1024 updates
size = (1024 * 2 - 1) * N + 1024 * L + 10 * 1024 * N + 1024 * L = (12 * 1024 - 1) * N + 2 * 1024 * L 

Notice that a leaf update in this tree rewrites 10 nodes (the leaf path), so we outdate 10 nodes each time.
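Purely as an illustration of this growth (the node sizes are made-up constants, not Arbo's real encoding), a small Go sketch that generalizes the numbers above to any number of single-leaf updates:

```go
package main

import "fmt"

// storeSize estimates the bytes kept in the key-value DB for a balanced tree
// with `leaves` leaves after `updates` single-leaf updates, assuming outdated
// nodes are never deleted. N and L are the sizes of an intermediate node and
// a leaf entry (illustrative values only).
func storeSize(leaves, updates, N, L int) int {
	depth := 0
	for 1<<depth < leaves {
		depth++
	}
	original := (2*leaves-1)*N + leaves*L // freshly built tree
	perUpdate := depth*N + L              // each update rewrites the whole leaf path
	return original + updates*perUpdate
}

func main() {
	const N, L = 64, 96 // made-up node sizes, for illustration only
	before := storeSize(1024, 0, N, L)
	after := storeSize(1024, 1024, N, L)
	fmt.Printf("before: %d bytes, after 1024 updates: %d bytes (x%.2f)\n",
		before, after, float64(after)/float64(before))
}
```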

Solutions

A. Arbo could delete outdated nodes on each operation. I think this one is quite easy to achieve; nevertheless, it would require the StateDB to use a single Tx per block update, and currently we use more than one Tx to avoid the Tx growing too much. BadgerDB has a preconfigured limit on the number of writes in a Tx, whereas Pebble can use as much memory as is available.

Edit1: I have thought about a solution for the fact that the StateDB does multiple commits internally within a single block. Similarly to the db.Batch type we have, we could have a special type that commits internally when the Tx becomes too big, but stores the deletes in a list that is only applied on db.Batch.Commit(). This way we make sure we don't delete nodes before ending a block, which would be problematic if a crash happened at that point.
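A hedged sketch of what that special type could look like, written against a simplified write-transaction interface (the names and interface below are hypothetical, not the actual vocdoni-node db types):

```go
// Sketch: a batch that flushes writes early but defers deletes until the
// final Commit (hypothetical types, not the real vocdoni-node db package).
package db

// WriteTx is a simplified stand-in for a key-value write transaction.
type WriteTx interface {
	Set(key, value []byte) error
	Delete(key []byte) error
	Commit() error
}

// deferredDeleteBatch commits writes internally whenever the inner Tx grows
// too big, but holds back all deletes until Commit, so a crash in the middle
// of a block never removes nodes still reachable from the last good state.
type deferredDeleteBatch struct {
	newTx    func() WriteTx // opens a fresh Tx on the same DB
	tx       WriteTx
	pending  [][]byte // keys to delete, applied only on Commit
	writes   int
	maxWrite int // flush threshold for the inner Tx
}

func (b *deferredDeleteBatch) Set(key, value []byte) error {
	if err := b.tx.Set(key, value); err != nil {
		return err
	}
	b.writes++
	if b.writes >= b.maxWrite {
		// Flush the accumulated writes to keep the Tx small; deletes stay pending.
		if err := b.tx.Commit(); err != nil {
			return err
		}
		b.tx, b.writes = b.newTx(), 0
	}
	return nil
}

func (b *deferredDeleteBatch) Delete(key []byte) error {
	b.pending = append(b.pending, key) // recorded, not applied yet
	return nil
}

func (b *deferredDeleteBatch) Commit() error {
	for _, key := range b.pending {
		if err := b.tx.Delete(key); err != nil {
			return err
		}
	}
	b.pending = nil
	return b.tx.Commit()
}
```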

B. Figure out a way to only keep nodes accessible from "pinned" snapshots. This is what filesystems that support snapshots do; I don't know how that would work internally. With this, we could pin the last 5 blocks in the StateDB, for example, and delete all the nodes that can't be reached from those 5 points. This deletion could happen after a block is ended.
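For option B, a rough sketch of reachability-based cleanup (mark everything reachable from the pinned roots, then sweep the rest); the node layout and store shape here are hypothetical, not Arbo's actual encoding:

```go
package db

// node is a simplified binary merkle node as it might sit in the key-value DB.
type node struct {
	left, right []byte // child keys; nil for leaves
}

// gcUnreachable deletes every node that can't be reached from any pinned root
// (e.g. the roots of the last 5 blocks). The store is modeled as a map here;
// in practice this would walk the key-value DB after a block is ended.
func gcUnreachable(store map[string]node, pinnedRoots [][]byte) {
	reachable := make(map[string]bool)
	var mark func(key []byte)
	mark = func(key []byte) {
		if len(key) == 0 || reachable[string(key)] {
			return
		}
		reachable[string(key)] = true
		if n, ok := store[string(key)]; ok {
			mark(n.left)
			mark(n.right)
		}
	}
	for _, root := range pinnedRoots {
		mark(root)
	}
	for key := range store { // sweep everything that wasn't marked
		if !reachable[key] {
			delete(store, key)
		}
	}
}
```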

C. On node reboot, export and import all the arbo trees. This would apply to the censusManager and the StateDB. By exporting and importing the leaves, we discard all the outdated nodes, although this operation will take longer the bigger the trees are, and it requires a node restart.
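For option C, the core of the export/import pass is tiny; sketched below against a minimal leaf-iteration interface (hypothetical, not the real arbo API):

```go
package db

// leafTree reduces an arbo-like tree to what the compaction needs.
type leafTree interface {
	IterateLeaves(fn func(key, value []byte) error) error
	Add(key, value []byte) error
}

// compact rebuilds a tree from its leaves only: every leaf of the old tree is
// added to a freshly created one, so all outdated intermediate nodes are left
// behind. The cost is a full rewrite, hence doing it only on node restart.
func compact(old, fresh leafTree) error {
	return old.IterateLeaves(func(key, value []byte) error {
		return fresh.Add(key, value)
	})
}
```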

Considerations

Currently the StateDB doesn't store information about the subtrees layout. For some of the solutions, having this layout would be required. This could be done by making sure that after opening a subTree, the subTree config is stored in the NoState KVDB of the mainTree, or in the metadata section of the underlying DB. This would allow programmatically operating on all the subTrees without needing to specify each subTree. There would be a challenge though, which is that a subTree configuration requires 2 functions, and we can't serialize functions into a DB :S
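One possible way around the "functions can't be serialized" problem is to register those functions under stable names and persist only the name; a minimal sketch with hypothetical types (the real subTree config fields may differ):

```go
package db

import "fmt"

// rootFn extracts a subtree root from its parent leaf (assumed signature).
type rootFn func(parentLeaf []byte) ([]byte, error)

// registry maps stable names to functions, so the persisted config only needs
// to carry the name and can be resolved back to code when the node starts.
var registry = map[string]rootFn{}

func registerRootFn(name string, fn rootFn) { registry[name] = fn }

// storedSubTreeConfig is what could be written to the NoState KVDB of the
// mainTree (illustrative layout, not the actual StateDB metadata format).
type storedSubTreeConfig struct {
	Prefix    []byte
	GetRootFn string // registry name instead of a function value
}

func resolveRootFn(cfg storedSubTreeConfig) (rootFn, error) {
	fn, ok := registry[cfg.GetRootFn]
	if !ok {
		return nil, fmt.Errorf("unknown root function %q", cfg.GetRootFn)
	}
	return fn, nil
}
```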

Key Value DB uses long prefixes in StateDB

The subTrees in the StateDB are stored in a single key-value DB using prefixes. The deeper the subtree, the longer the key prefix will be. For example, the VoteTree sits under the ProcessTree, which sits under the MainTree, so every key-value entry stored by the VoteTree is prefixed with the ProcessID (which is 32 bytes). If the key-value DB stores the raw key for each entry (instead of storing the key as part of a radix tree, for example), this means a space overhead for each KV entry, which can add up. This needs to be investigated to figure out whether it's a noticeable overhead (for example, write a test where an Arbo tree is filled twice, once with a raw WriteTx and once with a PrefixedWriteTx, and then compare the disk usage of both).
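A sketch of that comparison, skipping the Arbo wiring and just measuring how two Pebble databases grow when the same entries are written with and without a 32-byte prefix (assumes github.com/cockroachdb/pebble; key/value sizes and entry count are arbitrary):

```go
package main

import (
	"crypto/rand"
	"fmt"
	"os"
	"path/filepath"

	"github.com/cockroachdb/pebble"
)

// fill writes n entries with 32-byte random keys (optionally prefixed) and
// 64-byte random values, then returns the on-disk size of the DB directory.
func fill(dir string, prefix []byte, n int) (int64, error) {
	db, err := pebble.Open(dir, &pebble.Options{})
	if err != nil {
		return 0, err
	}
	for i := 0; i < n; i++ {
		key := make([]byte, 32)
		value := make([]byte, 64)
		rand.Read(key)
		rand.Read(value)
		if err := db.Set(append(prefix, key...), value, pebble.NoSync); err != nil {
			return 0, err
		}
	}
	if err := db.Close(); err != nil {
		return 0, err
	}
	var size int64
	err = filepath.Walk(dir, func(_ string, info os.FileInfo, walkErr error) error {
		if walkErr == nil && !info.IsDir() {
			size += info.Size()
		}
		return walkErr
	})
	return size, err
}

func main() {
	raw, _ := fill("raw.db", nil, 1_000_000)
	prefixed, _ := fill("prefixed.db", make([]byte, 32), 1_000_000) // 32-byte prefix, like a ProcessID
	fmt.Printf("raw: %d bytes, prefixed: %d bytes (+%.1f%%)\n",
		raw, prefixed, 100*float64(prefixed-raw)/float64(raw))
}
```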

ed255 avatar Oct 11 '21 13:10 ed255

Update: talked with @ed255 offline, and we agreed to go with option A. Added an issue in arbo to implement this: https://github.com/vocdoni/arbo/issues/25. I understand this is not an 'urgent' thing to have, but once I finish the current anonymous-voting tasks, it is the first thing I will implement.

arnaucube avatar Oct 13 '21 09:10 arnaucube

Finally implemented as part of this PR https://github.com/vocdoni/vocdoni-node/pull/1080

p4u avatar Sep 01 '23 14:09 p4u