Loosen hard dependency on `DatabaseAPI` to allow more flexibility in storage.

Open pipermerriam opened this issue 6 years ago • 2 comments

What is wrong?

I think our database model needs thought.

We are already in a good place having HeaderDB, ChainDB, AccountDB and AccountStorageDB. Conceptually I think these constructs are the right direction.

However, everything we do is heavily tied to a base key/value store architecture built on the DatabaseAPI (previously BaseDB). This constrains us in two ways that I think need to be addressed at some point for Trinity to really thrive.

  1. Only a key/value store type of API for interacting with the database.
  2. Combined storage for all Chain and state data

How can it be fixed

I'm only just starting to think about this but the ideal solution should be able to support the following functionality.

  1. Separate storage mechanisms for different data types (headers, blocks, transactions, receipts, account state, contract state).
    • We should be able to store each of these separately if desired.
  2. Separate storage mechanisms for the same data types
    • We should be able to store old chain data in a different spot than new chain data.

We can likely already accomplish the majority of these things using the existing ChainDB and HeaderDB APIs. However, I believe we have to eliminate the base requirement that these APIs depend on a simple key-value store, so that they can leverage different storage mechanisms.

One complication that we have to deal with now is that the VirtualMachineAPI and ChainAPI both have hard dependencies on the DatabaseAPI. For now, we can probably get away with defining a base abstraction for accessing raw account state and still allow each VirtualMachineAPI implementation to wrap that in its own APIs specific to its chain rules. The same goes for AccountStorageDB.
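A minimal sketch of that layering, assuming nothing about py-evm's actual classes: `RawAccountStateAPI`, `InMemoryRawAccountState` and `ExampleForkAccountDB` are hypothetical names, and the account encoding (a bare 32-byte balance) is a toy stand-in for whatever a real fork would use.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class RawAccountStateAPI(ABC):
    """Hypothetical base abstraction: raw account access with no chain rules."""

    @abstractmethod
    def get_raw_account(self, address: bytes) -> Optional[bytes]:
        ...

    @abstractmethod
    def set_raw_account(self, address: bytes, encoded_account: bytes) -> None:
        ...


class InMemoryRawAccountState(RawAccountStateAPI):
    """One possible backend; a dict stands in for any storage mechanism."""

    def __init__(self) -> None:
        self._accounts: Dict[bytes, bytes] = {}

    def get_raw_account(self, address: bytes) -> Optional[bytes]:
        return self._accounts.get(address)

    def set_raw_account(self, address: bytes, encoded_account: bytes) -> None:
        self._accounts[address] = encoded_account


class ExampleForkAccountDB:
    """Hypothetical fork-specific wrapper that owns the encoding rules.

    Toy encoding (an assumption for illustration): the account is just
    its balance as a 32-byte big-endian integer.
    """

    def __init__(self, raw_state: RawAccountStateAPI) -> None:
        self._raw_state = raw_state

    def get_balance(self, address: bytes) -> int:
        encoded = self._raw_state.get_raw_account(address)
        return 0 if encoded is None else int.from_bytes(encoded, "big")

    def set_balance(self, address: bytes, balance: int) -> None:
        self._raw_state.set_raw_account(address, balance.to_bytes(32, "big"))
```

The wrapper only sees the abstract interface, so the backend could be swapped without touching fork-specific code.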

pipermerriam, Aug 05 '19 01:08

As you say, I think we are already halfway there. What we currently have with HeaderDB and AccountDB is not too far from something like the repository pattern imho.

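from abc import ABC, abstractmethod
from typing import Generic, TypeVar

# Type variables for the stored entity and the key used to look it up.
TEntity = TypeVar("TEntity")
TEntityKey = TypeVar("TEntityKey")


class RepositoryAPI(ABC, Generic[TEntity, TEntityKey]):

    @abstractmethod
    def find(self, key: TEntityKey) -> TEntity:
        ...

    @abstractmethod
    def add(self, entity: TEntity) -> None:
        ...

    @abstractmethod
    def delete(self, key: TEntityKey) -> None:
        ...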

We'd then have BlockRepository, TransactionRepository, ReceiptRepository and so on. Each of them could use a different technology for persistence, and one could also split things up further to have different repositories for the same type, creating things like AncientBlockRepository etc.
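To make the "different technology per repository" idea concrete, here is a rough sketch with two interchangeable block repositories, one backed by a dict and one by SQLite. The `Block` dataclass, both repository classes and the table layout are made up for illustration and are not py-evm types; `RepositoryAPI` is repeated so the example is self-contained.

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Generic, TypeVar

TEntity = TypeVar("TEntity")
TEntityKey = TypeVar("TEntityKey")


class RepositoryAPI(ABC, Generic[TEntity, TEntityKey]):
    @abstractmethod
    def find(self, key: TEntityKey) -> TEntity: ...

    @abstractmethod
    def add(self, entity: TEntity) -> None: ...

    @abstractmethod
    def delete(self, key: TEntityKey) -> None: ...


@dataclass(frozen=True)
class Block:
    """Hypothetical, heavily simplified block entity."""
    block_hash: bytes
    number: int


class InMemoryBlockRepository(RepositoryAPI[Block, bytes]):
    """Keeps blocks in a dict, keyed by block hash."""

    def __init__(self) -> None:
        self._blocks: Dict[bytes, Block] = {}

    def find(self, key: bytes) -> Block:
        return self._blocks[key]  # raises KeyError if missing

    def add(self, entity: Block) -> None:
        self._blocks[entity.block_hash] = entity

    def delete(self, key: bytes) -> None:
        self._blocks.pop(key, None)


class SqliteBlockRepository(RepositoryAPI[Block, bytes]):
    """Same interface, but persisted through a SQLite table."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS blocks "
            "(hash BLOB PRIMARY KEY, number INTEGER NOT NULL)"
        )

    def find(self, key: bytes) -> Block:
        row = self._conn.execute(
            "SELECT hash, number FROM blocks WHERE hash = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return Block(block_hash=bytes(row[0]), number=row[1])

    def add(self, entity: Block) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO blocks (hash, number) VALUES (?, ?)",
            (entity.block_hash, entity.number),
        )

    def delete(self, key: bytes) -> None:
        self._conn.execute("DELETE FROM blocks WHERE hash = ?", (key,))
```

Callers that only depend on `RepositoryAPI[Block, bytes]` would not need to know which of the two they were handed, for example `SqliteBlockRepository(sqlite3.connect(":memory:"))` versus `InMemoryBlockRepository()`.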

This isn't too different from where we are except that (as you pointed out) we are currently bleeding a raw DatabaseAPI directly into the Chain and VM. I guess that's the main thing to tackle. If we get rid of that, then each entity is accessed by its own repository (or db if you will, it's all just names...) and each could theoretically use its own entirely different means of persistence.

Not sure if this adds anything to what you've already said, but I stumbled over this issue today while working on #1886 and thought I'd leave a comment.

cburgdorf, Nov 29 '19 10:11

There is one nuance to this, which is that some of these need context. For example, we probably need to store the most recent 1000 blocks somewhere like a fast K/V store, but once blocks are at least 1000 blocks old they would be migrated to the ancient block store. This means that the database needs to have some notion of what the canonical chain head is...
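A rough sketch of that head-aware migration, under heavy simplifying assumptions: `TieredBlockStore` and its methods are hypothetical, both tiers are plain dicts standing in for real stores, blocks are keyed by number only, and reorgs are ignored.

```python
from typing import Dict, Optional

ANCIENT_THRESHOLD = 1000  # cutoff taken from the comment above


class TieredBlockStore:
    """Hypothetical store that tracks the canonical head to decide placement."""

    def __init__(self, threshold: int = ANCIENT_THRESHOLD) -> None:
        self._threshold = threshold
        self._recent: Dict[int, bytes] = {}   # stand-in for a fast K/V store
        self._ancient: Dict[int, bytes] = {}  # stand-in for the ancient block store
        self._head_number: Optional[int] = None

    def add_canonical_block(self, number: int, encoded_block: bytes) -> None:
        """Store a new canonical block and migrate anything that became old."""
        self._recent[number] = encoded_block
        if self._head_number is None or number > self._head_number:
            self._head_number = number
        self._migrate_old_blocks()

    def _migrate_old_blocks(self) -> None:
        # A block is "ancient" once it is at least `threshold` blocks behind the head.
        cutoff = self._head_number - self._threshold
        for number in [n for n in self._recent if n <= cutoff]:
            self._ancient[number] = self._recent.pop(number)

    def get_block(self, number: int) -> bytes:
        """Look in the fast tier first, then fall back to the ancient tier."""
        if number in self._recent:
            return self._recent[number]
        return self._ancient[number]  # raises KeyError if unknown

    def is_ancient(self, number: int) -> bool:
        return number in self._ancient
```

The point of the sketch is only that placement depends on the head: the store cannot decide where a block belongs without being told, or tracking, which block is currently the canonical tip.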

pipermerriam, Nov 29 '19 15:11