horizon icon indicating copy to clipboard operation
horizon copied to clipboard

Database denormalization

Open bartekn opened this issue 10 years ago • 0 comments

In the context of #76, #80, #86 I'd like to start a discussion about horizon DB denormalization.

horizon is a critical component in Stellar ecosystem. Because of this it should be very efficient, fast and, I hope, will have to server requests of millions of users. I'd like to start with a very simple solution that will minimize or even prevent possible performance issues but also make implementing features like those above much easier.

Let's denormalize all DB tables in a way that no 1:1 and 1:n relation JOINs are needed.

I understand that it will cost storage but storage is cheaper (exactly $0.100 per GB-month on RDS) than DB instances' CPU or Ram required to run expensive SQL queries. It will also save us a lot of time because we won't have to deal with performance issues that may arise with JOINs: we won't have to optimize queries and I think that for long we won't need a caching layer. horizon DB is almost read-only (I think only orderbook will change over time?) so it would be very easy to implement - since data won't change over time we won't run into issues like updating a lot of rows to reflect changes made in one table (check Example 3 below).

This idea is just an entry point to the discussion so add your comments.

Example 1 - no changes over time In #80 we wanted to add information about account connected to effect. Existing database schema and a single row:

 history_account_id | history_operation_id | order | type |             details              
--------------------+----------------------+-------+------+----------------------------------
    105505871630336 |      105505871630337 |     1 |    0 | {"starting_balance": 1000000000}

The desired schema:

 history_account_id |                      account_id                          | history_operation_id | order | type |             details              
--------------------+----------------------------------------------------------+----------------------+-------+------+----------------------------------
    105505871630336 | GBS43BF24ENNS3KPACUZVKK2VYPOZVBQO2CISGZ777RYGOPYC2FT6S3K |     105505871630337  |     1 |    0 | {"starting_balance": 1000000000}

No JOINs are needed and account ID connected to this effect will never change - we're safe.

Example 2 - data changes over time Account can have signers and we are going to display current signers in /accounts endpoint. We could create a account_signers table but more effective solution would be to simply add signers field to history_accounts table. In this case horizon-importer will change signers field.

Example 3 - when it won't work I can't find any example right now but to visualize it let's say users can change their account address and we won't display address of the account that sent a transaction. In this case, we obviously don't won't to denormalize history_transactions database because we would need to change a lot of records when user changes an address (1:n relation). In this case we should use JOIN.

Drawbacks

  • Schema changes in large databases take a lot of time. In production, we will probably need to start a new DB and horizon-importer and switch to a new DB after import is done.
  • ??

bartekn avatar Sep 03 '15 20:09 bartekn