Managing self-hosted DB
Hello :wave: I was wondering about your perspective on the ever-growing `user_events` and `journey_user_step` tables. Do you apply manual suppression/partitioning after some time, or do you throw a bigger disk at it and call it a day?
mysql> SELECT
    ->   table_schema AS `Database`,
    ->   table_name AS `Table`,
    ->   ROUND(((data_length + index_length) / 1024 / 1024), 2) AS `Size in MB`
    -> FROM information_schema.TABLES
    -> ORDER BY (data_length + index_length) DESC;
+--------------------+------------------------------------------------------+------------+
| Database           | Table                                                | Size in MB |
+--------------------+------------------------------------------------------+------------+
| parcelvoy          | journey_user_step                                    |   33137.23 |
| parcelvoy          | user_events                                          |    7776.63 |
Hello! That is the general recommendation; storage is cheap these days. We have an install with over 750GB of user event data and it's mostly working fine. The issue once you get to a really large size is that list generation starts taking longer, since there is more data to parse through, unfortunately.

One of the largest contributors to storage size (if you are sending lots of emails) is usually the raw body of each email that was sent. You can turn this off by setting `LOG_COMPILED_MESSAGE` to `false`; the downside is you won't be able to see what the message the user received looked like.
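For reference, that is just an environment variable on your Parcelvoy containers (exact placement depends on how you deploy; the variable name is taken from the message above):

```
# .env (or your container environment)
# Stop storing the rendered email body with each send record.
# Trade-off: you lose the per-send preview of what the user received.
LOG_COMPILED_MESSAGE=false
```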
Looking to the future, we are actively exploring a few different improvements for really large installs:
- Allowing purging of users/events based on retention periods. Most other tools in the market offer something like this, since data does become stale over time.
- Adding filters that can be applied to prevent unneeded data from being stored in the first place. Most customers start off sending far more events or properties than they need, which ends up taking space.
- And lastly, we are actively rewriting the core of Parcelvoy to get rid of the `user_events` table completely and move all event data to ClickHouse, which allows much faster querying and much more efficient storage of the data.
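Until built-in retention lands, a manual purge along the lines of the first bullet is possible today. A hedged sketch, not an official Parcelvoy procedure: the `created_at` column name is an assumption about the schema, and the batching is there so the purge doesn't hold locks or write one giant transaction to the binlog:

```sql
-- ASSUMPTION: journey_user_step has a created_at DATETIME column.
-- Delete rows older than a 180-day retention window in small batches.
DELETE FROM journey_user_step
WHERE created_at < NOW() - INTERVAL 180 DAY
LIMIT 10000;
-- Re-run until the statement reports 0 rows affected, then optionally
-- rewrite the table so InnoDB returns the freed space to the OS:
-- OPTIMIZE TABLE journey_user_step;
```

Note that plain `DELETE` does not shrink the file on disk; InnoDB only releases space after the table is rebuilt (`OPTIMIZE TABLE`), which locks things up briefly, so schedule it in a maintenance window.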
@Wardormeur can you give me some insight into how many rows are in your `journey_user_step` table and how large your journeys are? Also, do you have any webhook steps?
About 200M rows
mysql> SELECT count(*) FROM journey_user_step;
+-----------+
| count(*) |
+-----------+
| 203883150 |
+-----------+
Journeys are ~70 steps, but can chain into one another. We currently have one journey of 170 steps.
mysql> SELECT journey_id, count(*) FROM journey_steps GROUP BY journey_id ORDER BY 2 DESC;
+------------+----------+
| journey_id | count(*) |
+------------+----------+
| 32 | 169 |
| 38 | 66 |
| 37 | 66 |
| 36 | 66 |
| 35 | 66 |
| 34 | 66 |
| 33 | 66 |
| 9 | 66 |
| 31 | 66 |
| 30 | 66 |
| 29 | 66 |
| 26 | 66 |
| 28 | 66 |
| 10 | 62 |
| 22 | 40 |
| 43 | 26 |
That's about 8-10 GB of storage growth per week.
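On the partitioning half of the original question: MySQL range partitioning by month would make retiring old data nearly free, since `DROP PARTITION` is a metadata operation instead of a row-by-row delete. A sketch only, with real caveats: it assumes a `created_at` column, and MySQL requires the partitioning column to be part of every unique key (including the primary key), so this may need a schema change and should be tested against how Parcelvoy queries the table:

```sql
-- ASSUMPTION: created_at exists and is (or can be added to) the primary key.
ALTER TABLE journey_user_step
  PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p2024_01 VALUES LESS THAN (TO_DAYS('2024-02-01')),
    PARTITION p2024_02 VALUES LESS THAN (TO_DAYS('2024-03-01')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
  );

-- Dropping a month is then instant and reclaims disk immediately:
-- ALTER TABLE journey_user_step DROP PARTITION p2024_01;
```

At ~200M rows the initial `ALTER TABLE` itself is a long table rebuild, so it is worth doing on a replica or during a window.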