MariaDB data offloading
As a follow-up to https://github.com/jsdelivr/globalping/issues/269#issuecomment-2564143624, we may want to offload data older than 1 hour from Redis to another storage layer so that we can keep measurement results for longer. S3 and similar alternatives would be very expensive because of request-based pricing, so MariaDB seems like the best option. All queries would be single-row lookups by primary key, so a server with 16-32 GB RAM and 2 TB+ of fast storage should be sufficient.
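For reference, the access pattern assumed above is a single primary-key fetch per request. A minimal sketch of that query shape (hypothetical table and column names, MariaDB-style `?` placeholders):

```typescript
// Hypothetical query builder for the single-row lookup pattern described above.
// The table/column names are illustrative, not from the actual schema.
function getMeasurementSql(id: string): { text: string; values: string[] } {
  return {
    text: 'SELECT result FROM measurements WHERE id = ? LIMIT 1',
    values: [id],
  };
}
```

Since every read is an indexed point lookup, neither query planning nor working-set size should be a bottleneck, which is what makes the modest hardware estimate plausible.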
How would purging work? Also, can we do a different TTL per user type?
A cron job that checks disk usage and drops part of the oldest data when it's above a threshold seems most reliable. Theoretically, we don't need to define a storage duration at all; the DB simply keeps as much data as possible.
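A minimal sketch of that purge decision, with hypothetical names and numbers (the actual threshold and partition size estimate would need tuning):

```typescript
// Given current disk usage, decide which of the oldest partitions to drop.
// All parameters are illustrative; this is a sketch of the cron job's logic,
// not an implementation from the repo.
function partitionsToDrop(
  partitions: string[],     // partition names, oldest first
  usedBytes: number,
  capacityBytes: number,
  threshold: number,        // e.g. 0.9 = start purging above 90% usage
  avgPartitionBytes: number // rough size of one partition
): string[] {
  const toDrop: string[] = [];
  let used = usedBytes;
  for (const p of partitions) {
    if (used / capacityBytes <= threshold) break;
    toDrop.push(p);
    used -= avgPartitionBytes;
  }
  return toDrop;
}
```

The job would then execute the corresponding `DROP PARTITION` statements; everything not dropped stays queryable.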
If we wanted different TTLs per user type, we'd combine this with a predetermined expiration (again enforced by a scheduled function), and the disk-check job would become just a fallback.
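The per-user-type expiration check could look roughly like this; the tier names and TTL values below are made up for illustration:

```typescript
// Hypothetical TTLs per user tier (days); real values would come from config.
const ttlDays: Record<string, number> = {
  anonymous: 7,
  registered: 30,
};

// Decide whether a stored measurement has passed its tier's TTL.
function isExpired(userType: string, createdAt: Date, now: Date): boolean {
  const days = ttlDays[userType] ?? ttlDays.anonymous;
  return now.getTime() - createdAt.getTime() > days * 86_400_000;
}
```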
As a rough estimate, 1 month of data at the current usage level = 2 TB (MariaDB's compression might reduce that somewhat).
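That estimate also gives a quick way to translate disk size into retention; a small helper, assuming the ~2 TB/month figure above and ignoring compression:

```typescript
// Days of data that fit on a disk of the given size, assuming a constant
// ingest rate of `monthlyIngestTB` TB per 30 days (rough estimate from above).
function retentionDays(diskTB: number, monthlyIngestTB = 2): number {
  return Math.floor((diskTB * 30) / monthlyIngestTB);
}
```

So the 2 TB server from the first comment holds roughly a month, and doubling the disk doubles the retention linearly.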
Feels icky doing it with cron. Wouldn't it also lock the tables and create issues when it runs?
Partitioning by days or weeks solves the delete problem, since dropping a partition is fast (unlike `DELETE ... WHERE date < x`, which has to visit every matching row). The API will also have enough time to retry if needed; I imagine we'd move the data ASAP after a measurement finishes, but keep serving it from Redis for 1 hour anyway.
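For illustration, with daily partitions the purge becomes a metadata-only statement per day; a sketch with a hypothetical naming scheme:

```typescript
// Hypothetical daily-partition naming: one partition per UTC day, e.g. p20250102.
function partitionName(day: Date): string {
  return 'p' + day.toISOString().slice(0, 10).replace(/-/g, '');
}

// The statement the purge job would issue to drop one day of data.
// Dropping a partition is O(1) in rows, unlike a scanning DELETE.
function dropPartitionSql(table: string, day: Date): string {
  return `ALTER TABLE ${table} DROP PARTITION ${partitionName(day)};`;
}
```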
Assigning as part of #291: with the increased number of tests, we won't be able to hold them in memory for more than a few hours, and storing the raw results directly in the same DB as the timeseries data isn't suitable either. The current idea is to go with a separate Postgres instance, where:
- regular measurements are moved when finished, and dropped from Redis after 1 hour,
- TS measurements are written when finished, and bypass Redis completely, as in-progress access is not needed.
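The two write paths above can be sketched as a simple routing decision (names are illustrative, not from the codebase):

```typescript
type MeasurementKind = 'regular' | 'timeseries';

// Which stores receive a finished measurement under the plan above:
// regular results stay readable from Redis for 1 hour while also being moved
// to Postgres; timeseries results skip Redis entirely.
function storageTargets(kind: MeasurementKind): string[] {
  return kind === 'regular' ? ['redis', 'postgres'] : ['postgres'];
}
```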