horaedb
Support loading data from a set of SST files
Describe This Problem
Some production data is persisted as SST files. We would like to download these files and load them into a CeresDB instance for testing and development.
Proposal
Load by write API
A simple way is to provide a tool that decodes the SST files and loads the data into CeresDB through the write API.
- Pros:
  - Simple.
  - No need to worry about format compatibility; that is, we can load data files in formats other than CeresDB's own (Parquet).
- Cons:
  - Loading through the write path may take too long.
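A minimal sketch of this approach, assuming the rows have already been decoded from the files. The `write_batch` callable is a hypothetical stand-in for whatever client call sends rows to the write API; it is not a real CeresDB client function, and the batch size is an arbitrary choice.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]  # one decoded row, e.g. {"timestamp": 1, "value": 0.5}


def batched(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most `batch_size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def load_by_write(
    rows: Iterable[Row],
    write_batch: Callable[[List[Row]], None],  # hypothetical write API call
    batch_size: int = 1000,
) -> int:
    """Send decoded rows to the write API in batches; return the row count."""
    written = 0
    for batch in batched(rows, batch_size):
        write_batch(batch)
        written += len(batch)
    return written
```

Every row goes through the normal write path here, which is why this approach is format-agnostic but potentially slow for large data sets.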
Bulk load
Have CeresDB read the SST files and generate the corresponding metadata in the manifest directly.
- Pros:
  - Should be much faster, because the data is not rewritten through the write path.
- Cons:
  - Somewhat more complex.
  - Only supports data files in CeresDB's file format.
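The idea can be sketched as follows: scan a directory of SST files and produce one "add file" record per file, without touching the row data. Everything here is an assumption for illustration: the `AddFile` record, the `*.sst` file extension, and the `read_time_range` callable (which would read the time range from the file's own metadata) are hypothetical and do not reflect CeresDB's actual manifest format.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class AddFile:
    """Hypothetical manifest entry describing one SST file."""
    file_id: int
    path: str
    size_bytes: int
    min_ts: int
    max_ts: int


def build_manifest_edits(
    sst_dir: Path,
    read_time_range: Callable[[Path], Tuple[int, int]],  # hypothetical helper
    next_file_id: int = 1,
) -> List[AddFile]:
    """Create one AddFile entry per SST file, in file-name order."""
    edits = []
    for offset, path in enumerate(sorted(sst_dir.glob("*.sst"))):
        min_ts, max_ts = read_time_range(path)
        edits.append(
            AddFile(
                file_id=next_file_id + offset,
                path=str(path),
                size_bytes=path.stat().st_size,
                min_ts=min_ts,
                max_ts=max_ts,
            )
        )
    return edits
```

Only file-level metadata is collected, so the cost grows with the number of files rather than the number of rows. The trade-off is that the files must already be in a format the engine can read.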
Additional Context
No response
When troubleshooting query performance issues, the SST files generated by compaction matter most, so I vote for the second approach: loading the SST files directly.