Support loading data from a set of SSTs

Open • ShiKaiWi opened this issue 2 years ago • 1 comment

Describe This Problem

Some data in the production environment is persisted as SSTs. It would be useful to download those files and load them into a CeresDB instance for testing and development.

Proposal

Load by write API

A simple approach is to provide a tool that decodes the SSTs and loads the data into CeresDB through the write API.

  • Pros:
    • Simple;
    • No need to worry about format compatibility: the tool can load data files in formats other than CeresDB's own (Parquet);
  • Cons:
    • Loading through the write path may take too long;
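
The flow described above can be sketched as follows. This is only an illustration under stated assumptions: `decode_sst` and `write_rows` are hypothetical callables standing in for a file decoder and a client for the write API, and neither is an existing CeresDB function. The sketch shows why this approach is format-agnostic (any decoder works) and why it is slow (every row is replayed through the write path).

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def batched(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `batch_size` rows from `rows`."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def load_by_write(
    sst_paths: Iterable[str],
    decode_sst: Callable[[str], Iterable[Row]],  # hypothetical decoder
    write_rows: Callable[[List[Row]], None],     # hypothetical write-API client
    batch_size: int = 1000,
) -> int:
    """Decode each SST and replay its rows through the write API.

    Returns the total number of rows written.
    """
    total = 0
    for path in sst_paths:
        # Batches never span two files, so a failure can be traced to one SST.
        for batch in batched(decode_sst(path), batch_size):
            write_rows(batch)
            total += len(batch)
    return total
```

Because the decoder is passed in, the same loop could handle Parquet or any other file format; the cost is one write request per batch.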

Bulk load

Have CeresDB read the SSTs and generate the corresponding metadata in the manifest directly.

  • Pros:
    • Loading should be very fast;
  • Cons:
    • Somewhat more complex;
    • Can only load data files in CeresDB's file format;
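
As a rough sketch of the bulk-load idea, the code below scans a directory of SSTs and produces one metadata record per file, without rewriting any data. Everything here is an assumption made for illustration: the `AddFile` record, its fields, the `*.sst` naming, the JSON-lines output, and the `read_time_range` callable (standing in for reading the time range from a file's own metadata) are hypothetical and do not reflect CeresDB's actual manifest format.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Callable, List, Tuple


@dataclass
class AddFile:
    """Hypothetical manifest edit describing one SST (not CeresDB's schema)."""
    file_id: int
    path: str
    size_bytes: int
    min_ts: int
    max_ts: int


def build_manifest_edits(
    sst_dir: Path,
    read_time_range: Callable[[Path], Tuple[int, int]],  # hypothetical reader
    first_file_id: int = 1,
) -> List[AddFile]:
    """Create one AddFile edit per `*.sst` file, in sorted file-name order."""
    edits = []
    for offset, path in enumerate(sorted(sst_dir.glob("*.sst"))):
        min_ts, max_ts = read_time_range(path)
        if min_ts > max_ts:
            raise ValueError(f"invalid time range in {path.name}")
        edits.append(
            AddFile(first_file_id + offset, path.name,
                    path.stat().st_size, min_ts, max_ts)
        )
    return edits


def to_manifest_lines(edits: List[AddFile]) -> str:
    """Serialize the edits as JSON lines (an assumed on-disk format)."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in edits)
```

Only file metadata is touched, which is why this path should be fast. It also preserves the exact SST layout produced by compaction, at the price of accepting only files the engine can already read.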

Additional Context

No response

ShiKaiWi • May 24 '23 07:05

When troubleshooting query performance issues, the SSTs generated by compaction matter most, so I vote for the second approach: loading the SSTs directly.

ShiKaiWi • May 25 '23 02:05