
Design a fast TPCC test data generation tool: Generate TPCC SST data, then use br to complete a quick import

Open kennytm opened this issue 5 years ago • 3 comments

Feature Request

Describe your feature request related problem:

We do not have a simple tool to generate large-scale example archives. For large-scale tests, we need to use dbgen to produce a SQL dump and then use TiDB Lightning to import it into the cluster. This is very time-consuming: for a 10T-scale test, this preparation step takes almost 2 days.

Describe the feature you'd like:

We should be able to directly generate the backup archive (create SSTs directly and populate the corresponding backupmeta).

Either we create a dedicated tool (focusing on a few selected schemas, e.g. sysbench or TPC-C), or extend dbgen to create SSTs (hard, since dbgen is schema-less and won't generate indices).

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

kennytm avatar Feb 12 '20 07:02 kennytm

The steps could be:

  1. Generate data
  2. Convert to KV pairs via the TiDB encoder (maybe hard-coded)
  3. sort the index
  4. write out SSTs

We can use gorocksdb to write out the SSTs. But in order to use br to restore them, we also need to generate a backupmeta protobuf file.
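A minimal sketch of steps 1 to 3 in Go, to make the need for the sort concrete. Everything here is an illustrative assumption: the names (`kvPair`, `rowKey`, `indexKey`, `buildSortedKVs`) are hypothetical, the key layout is a simplified stand-in for TiDB's `t{tableID}_r{rowID}` / `t{tableID}_i{indexID}...` encoding, and the values are placeholders rather than real encoded rows. A real tool would presumably call TiDB's own codec. Writing the SSTs and the backupmeta file is not shown.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// kvPair is one encoded key-value entry destined for an SST file.
type kvPair struct {
	key, val []byte
}

// encodeInt appends v as 8 big-endian bytes with the sign bit flipped,
// so that byte-wise comparison matches signed numeric order.
func encodeInt(b []byte, v int64) []byte {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], uint64(v)^(1<<63))
	return append(b, buf[:]...)
}

// rowKey builds a simplified record key: 't' + tableID + "_r" + rowID.
func rowKey(tableID, rowID int64) []byte {
	k := encodeInt([]byte{'t'}, tableID)
	k = append(k, "_r"...)
	return encodeInt(k, rowID)
}

// indexKey builds a simplified non-unique index key:
// 't' + tableID + "_i" + indexID + indexed value + rowID.
func indexKey(tableID, indexID, value, rowID int64) []byte {
	k := encodeInt([]byte{'t'}, tableID)
	k = append(k, "_i"...)
	k = encodeInt(k, indexID)
	k = encodeInt(k, value)
	return encodeInt(k, rowID)
}

// buildSortedKVs generates n rows (step 1), encodes one record key and one
// index key per row (step 2), and sorts all pairs by key (step 3).
func buildSortedKVs(tableID int64, n int) []kvPair {
	pairs := make([]kvPair, 0, 2*n)
	for i := 0; i < n; i++ {
		rowID := int64(i + 1)
		indexed := int64(n - i) // indexed column runs opposite to rowID
		payload := []byte(fmt.Sprintf("row-%d", rowID)) // placeholder value
		pairs = append(pairs, kvPair{rowKey(tableID, rowID), payload})
		pairs = append(pairs, kvPair{indexKey(tableID, 1, indexed, rowID), nil})
	}
	sort.Slice(pairs, func(a, b int) bool {
		return bytes.Compare(pairs[a].key, pairs[b].key) < 0
	})
	return pairs
}

// isSorted reports whether keys are in strictly ascending byte order,
// which is the order an SST writer expects.
func isSorted(pairs []kvPair) bool {
	for i := 1; i < len(pairs); i++ {
		if bytes.Compare(pairs[i-1].key, pairs[i].key) >= 0 {
			return false
		}
	}
	return true
}

func main() {
	pairs := buildSortedKVs(1, 3)
	for _, p := range pairs {
		fmt.Printf("%x => %q\n", p.key, p.val)
	}
	fmt.Println("sorted:", isSorted(pairs))
}
```

Rows are generated in row-ID order, but the index entries sort by the indexed value and, in this layout, land before the record keys, so the pairs have to be sorted before they are handed to an SST writer.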

zhouqiang-cl avatar Mar 12 '20 09:03 zhouqiang-cl

This would be better transferred to https://github.com/pingcap/go-tpc/ (but I have no permission 🙃)

kennytm avatar May 28 '20 18:05 kennytm

I have added you, @kennytm!!! It is your show time now!!!

zhouqiang-cl avatar May 30 '20 11:05 zhouqiang-cl