[Feature request] Support importing externally generated SST files

Open 3rduncle opened this issue 5 years ago • 1 comments

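Background: We use TiDB as the storage engine for our user profiles. The profiles are computed on Spark in the early hours of every day, and the results must be fully written into TiDB before the morning peak. The data volume is fairly large: one set of profiles has about 300 million rows and several hundred columns, over 100 GB of data.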

Therefore, we are considering the following solution:

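  1. First organize and sort the data on Spark and generate SST files in a format TiDB can recognize, so that most of the computation is done with external compute resources.
  2. Import the SST files into TiDB through an interface, so that TiDB only has to do the small remaining part of the work.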
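
As a rough illustration of the pre-sorting in step 1, here is a minimal Python sketch. It is not TiDB's real key encoding or SST format: the function name `split_into_sorted_chunks`, the `max_chunk_bytes` parameter, and the plain byte-string keys are all assumptions made for the example. It only shows the general idea of sorting key-value pairs externally and cutting them into chunks with non-overlapping key ranges, which is the shape a bulk-ingest step would typically expect.

```python
from typing import Iterable, List, Tuple

KV = Tuple[bytes, bytes]


def split_into_sorted_chunks(pairs: Iterable[KV], max_chunk_bytes: int) -> List[List[KV]]:
    """Sort key-value pairs by key and cut them into size-bounded chunks.

    Hypothetical helper: keys are assumed to be unique, already-encoded
    byte strings, so consecutive chunks cover non-overlapping key ranges.
    """
    ordered = sorted(pairs, key=lambda kv: kv[0])
    chunks: List[List[KV]] = []
    current: List[KV] = []
    size = 0
    for key, value in ordered:
        entry_size = len(key) + len(value)
        # Start a new chunk once adding this entry would exceed the limit.
        if current and size + entry_size > max_chunk_bytes:
            chunks.append(current)
            current = []
            size = 0
        current.append((key, value))
        size += entry_size
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    rows = [(b"user:3", b"c"), (b"user:1", b"a"), (b"user:2", b"b")]
    for chunk in split_into_sorted_chunks(rows, max_chunk_bytes=14):
        # Each chunk would become one externally generated file.
        print(chunk[0][0], chunk[-1][0], len(chunk))
```

In a real Spark job the sort and the split would be distributed (for example a range-partitioned sort), and each chunk would be written by an actual SST writer rather than kept in memory.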

3rduncle, Jan 17 '20 06:01

Hi,

  • If the Spark cluster is able to generate TiDB-specific SST files, it would be much easier to use BR to restore the SSTs rather than TiDB Lightning.
  • Furthermore, since you are using Spark, perhaps you could consider using TiSpark for the AP work, instead of manually doing the SST conversion work.

kennytm, Jan 30 '20 07:01