[Bug]: Batch insert of TPC-C data is much slower than on MySQL

Open · aressu1985 opened this issue 3 years ago

Is there an existing issue for the same bug?

  • [X] I have checked the existing issues.

Environment

- Version or commit-id (e.g. v0.1.0 or 8b23a93): 0.6.0
- Hardware parameters:
- OS type:
- Others:

Actual Behavior

Loading the data for 1 warehouse into MO with the benchbase tools takes far longer than loading the same data into MySQL.

MySQL takes about 80 s, while MO takes more than 1 hour.

Expected Behavior

No response

Steps to Reproduce

No response

Additional information

No response

aressu1985 · Oct 11 '22

Please provide some cases.

daviszhen · Oct 11 '22

First, let's profile it. @jianwan0214

florashi181 · Nov 04 '22

Still working on it.

YANGGMM · Nov 08 '22

Finished profiling with the help of @jianwan0214:
[image: profile]

YANGGMM · Nov 08 '22

So what's the next action or suggestion? @YANGGMM @jianwan0214

florashi181 · Nov 08 '22

Please ping @LeftHandCold to see if this is the logging issue. @xzxiong may have fixed it.

fengttt · Nov 09 '22

ok

YANGGMM · Nov 09 '22

Summary:

  1. After the load has run for 2 min, GC costs a lot.
  2. After the load has run for 12 min, txnbase costs a lot.

dev: dev1
run: ./mo-service -debug-http 127.0.0.1:8123 -allocs-profile ./allocs.profile -launch ./etc/launch-tae-logservice/launch.toml
test case:

  1. Clone https://github.com/aressu1985/mo-tpcc
  2. Start up MO
  3. Init schema
     mysql> create database tpcc;
     mysql> use tpcc;
     mysql> source /path/to/mo-tpcc/sql/tableCreates_pk.sql
  4. Exec load
     $ cd /path/to/mo-tpcc
     $ mkdir data
     $ sh ./runLoader.sh props.mo warehouse 1
  5. Gen CPU profile
     PPROF_TMPDIR="./tmpdir" go tool pprof http://localhost:8123/debug/pprof/profile
[images: profile001, profile002]

xzxiong · Nov 09 '22

The load ran for 26 min:

[2022-11-09 22:43:16] init load
mkdir: cannot create directory 'data': File exists
Starting BenchmarkSQL LoadData

props.mo
driver=com.mysql.cj.jdbc.Driver
conn=jdbc:mysql://127.0.0.1:6001/tpcc?characterSetResults=utf8&continueBatchOnError=false&useServerPrepStmts=true&alwaysSendSetIsolation=false&useLocalSessionState=true&zeroDateTimeBehavior=CONVERT_TO_NULL&failoverReadOnly=false&serverTimezone=Asia/Shanghai&enabledTLSProtocols=TLSv1.2&useSSL=false
user=dump
password=***********
warehouses=1
loadWorkers=4
fileLocation (not defined)
csvNullValue (not defined - using default '')

Worker 000: Loading ITEM
Worker 001: Loading Warehouse      1
Worker 000: Loading ITEM done
Worker 001: Loading Warehouse      1 done
[2022-11-09 23:09:28] done load

xzxiong · Nov 09 '22

@LeftHandCold plz have a look.

YANGGMM · Nov 11 '22

I will debug this issue in the next two days

LeftHandCold · Nov 11 '22

@LeftHandCold How's it going?

YANGGMM · Nov 17 '22

still working on it

YANGGMM · Dec 18 '22

The tae-logservice and CN-DN deployments take different code paths; CN-DN doesn't collect deletes in that way.

Starting MO with CN-DN:
[image: profile005]

jiangxinmeng1 · Dec 20 '22

In CN-DN, memtable.(*Version[...]).Visible costs a lot.
[image: WeChat image_20220409181135]

jiangxinmeng1 · Dec 20 '22

@xzxiong We're investigating the CN-DN deployment, but what you posted is a standalone deployment.

XuPeng-SH · Dec 20 '22

I did the following:

start mo-service with:

./mo-service -launch etc/launch-tae-CN-tae-DN/launch.toml -debug-http :9876 2>&1 | tee -i log.txt

load schema with:

create database tpcc;
use tpcc;
source /Users/reus/matrixorigin/mo-tpcc/sql/tableCreates.sql;

load data with:

./runLoader.sh props.mo warehouse 1

during the load, fetch the CPU profile with:

 curl 'localhost:9876/debug/pprof/profile?seconds=120' > prof

open the profile with:

go tool pprof -http :6060 prof

reusee · Dec 20 '22

I'll try to replace the old mem table with the new implementation in the CN partition cache.

reusee · Dec 20 '22

in progress

reusee · Dec 21 '22

in progress

reusee · Dec 22 '22

in progress

reusee · Dec 23 '22

in progress

reusee · Dec 25 '22

in progress

reusee · Dec 26 '22

on sick leave.

reusee · Dec 27 '22

on sick leave.

reusee · Dec 28 '22

in progress

reusee · Dec 29 '22

in progress

reusee · Dec 30 '22

Insert with auto is too heavy; we need to find ways to reduce the overhead.

nnsgmsone · Jan 04 '23

We believe https://github.com/matrixorigin/matrixone/issues/6956 should resolve this issue.

fengttt · Jan 11 '23

My work on this part is done in https://github.com/matrixorigin/matrixone/pull/7532

jianwan0214 · Jan 12 '23