matrixone
[Bug]: the perf is much lower than mysql for batch insert of tpcc data
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
Environment
- Version or commit-id (e.g. v0.1.0 or 8b23a93): 0.6.0
- Hardware parameters:
- OS type:
- Others:
Actual Behavior
Loading the data for 1 warehouse into mo through the benchbase tool takes far longer than loading it into MySQL.
MySQL takes about 80s, but mo takes more than 1 hour.
Expected Behavior
No response
Steps to Reproduce
No response
Additional information
No response
provide some cases
First, profile it. @jianwan0214
Still working on it.
Have finished profiling with the help of @jianwan0214 as

So what's the next action or suggestion? @YANGGMM @jianwan0214
Please ping @LeftHandCold to see if this is the logging issue. @xzxiong may have fixed it.
ok
brief:
- after running the load for 2 min, GC accounts for much of the cost
- after running the load for 12 min, txnbase accounts for much of the cost
dev: dev1
run: ./mo-service -debug-http 127.0.0.1:8123 -allocs-profile ./allocs.profile -launch ./etc/launch-tae-logservice/launch.toml
test case:
- clone https://github.com/aressu1985/mo-tpcc
- startup mo
- init schema
mysql> create database tpcc;
mysql> use tpcc;
mysql> source /path/to/mo-tpcc/sql/tableCreates_pk.sql
- exec load
$ cd /path/to/mo-tpcc
$ mkdir data
$ sh ./runLoader.sh props.mo warehouse 1
- gen cpu profile
PPROF_TMPDIR="./tmpdir" go tool pprof http://localhost:8123/debug/pprof/profile
- after running the load for 2 minutes: pprof.mo-service.samples.cpu.001.pb.gz
- after running the load for 12 minutes: pprof.mo-service.samples.cpu.002.pb.gz
The load ran for 26 min:
[2022-11-09 22:43:16] init load
mkdir: cannot create directory 'data': File exists
Starting BenchmarkSQL LoadData
props.mo
driver=com.mysql.cj.jdbc.Driver
conn=jdbc:mysql://127.0.0.1:6001/tpcc?characterSetResults=utf8&continueBatchOnError=false&useServerPrepStmts=true&alwaysSendSetIsolation=false&useLocalSessionState=true&zeroDateTimeBehavior=CONVERT_TO_NULL&failoverReadOnly=false&serverTimezone=Asia/Shanghai&enabledTLSProtocols=TLSv1.2&useSSL=false
user=dump
password=***********
warehouses=1
loadWorkers=4
fileLocation (not defined)
csvNullValue (not defined - using default '')
Worker 000: Loading ITEM
Worker 001: Loading Warehouse 1
Worker 000: Loading ITEM done
Worker 001: Loading Warehouse 1 done
[2022-11-09 23:09:28] done load
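The 26 min figure can be checked from the two timestamps in the log above. The following is a minimal sketch, assuming GNU date (for the -d option); the function name elapsed_seconds is made up for illustration and is not part of mo-tpcc.

```shell
#!/bin/sh
# Elapsed seconds between two "YYYY-MM-DD HH:MM:SS" timestamps.
# Assumption: GNU date is available (-d parses the timestamp string).
# Both timestamps are parsed as UTC, so only their difference matters.
elapsed_seconds() {
    start=$(date -u -d "$1" +%s)
    end=$(date -u -d "$2" +%s)
    echo $((end - start))
}

# Timestamps copied from the "init load" and "done load" lines of the log.
secs=$(elapsed_seconds "2022-11-09 22:43:16" "2022-11-09 23:09:28")
echo "load took ${secs}s ($((secs / 60)) min $((secs % 60)) s)"
```

For this run the difference is 1572s, i.e. 26 min 12 s, compared with the roughly 80s reported for MySQL.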
@LeftHandCold please have a look.
I will debug this issue in the next two days
@LeftHandCold How's it going?
still working on it
tae-logservice and CN-DN take different paths. CN-DN doesn't collect deletes that way.
Start mo with CN-DN:

In CN-DN, memtable.(*Version[...]).Visible accounts for much of the cost.

We're investigating the CN-DN deployment, but what you posted is a standalone deployment.
I did the following:
start mo-service with:
./mo-service -launch etc/launch-tae-CN-tae-DN/launch.toml -debug-http :9876 2>&1 | tee -i log.txt
load schema with:
create database tpcc;
use tpcc;
source /Users/reus/matrixorigin/mo-tpcc/sql/tableCreates.sql;
load data with:
./runLoader.sh props.mo warehouse 1
during the load, fetch the CPU profile with:
curl 'localhost:9876/debug/pprof/profile?seconds=120' > prof
open the profile with:
go tool pprof -http :6060 prof
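The profile fetch in the steps above can be wrapped in a small helper so that the same command is reused at different points of the load. This is only a sketch: profile_url and capture_profile are hypothetical names, and the host:port must match whatever was passed to -debug-http.

```shell
#!/bin/sh
# Build the URL of the CPU profile endpoint used in the steps above.
# $1 = host:port given to -debug-http, $2 = sampling window in seconds.
profile_url() {
    echo "http://$1/debug/pprof/profile?seconds=$2"
}

# Fetch one CPU profile into a file (hypothetical helper, not part of mo).
# $1 = host:port, $2 = seconds, $3 = output file.
# --fail makes curl exit non-zero on an HTTP error instead of saving the error page.
capture_profile() {
    curl -sS --fail -o "$3" "$(profile_url "$1" "$2")"
}

# Example, run while runLoader.sh is loading:
#   capture_profile localhost:9876 120 prof
#   go tool pprof -http :6060 prof
```

Keeping the URL construction in its own function makes it easy to take a second profile later in the load with a different output file name.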
I'll try to replace the old mem table with the new implementation in the CN partition cache.
in progress
on sick leave.
in progress
Insert with auto is too heavy; we need to find ways to reduce the overhead.
We believe https://github.com/matrixorigin/matrixone/issues/6956 should resolve this issue.
For this part, my work is done in https://github.com/matrixorigin/matrixone/pull/7532