DataX
Looking at the task execution logs, the jobIds are different, so these are separate DataX sync processes. Is it just a coincidence that these tasks have nearly identical data volumes?
Originally posted by @TrafalgarLuo in https://github.com/alibaba/DataX/issues/1437#issuecomment-1179907115
From what I can see, the problem is in the statistics log output: the numbers get mixed up when jobs run concurrently in multiple threads. The full log is below.
I currently have 4 tables: emp_c1 with 99999 rows, emp_c2 with 99998, emp_c3 with 99997, and emp_c4 with 100000, but the final printed result for every job is as follows.
17:23:12.701 logback [job-68] INFO c.j.exchange.core.job.JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 2 | 2 | 2 | 0.079s | 0.079s | 0.079s
PS Scavenge | 20 | 20 | 20 | 0.152s | 0.152s | 0.152s
17:23:12.702 logback [job-68] INFO c.j.exchange.core.job.JobContainer - PerfTrace not enable!
17:23:12.702 logback [job-68] INFO c.j.e.c.s.c.c.j.StandAloneJobContainerCommunicator - Total 399994 records, 44590709 bytes | Speed 181.44KB/s, 1666 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 489.891s | All Task WaitReaderTime 167.094s | Percentage 400.00%
17:23:12.702 logback [job-68] INFO c.j.exchange.core.job.JobContainer - Job start time: 2022-07-12 17:19:00 | Job end time: 2022-07-12 17:23:12 | Total elapsed: 252s | Average throughput: 181.44KB/s | Record write speed: 1666rec/s | Records read: 399994 | Read/write failures: 0
17:23:14.206 logback [job-69] INFO c.j.e.c.s.c.c.j.StandAloneJobContainerCommunicator - Total 399994 records, 44590709 bytes | Speed 140.29KB/s, 1287 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 489.891s | All Task WaitReaderTime 167.094s | Percentage 400.00%
17:23:14.206 logback [job-69] INFO c.j.e.c.j.s.AbstractScheduler - Scheduler accomplished all tasks.
17:23:14.206 logback [job-69] INFO c.j.exchange.core.job.JobContainer - engine Writer.Job [mysqlwriter] do post work.
17:23:14.207 logback [job-69] INFO c.j.exchange.core.job.JobContainer - engine Reader.Job [mysqlreader] do post work.
17:23:14.207 logback [job-69] INFO c.j.exchange.core.job.JobContainer - engine jobId [69] completed successfully.
17:23:14.207 logback [job-69] INFO c.j.exchange.core.util.HookInvoker - No hook invoked, because base dir not exists or is a file: D:\DSG\git_repo\exchange\damp-exchange-engine\damp-exchange-engine\target\engine\engine\hook
17:23:14.207 logback [job-69] INFO c.j.exchange.core.job.JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 2 | 2 | 0 | 0.079s | 0.079s | 0.000s
PS Scavenge | 20 | 20 | 0 | 0.152s | 0.152s | 0.000s
17:23:14.208 logback [job-69] INFO c.j.exchange.core.job.JobContainer - PerfTrace not enable!
17:23:14.208 logback [job-69] INFO c.j.e.c.s.c.c.j.StandAloneJobContainerCommunicator - Total 399994 records, 44590709 bytes | Speed 174.18KB/s, 1599 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 489.891s | All Task WaitReaderTime 167.094s | Percentage 400.00%
17:23:14.208 logback [job-69] INFO c.j.exchange.core.job.JobContainer - Job start time: 2022-07-12 17:19:00 | Job end time: 2022-07-12 17:23:14 | Total elapsed: 253s | Average throughput: 174.18KB/s | Record write speed: 1599rec/s | Records read: 399994 | Read/write failures: 0
17:23:14.442 logback [job-67] INFO c.j.e.c.s.c.c.j.StandAloneJobContainerCommunicator - Total 399994 records, 44590709 bytes | Speed 235.96KB/s, 2166 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 489.891s | All Task WaitReaderTime 167.094s | Percentage 400.00%
17:23:14.442 logback [job-67] INFO c.j.e.c.j.s.AbstractScheduler - Scheduler accomplished all tasks.
17:23:14.442 logback [job-67] INFO c.j.exchange.core.job.JobContainer - engine Writer.Job [mysqlwriter] do post work.
17:23:14.442 logback [job-67] INFO c.j.exchange.core.job.JobContainer - engine Reader.Job [mysqlreader] do post work.
17:23:14.443 logback [job-67] INFO c.j.exchange.core.job.JobContainer - engine jobId [67] completed successfully.
17:23:14.443 logback [job-67] INFO c.j.exchange.core.util.HookInvoker - No hook invoked, because base dir not exists or is a file: D:\DSG\git_repo\exchange\damp-exchange-engine\damp-exchange-engine\target\engine\engine\hook
17:23:14.443 logback [job-67] INFO c.j.exchange.core.job.JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 2 | 2 | 0 | 0.079s | 0.079s | 0.000s
PS Scavenge | 20 | 20 | 0 | 0.152s | 0.152s | 0.000s
17:23:14.443 logback [job-67] INFO c.j.exchange.core.job.JobContainer - PerfTrace not enable!
17:23:14.443 logback [job-67] INFO c.j.e.c.s.c.c.j.StandAloneJobContainerCommunicator - Total 399994 records, 44590709 bytes | Speed 174.18KB/s, 1599 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 489.891s | All Task WaitReaderTime 167.094s | Percentage 400.00%
17:23:14.443 logback [job-67] INFO c.j.exchange.core.job.JobContainer - Job start time: 2022-07-12 17:19:00 | Job end time: 2022-07-12 17:23:14 | Total elapsed: 253s | Average throughput: 174.18KB/s | Record write speed: 1599rec/s | Records read: 399994 | Read/write failures: 0
17:23:20.887 logback [job-66] INFO c.j.e.c.s.c.c.j.StandAloneJobContainerCommunicator - Total 399994 records, 44590709 bytes | Speed 79.78KB/s, 732 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 489.891s | All Task WaitReaderTime 167.094s | Percentage 400.00%
17:23:20.887 logback [job-66] INFO c.j.e.c.j.s.AbstractScheduler - Scheduler accomplished all tasks.
17:23:20.887 logback [job-66] INFO c.j.exchange.core.job.JobContainer - engine Writer.Job [mysqlwriter] do post work.
17:23:20.888 logback [job-66] INFO c.j.exchange.core.job.JobContainer - engine Reader.Job [mysqlreader] do post work.
17:23:20.888 logback [job-66] INFO c.j.exchange.core.job.JobContainer - engine jobId [66] completed successfully.
17:23:20.888 logback [job-66] INFO c.j.exchange.core.util.HookInvoker - No hook invoked, because base dir not exists or is a file: D:\DSG\git_repo\exchange\damp-exchange-engine\damp-exchange-engine\target\engine\engine\hook
17:23:20.888 logback [job-66] INFO c.j.exchange.core.job.JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 2 | 2 | 0 | 0.079s | 0.079s | 0.000s
PS Scavenge | 20 | 20 | 0 | 0.152s | 0.152s | 0.000s
17:23:20.888 logback [job-66] INFO c.j.exchange.core.job.JobContainer - PerfTrace not enable!
17:23:20.888 logback [job-66] INFO c.j.e.c.s.c.c.j.StandAloneJobContainerCommunicator - Total 399994 records, 44590709 bytes | Speed 174.18KB/s, 1599 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 489.891s | All Task WaitReaderTime 167.094s | Percentage 400.00%
17:23:20.889 logback [job-66] INFO c.j.exchange.core.job.JobContainer - Job start time: 2022-07-12 17:19:00 | Job end time: 2022-07-12 17:23:20 | Total elapsed: 260s | Average throughput: 174.18KB/s | Record write speed: 1599rec/s | Records read: 399994 | Read/write failures: 0
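Note that every job reports Total 399994, which is exactly 99999 + 99998 + 99997 + 100000, i.e. the sum over all four tables. That pattern is what you would get if the per-job statistics were accumulated into one counter shared across the whole process. Here is a minimal sketch of that suspected failure mode; the class and field names are hypothetical, not DataX's actual internals:

```java
import java.util.concurrent.atomic.LongAdder;

// Sketch of the suspected failure mode: several "jobs" in one JVM count
// their records into a single shared accumulator, so every job's final
// summary shows the sum over all jobs instead of its own row count.
public class SharedCounterDemo {

    // Simulate N concurrent jobs that all add into one shared counter,
    // then each read that same counter for their final report.
    static long[] simulate(long[] tableRows) throws InterruptedException {
        LongAdder shared = new LongAdder();               // one counter for the whole "process"
        Thread[] jobs = new Thread[tableRows.length];
        for (int i = 0; i < jobs.length; i++) {
            final long rows = tableRows[i];
            jobs[i] = new Thread(() -> shared.add(rows)); // each job counts its own rows
            jobs[i].start();
        }
        for (Thread job : jobs) {
            job.join();                                   // all jobs finish before reporting
        }
        long[] reports = new long[tableRows.length];
        for (int i = 0; i < reports.length; i++) {
            reports[i] = shared.sum();                    // bug: reads the shared total, not its own count
        }
        return reports;
    }

    public static void main(String[] args) throws InterruptedException {
        // The four tables from the report: 99999 + 99998 + 99997 + 100000 = 399994.
        long[] reports = simulate(new long[]{99999, 99998, 99997, 100000});
        for (int i = 0; i < reports.length; i++) {
            // every job "reports" 399994, matching the logs above
            System.out.println("job-" + (66 + i) + " Total " + reports[i] + " records");
        }
    }
}
```

Per-job totals would only come out right if each JobContainer held its own accumulator instance rather than reading shared state.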
My job JSON is as follows:
{
    "content": [
        {
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "column": ["emp_id", "emp_name", "gender", "account", "org_id", "birth_date", "age", "nationality", "province", "city", "email", "phone", "begin_date", "remark", "create_time", "update_time"],
                    "connection": [
                        {
                            "jdbcUrl": ["jdbc:mysql://xxx:3306/xx?serverTimezone=Asia/Shanghai&characterEncoding=utf8&useSSL=false&autoReconnect=true"],
                            "table": ["emp_c3"]
                        }
                    ],
                    "password": "*",
                    "splitPk": "emp_id",
                    "username": "root"
                }
            },
            "writer": {
                "name": "mysqlwriter",
                "parameter": {
                    "column": ["emp_id", "emp_name", "gender", "account", "org_id", "birth_date", "age", "nationality", "province", "city", "email", "phone", "begin_date", "remark", "create_time", "update_time"],
                    "connection": [
                        {
                            "jdbcUrl": "jdbc:mysql://xxx:3306/xx?serverTimezone=Asia/Shanghai&characterEncoding=utf8&useSSL=false&autoReconnect=true",
                            "table": ["emp_c3"]
                        }
                    ],
                    "password": "",
                    "username": "root"
                }
            }
        }
    ],
    "setting": {
        "errorLimit": {
            "percentage": 0.02,
            "record": 0
        },
        "speed": {
            "batchSize": 4096,
            "byte": 904857600,
            "channel": 1
        }
    }
}
Hoping the maintainers can verify this; the problem does exist.
I ran into this too. Has it been resolved?
Separate DataX processes shouldn't have this problem. Are you running everything inside a single DataX process?
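Building on that point: if each table's sync runs as its own DataX process, each job gets its own JVM and its own statistics, so nothing can be shared across jobs. A minimal launcher sketch; the `DATAX_HOME` default and the `jobs/<table>.json` layout are assumptions, adjust them to your installation:

```shell
#!/bin/sh
# Launch one DataX process per table so every job runs in its own JVM
# with its own statistics. DATAX_HOME and the job file paths below are
# examples, not fixed conventions.
DATAX_HOME="${DATAX_HOME:-/opt/datax}"
mkdir -p logs
for t in emp_c1 emp_c2 emp_c3 emp_c4; do
    if [ -f "${DATAX_HOME}/bin/datax.py" ]; then
        # real run: one independent process per table, in the background
        python "${DATAX_HOME}/bin/datax.py" "jobs/${t}.json" > "logs/${t}.log" 2>&1 &
    else
        # dry run when DataX is not installed at DATAX_HOME
        echo "would run: datax.py jobs/${t}.json"
    fi
done
wait    # block until all background syncs finish
```

The trade-off is four JVM startups instead of one, but the per-job record counts in each log can no longer contaminate each other.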
I see your average job throughput is only 100-odd KB/s. What bandwidth were you running on to get that result?
The ~100 KB/s is because the source tables are large, and many other programs are pulling data from them at the same time.
Is there a way to fix this? I've run into it as well.