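Looking at the task execution logs, the jobIds are different, so these are different DataX sync processes. Is it just a coincidence that these tasks have nearly the same data volume?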

Open lihjChina opened this issue 2 years ago • 7 comments

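Looking at the task execution logs, the jobIds are different, so these are different DataX sync processes. Is it just a coincidence that these tasks have nearly the same data volume?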

Originally posted by @TrafalgarLuo in https://github.com/alibaba/DataX/issues/1437#issuecomment-1179907115

lihjChina avatar Jul 12 '22 09:07 lihjChina

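From what I have seen so far, the problem is in the statistics log output: the data gets mixed up under multi-threaded concurrency. The full log is below.

[image]

I currently have 4 tables: emp_c1 has 99,999 rows, emp_c2 has 99,998 rows, emp_c3 has 99,997 rows, and emp_c4 has 100,000 rows, but the final printed result is as follows:

17:23:12.701 logback [job-68] INFO c.j.exchange.core.job.JobContainer - [total cpu info] => averageCpu | maxDeltaCpu | minDeltaCpu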
-1.00% | -1.00% | -1.00%

 [total gc info] => 
	 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
	 PS MarkSweep         | 2                  | 2                  | 2                  | 0.079s             | 0.079s             | 0.079s             
	 PS Scavenge          | 20                 | 20                 | 20                 | 0.152s             | 0.152s             | 0.152s             

17:23:12.702 logback [job-68] INFO c.j.exchange.core.job.JobContainer - PerfTrace not enable!
17:23:12.702 logback [job-68] INFO c.j.e.c.s.c.c.j.StandAloneJobContainerCommunicator - Total 399994 records, 44590709 bytes | Speed 181.44KB/s, 1666 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 489.891s | All Task WaitReaderTime 167.094s | Percentage 400.00%
17:23:12.702 logback [job-68] INFO c.j.exchange.core.job.JobContainer -
任务启动时刻 (job start time) : 2022-07-12 17:19:00
任务结束时刻 (job end time) : 2022-07-12 17:23:12
任务总计耗时 (total elapsed time) : 252s
任务平均流量 (average throughput) : 181.44KB/s
记录写入速度 (record write speed) : 1666rec/s
读出记录总数 (total records read) : 399994
读写失败总数 (total read/write failures) : 0

17:23:14.206 logback [job-69] INFO c.j.e.c.s.c.c.j.StandAloneJobContainerCommunicator - Total 399994 records, 44590709 bytes | Speed 140.29KB/s, 1287 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 489.891s | All Task WaitReaderTime 167.094s | Percentage 400.00%
17:23:14.206 logback [job-69] INFO c.j.e.c.j.s.AbstractScheduler - Scheduler accomplished all tasks.
17:23:14.206 logback [job-69] INFO c.j.exchange.core.job.JobContainer - engine Writer.Job [mysqlwriter] do post work.
17:23:14.207 logback [job-69] INFO c.j.exchange.core.job.JobContainer - engine Reader.Job [mysqlreader] do post work.
17:23:14.207 logback [job-69] INFO c.j.exchange.core.job.JobContainer - engine jobId [69] completed successfully.
17:23:14.207 logback [job-69] INFO c.j.exchange.core.util.HookInvoker - No hook invoked, because base dir not exists or is a file: D:\DSG\git_repo\exchange\damp-exchange-engine\damp-exchange-engine\target\engine\engine\hook
17:23:14.207 logback [job-69] INFO c.j.exchange.core.job.JobContainer - [total cpu info] => averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%

 [total gc info] => 
	 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
	 PS MarkSweep         | 2                  | 2                  | 0                  | 0.079s             | 0.079s             | 0.000s             
	 PS Scavenge          | 20                 | 20                 | 0                  | 0.152s             | 0.152s             | 0.000s             

17:23:14.208 logback [job-69] INFO c.j.exchange.core.job.JobContainer - PerfTrace not enable!
17:23:14.208 logback [job-69] INFO c.j.e.c.s.c.c.j.StandAloneJobContainerCommunicator - Total 399994 records, 44590709 bytes | Speed 174.18KB/s, 1599 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 489.891s | All Task WaitReaderTime 167.094s | Percentage 400.00%
17:23:14.208 logback [job-69] INFO c.j.exchange.core.job.JobContainer -
任务启动时刻 (job start time) : 2022-07-12 17:19:00
任务结束时刻 (job end time) : 2022-07-12 17:23:14
任务总计耗时 (total elapsed time) : 253s
任务平均流量 (average throughput) : 174.18KB/s
记录写入速度 (record write speed) : 1599rec/s
读出记录总数 (total records read) : 399994
读写失败总数 (total read/write failures) : 0

17:23:14.442 logback [job-67] INFO c.j.e.c.s.c.c.j.StandAloneJobContainerCommunicator - Total 399994 records, 44590709 bytes | Speed 235.96KB/s, 2166 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 489.891s | All Task WaitReaderTime 167.094s | Percentage 400.00%
17:23:14.442 logback [job-67] INFO c.j.e.c.j.s.AbstractScheduler - Scheduler accomplished all tasks.
17:23:14.442 logback [job-67] INFO c.j.exchange.core.job.JobContainer - engine Writer.Job [mysqlwriter] do post work.
17:23:14.442 logback [job-67] INFO c.j.exchange.core.job.JobContainer - engine Reader.Job [mysqlreader] do post work.
17:23:14.443 logback [job-67] INFO c.j.exchange.core.job.JobContainer - engine jobId [67] completed successfully.
17:23:14.443 logback [job-67] INFO c.j.exchange.core.util.HookInvoker - No hook invoked, because base dir not exists or is a file: D:\DSG\git_repo\exchange\damp-exchange-engine\damp-exchange-engine\target\engine\engine\hook
17:23:14.443 logback [job-67] INFO c.j.exchange.core.job.JobContainer - [total cpu info] => averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%

 [total gc info] => 
	 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
	 PS MarkSweep         | 2                  | 2                  | 0                  | 0.079s             | 0.079s             | 0.000s             
	 PS Scavenge          | 20                 | 20                 | 0                  | 0.152s             | 0.152s             | 0.000s             

17:23:14.443 logback [job-67] INFO c.j.exchange.core.job.JobContainer - PerfTrace not enable!
17:23:14.443 logback [job-67] INFO c.j.e.c.s.c.c.j.StandAloneJobContainerCommunicator - Total 399994 records, 44590709 bytes | Speed 174.18KB/s, 1599 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 489.891s | All Task WaitReaderTime 167.094s | Percentage 400.00%
17:23:14.443 logback [job-67] INFO c.j.exchange.core.job.JobContainer -
任务启动时刻 (job start time) : 2022-07-12 17:19:00
任务结束时刻 (job end time) : 2022-07-12 17:23:14
任务总计耗时 (total elapsed time) : 253s
任务平均流量 (average throughput) : 174.18KB/s
记录写入速度 (record write speed) : 1599rec/s
读出记录总数 (total records read) : 399994
读写失败总数 (total read/write failures) : 0

17:23:20.887 logback [job-66] INFO c.j.e.c.s.c.c.j.StandAloneJobContainerCommunicator - Total 399994 records, 44590709 bytes | Speed 79.78KB/s, 732 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 489.891s | All Task WaitReaderTime 167.094s | Percentage 400.00%
17:23:20.887 logback [job-66] INFO c.j.e.c.j.s.AbstractScheduler - Scheduler accomplished all tasks.
17:23:20.887 logback [job-66] INFO c.j.exchange.core.job.JobContainer - engine Writer.Job [mysqlwriter] do post work.
17:23:20.888 logback [job-66] INFO c.j.exchange.core.job.JobContainer - engine Reader.Job [mysqlreader] do post work.
17:23:20.888 logback [job-66] INFO c.j.exchange.core.job.JobContainer - engine jobId [66] completed successfully.
17:23:20.888 logback [job-66] INFO c.j.exchange.core.util.HookInvoker - No hook invoked, because base dir not exists or is a file: D:\DSG\git_repo\exchange\damp-exchange-engine\damp-exchange-engine\target\engine\engine\hook
17:23:20.888 logback [job-66] INFO c.j.exchange.core.job.JobContainer - [total cpu info] => averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%

 [total gc info] => 
	 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
	 PS MarkSweep         | 2                  | 2                  | 0                  | 0.079s             | 0.079s             | 0.000s             
	 PS Scavenge          | 20                 | 20                 | 0                  | 0.152s             | 0.152s             | 0.000s             

17:23:20.888 logback [job-66] INFO c.j.exchange.core.job.JobContainer - PerfTrace not enable!
17:23:20.888 logback [job-66] INFO c.j.e.c.s.c.c.j.StandAloneJobContainerCommunicator - Total 399994 records, 44590709 bytes | Speed 174.18KB/s, 1599 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 489.891s | All Task WaitReaderTime 167.094s | Percentage 400.00%
17:23:20.889 logback [job-66] INFO c.j.exchange.core.job.JobContainer -
任务启动时刻 (job start time) : 2022-07-12 17:19:00
任务结束时刻 (job end time) : 2022-07-12 17:23:20
任务总计耗时 (total elapsed time) : 260s
任务平均流量 (average throughput) : 174.18KB/s
记录写入速度 (record write speed) : 1599rec/s
读出记录总数 (total records read) : 399994
读写失败总数 (total read/write failures) : 0

My JSON is as follows:

{
  "content": [
    {
      "reader": {
        "name": "mysqlreader",
        "parameter": {
          "column": [
            "emp_id", "emp_name", "gender", "account", "org_id", "birth_date", "age", "nationality",
            "province", "city", "email", "phone", "begin_date", "remark", "create_time", "update_time"
          ],
          "connection": [
            {
              "jdbcUrl": [
                "jdbc:mysql://xxx:3306/xx?serverTimezone=Asia/Shanghai&characterEncoding=utf8&useSSL=false&autoReconnect=true"
              ],
              "table": ["emp_c3"]
            }
          ],
          "password": "*",
          "splitPk": "emp_id",
          "username": "root"
        }
      },
      "writer": {
        "name": "mysqlwriter",
        "parameter": {
          "column": [
            "emp_id", "emp_name", "gender", "account", "org_id", "birth_date", "age", "nationality",
            "province", "city", "email", "phone", "begin_date", "remark", "create_time", "update_time"
          ],
          "connection": [
            {
              "jdbcUrl": "jdbc:mysql://xxx:3306/xx?serverTimezone=Asia/Shanghai&characterEncoding=utf8&useSSL=false&autoReconnect=true",
              "table": ["emp_c3"]
            }
          ],
          "password": "",
          "username": "root"
        }
      }
    }
  ],
  "setting": {
    "errorLimit": {"percentage": 0.02, "record": 0},
    "speed": {"batchSize": 4096, "byte": 904857600, "channel": 1}
  }
}
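
A quick arithmetic check of the figures in the log above may help readers see the pattern. This is only a sketch: the row counts and the reported totals are copied from this comment, and the one-job-per-table mapping is an assumption (the config shown covers emp_c3 only).

```python
# Row counts per table, as stated in the comment above.
table_rows = {
    "emp_c1": 99_999,
    "emp_c2": 99_998,
    "emp_c3": 99_997,
    "emp_c4": 100_000,
}

# Figures that each of job-66 .. job-69 printed in its final summary.
reported_total_records = 399_994
reported_percentage = 400.00

# Assumption: one job per table, so four jobs ran concurrently.
job_count = len(table_rows)
total_rows = sum(table_rows.values())

# Each job reports the sum over all four tables, not its own table.
print(total_rows == reported_total_records)

# The progress percentage also looks like four jobs at 100% added together.
print(job_count * 100.0 == reported_percentage)
```

Both comparisons hold, which is consistent with each job's summary being computed from statistics of all four jobs rather than only its own.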

lihjChina avatar Jul 12 '22 09:07 lihjChina

I hope the maintainers can verify this; the problem really does exist.

lihjChina avatar Jul 12 '22 09:07 lihjChina

I ran into this as well. Has it been resolved?

xj-black avatar Aug 01 '22 05:08 xj-black

Separate DataX processes should not have this problem. Are you running everything in a single DataX process?
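
To illustrate why the single-process question matters, here is a minimal sketch of how process-wide shared statistics can produce the symptom in the log. It is a hypothetical model, not DataX code: `SharedStats`, `PerJobStats`, `run`, and the job-to-table mapping are all made up for illustration, and the jobs run sequentially here rather than in threads.

```python
class SharedStats:
    """Hypothetical collector whose counters live at class level,
    so every job in the same process reads and writes one dict."""

    counters = {}  # shared by all instances in this process

    def __init__(self, job_id):
        self.job_id = job_id

    def add(self, records):
        SharedStats.counters[self.job_id] = (
            SharedStats.counters.get(self.job_id, 0) + records
        )

    def total(self):
        # Problem pattern: sums the counters of every job, not just this one.
        return sum(SharedStats.counters.values())


class PerJobStats:
    """Hypothetical collector that keeps its counter on the instance."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.records = 0

    def add(self, records):
        self.records += records

    def total(self):
        return self.records


# Assumed mapping of job id to row count; the thread only gives the four counts.
rows_per_job = {66: 99_999, 67: 99_998, 68: 99_997, 69: 100_000}


def run(stats_cls):
    """Create one collector per job, feed it that job's rows, return the totals."""
    jobs = {job_id: stats_cls(job_id) for job_id in rows_per_job}
    for job_id, rows in rows_per_job.items():
        jobs[job_id].add(rows)
    return {job_id: job.total() for job_id, job in jobs.items()}


shared_totals = run(SharedStats)
per_job_totals = run(PerJobStats)
print(shared_totals)
print(per_job_totals)
```

With the shared collector every job reports the combined total, while the per-job collector reports each job's own count. Separate operating-system processes would not share such state, which is why running one process per job would be expected to avoid the mix-up if this is indeed the cause.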

adonis2014 avatar Aug 29 '22 05:08 adonis2014

I see that your average job throughput is a little over 100 KB/s. What bandwidth were you running on to get this result?

YaoJian001 avatar Dec 07 '22 09:12 YaoJian001

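I see that your average job throughput is a little over 100 KB/s. What bandwidth were you running on to get this result?

The roughly 100 KB/s is because the source tables hold a large amount of data, and also because many other programs were extracting data at the same time.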

lihjChina avatar Feb 15 '23 09:02 lihjChina

Is there a way to fix this? I have noticed it too.

leafCheng1226 avatar Apr 12 '24 07:04 leafCheng1226