tidb-lightning

Lightning import fails when TiKV network delay is 100 ms

Status: Open • King-Dylan opened this issue 4 years ago • 8 comments

Bug Report


  1. What did you do? We simulated a 100 ms network delay to the remote data centers of a two-city, three-data-center deployment, and imported CSV files with TiDB Lightning (Importer backend).
  2. What did you expect to see?

[screenshot]

  3. What did you see instead?

  4. What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)?

  5. Which tool are you using?

  6. What version of the tool are you using (pump -V or tidb-lightning -V or syncer -V)?

King-Dylan • Sep 03 '20 06:09

[screenshot]

King-Dylan • Sep 03 '20 06:09

Hi, Lightning's repository is https://github.com/pingcap/tidb-lightning/

lance6716 • Sep 03 '20 06:09

@King-Dylan @lance6716 Please let us transfer the issue rather than closing it and opening a new one.

kennytm • Sep 03 '20 06:09

Would this design be better: once a majority of the replicas have been imported successfully, Lightning returns success, so that the network to the remote data center does not slow down the import?

King-Dylan • Sep 03 '20 06:09

According to the other logs, the error occurred because the upload timed out (no response within 30 seconds, it seems), and it failed 5 times in a row.

kennytm • Sep 03 '20 06:09

An internal test shows that with 100 ms latency, some ranges can take up to 40 seconds to upload.

This may cause the Importer backend to fail, because it sets a 30 s timeout on the upload RPC. (According to the gRPC site, the timeout is the longest time an RPC is allowed to stay alive, but I found no more detailed docs.) As the latency grows, an upload may never finish within 30 s, and eventually all retries are exhausted.

For now, using the Local backend can probably resolve this.

But the key question is why latency slows down uploading, and what we can do about it.

YuJuncen • Sep 04 '20 10:09
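A simplified model shows why a longer latency can push every attempt past the deadline. It assumes the upload stream is limited by a fixed TCP window $W$ (bytes in flight) rather than by link bandwidth; $S$ is the size of one range, $T$ the per-attempt timeout (30 s here), and $B$ the achieved throughput. $W$ and $S$ are model parameters, not values measured in this thread.

```latex
\begin{aligned}
B &\le \frac{W}{\mathrm{RTT}} \\
t_{\text{upload}} &\ge \frac{S}{B} \ge \frac{S \cdot \mathrm{RTT}}{W} \\
\mathrm{RTT} > \frac{T \cdot W}{S} &\implies t_{\text{upload}} > T
\end{aligned}
```

Because a retry changes neither $S$, $W$, nor RTT, once the last condition holds every retry times out as well.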

An experiment shows that raw TCP throughput is limited as RTT grows: when RTT doubles, throughput roughly halves.

Detailed info:
./test-result/ping-0ms.log :
[  4]   0.00-10.00  sec  9.92 GBytes  8.52 Gbits/sec   47             sender
[  4]   0.00-10.00  sec  9.91 GBytes  8.52 Gbits/sec                  receiver

./test-result/ping-100ms.log :
[  4]   0.00-10.00  sec   268 MBytes   225 Mbits/sec    6             sender
[  4]   0.00-10.00  sec   267 MBytes   224 Mbits/sec                  receiver

./test-result/ping-200ms.log :
[  4]   0.00-10.00  sec   123 MBytes   103 Mbits/sec   15             sender
[  4]   0.00-10.00  sec   123 MBytes   103 Mbits/sec                  receiver

./test-result/ping-400ms.log :
[  4]   0.00-10.00  sec  50.5 MBytes  42.3 Mbits/sec   14             sender
[  4]   0.00-10.00  sec  49.7 MBytes  41.7 Mbits/sec                  receiver

./test-result/ping-800ms.log :
[  4]   0.00-10.00  sec  13.7 MBytes  11.5 Mbits/sec    0             sender
[  4]   0.00-10.00  sec  13.7 MBytes  11.5 Mbits/sec                  receiver

YuJuncen • Sep 04 '20 10:09
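The halving pattern in the results above matches the window-limited relation throughput ≈ window / RTT. One quick check is to multiply each receiver throughput by its RTT: if the product stays roughly constant, the connection behaves as if a fixed amount of data were in flight. The sketch assumes the delay in each log file name is the round-trip time and skips the 0 ms row, where the product is meaningless; `sample` and `impliedWindowMB` are hypothetical names.

```go
package main

import "fmt"

// sample is one receiver summary row from the results above.
type sample struct {
	rttMs    float64 // assumed: the delay in the log file name, treated as RTT
	mbitPerS float64 // receiver throughput in Mbit/s
}

// impliedWindowMB returns throughput × RTT in megabytes (10^6 bytes): the
// amount of data that must be in flight to sustain that throughput.
func impliedWindowMB(s sample) float64 {
	bitsInFlight := s.mbitPerS * 1e6 * s.rttMs / 1000
	return bitsInFlight / 8 / 1e6
}

func main() {
	// Receiver throughput copied from the 100/200/400/800 ms logs.
	samples := []sample{{100, 224}, {200, 103}, {400, 41.7}, {800, 11.5}}
	for i, s := range samples {
		fmt.Printf("RTT %4.0f ms: %6.1f Mbit/s, implied window %.2f MB",
			s.rttMs, s.mbitPerS, impliedWindowMB(s))
		if i > 0 {
			// Ratio to the previous row, whose RTT is half of this one.
			fmt.Printf(", ratio to previous row %.2f", s.mbitPerS/samples[i-1].mbitPerS)
		}
		fmt.Println()
	}
}
```

With these figures the implied window is about 2.8 MB at 100 ms, 2.6 MB at 200 ms, 2.1 MB at 400 ms and 1.15 MB at 800 ms, so the relation holds only roughly; retransmissions and congestion control may also play a part.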

Since #400 has been merged, can this issue be closed? @YuJuncen

glorv • Nov 06 '20 10:11