mymysql icon indicating copy to clipboard operation
mymysql copied to clipboard

suspended in linux, but works in windows

Open vanillar7 opened this issue 10 years ago • 14 comments

the same code, work well in windows, but suspended in linux. my code is very similar like this: func DeleteData(a, b string) (err error) { sentence := "DELETE FROM tableA WHERE tableA.columnA IN ('" + a + "','" + b + "') " db := mysql.New("tcp", "", host, username, password, dbname) err = db.Connect() if err != nil { return errors.New(fmt.Sprintf("db.connect error:%v", err)) } defer db.Close()

_, err = db.Start(sentence)
if err != nil {
    return errors.New(fmt.Sprintf("db.start error:%v", err))
}

return err

}

unit testing(the max connections of mysql is 500): func TestDeleteData(t *testing.T) { count := 450 var wg sync.WaitGroup wg.Add(count) for i := 0; i < count; i++ { go func() { fmt.Println("start") for i := 0; i < 100; i++ { err := DeleteData("test1", "test2") if err != nil { t.Errorf("DeleteApnsUserTaskInBulk error:%s", err.Error()) } } fmt.Println("one over") wg.Done() }() } wg.Wait()

fmt.Println("finish...")

}

i run it in windows, the program print "finish..." and quit. but it suspended in linux, and quit with the log "*** Test killed with quit: ran too long (10m0s)" . part of output : goroutine 443 [IO wait]: net.runtime_pollWait(0x7f0c51f4f258, 0x72, 0x0) /usr/local/go/src/pkg/runtime/netpoll.goc:116 +0x6a net.(_pollDesc).Wait(0xc21026bca0, 0x72, 0x7f0c51f2f2c0, 0xb) /usr/local/go/src/pkg/net/fd_poll_runtime.go:81 +0x34 net.(_pollDesc).WaitRead(0xc21026bca0, 0xb, 0x7f0c51f2f2c0) /usr/local/go/src/pkg/net/fd_poll_runtime.go:86 +0x30 net.(_netFD).Read(0xc21026bc40, 0xc210519000, 0x1000, 0x1000, 0x0, ...) /usr/local/go/src/pkg/net/fd_unix.go:204 +0x2a0 net.(_conn).Read(0xc210479aa0, 0xc210519000, 0x1000, 0x1000, 0xe5, ...) /usr/local/go/src/pkg/net/net.go:122 +0xc5 bufio.(_Reader).fill(0xc2102c1f00) /usr/local/go/src/pkg/bufio/bufio.go:91 +0x110 bufio.(_Reader).Read(0xc2102c1f00, 0xc210243bf1, 0x3, 0x3, 0x3, ...) /usr/local/go/src/pkg/bufio/bufio.go:159 +0x1a4 github.com/ziutek/mymysql/native.(_pktReader).readHeader(0xc210243bd0) /root/boshi/projects/GoService/src/github.com/ziutek/mymysql/native/packet.go:26 +0x91 github.com/ziutek/mymysql/native.(_pktReader).readByte(0xc210243bd0, 0xc210243bd0) /root/boshi/projects/GoService/src/github.com/ziutek/mymysql/native/packet.go:78 +0x78 github.com/ziutek/mymysql/native.(_Conn).init(0xc21026dc60) /root/boshi/projects/GoService/src/github.com/ziutek/mymysql/native/init.go:15 +0x14a github.com/ziutek/mymysql/native.(_Conn).connect(0xc21026dc60, 0x0, 0x0) /root/boshi/projects/GoService/src/github.com/ziutek/mymysql/native/mysql.go:200 +x2a1 github.com/ziutek/mymysql/native.(*Conn).Connect(0xc21026dc60, 0x0, 0x0) /root/boshi/projects/GoService/src/github.com/ziutek/mymysql/native/mysql.go:253 +0x92

vanillar7 avatar Apr 29 '14 11:04 vanillar7

It seems that db.Connect hangs at reading first packet from your server (at reading first few bytes (header) of this packet). Did you check logs of your server? Did yuo try to limit number of concurrent connections?

ziutek avatar Apr 30 '14 08:04 ziutek

i don't know how to limit the number of concurrent without connection pool. there is no connection from the unit test when it is suspended.

vanillar7 avatar May 04 '14 07:05 vanillar7

Can you simply set count = count / 10 in your example code?

And what MySQL server says in its logs? It seems that it doesn't reply for new connections.

ziutek avatar May 05 '14 09:05 ziutek

i think i have a similar problem and building with -race narrows it down to shared skipBuf, which multiple goroutines use in parallel without locking. even if connection is never shared between goroutines. data race also exists with thrsafe engine.

edit: similar issue is described at http://blog.golang.org/race-detector (example 2)

anton-povarov avatar Jun 03 '14 12:06 anton-povarov

tried fixing it the naive way, like the go team did - doesn't help with the infinite connection wait, but race goes away.

anton-povarov avatar Jun 03 '14 13:06 anton-povarov

Can you provide some example code that can reproduce this problem?

ziutek avatar Jun 03 '14 14:06 ziutek

First, sorry to hijack your thread @vanillar7 !

the minimal example i was able to come up with: https://gist.github.com/anton-povarov/1933714a4dcec96bc0ff

steps in the code

  1. create X goroutines, intended to consume work items from channel C and respond with results on some other channel - R
  2. create another goroutine to send them tasks Y times
  3. watch how much stuff we've sent on C and received back on R every second

it's a bit unstable, but fails most of the time on my machine, connecting to the database in local network (i'm on wifi tho, with ping times of about 2-3ms).

typical fail log is like: go run test/mymysql.go 2014/06/03 19:40:10 done 318 sends, got 218 results 2014/06/03 19:40:11 done 534 sends, got 434 results 2014/06/03 19:40:12 done 772 sends, got 672 results 2014/06/03 19:40:13 done 1009 sends, got 909 results 2014/06/03 19:40:14 done 1268 sends, got 1168 results 2014/06/03 19:40:15 done 1555 sends, got 1455 results 2014/06/03 19:40:16 done 1858 sends, got 1758 results 2014/06/03 19:40:17 done 2180 sends, got 2080 results 2014/06/03 19:40:18 done 2482 sends, got 2382 results 2014/06/03 19:40:19 done 2796 sends, got 2696 results 2014/06/03 19:40:20 done 3000 sends, got 2985 results 2014/06/03 19:40:21 done 3000 sends, got 2991 results 2014/06/03 19:40:22 done 3000 sends, got 2991 results 2014/06/03 19:40:23 done 3000 sends, got 2991 results 2014/06/03 19:40:24 done 3000 sends, got 2991 results 2014/06/03 19:40:25 done 3000 sends, got 2991 results 2014/06/03 19:40:26 done 3000 sends, got 2991 results ^Cpanic: show me the goroutines bla-bla... goroutines here

anton-povarov avatar Jun 03 '14 15:06 anton-povarov

oh, and here's the naive patch i've applied to make the race go away, just in case. https://gist.github.com/anton-povarov/eb7d4a01af4163487c1a

anton-povarov avatar Jun 03 '14 15:06 anton-povarov

any news, guys ?

anton-povarov avatar Jun 05 '14 08:06 anton-povarov

I fixed missed error checking in your example and after that tested it several times on:

1 . The same host that MySQL server is running:

$ go run parallel.go 2014/06/05 11:14:41 done 2351 sends, got 2329 results 2014/06/05 11:14:41 got all 3000 results

In this case your test always finished.

2 . Different host:

$ go run parallel.go 2014/06/05 11:42:03 done 2008 sends, got 1908 results 2014/06/05 11:42:04 done 3000 sends, got 2997 results 2014/06/05 11:42:05 done 3000 sends, got 2997 results 2014/06/05 11:42:06 done 3000 sends, got 2999 results 2014/06/05 11:42:07 done 3000 sends, got 2999 results 2014/06/05 11:42:07 got all 3000 results

$ go run parallel.go 2014/06/05 11:46:42 done 958 sends, got 858 results 2014/06/05 11:46:43 done 2491 sends, got 2391 results 2014/06/05 11:46:43 dial tcp 10.128.101.7:3306: i/o timeout exit status 1

$ go run parallel.go 2014/06/05 11:48:37 done 2511 sends, got 2411 results 2014/06/05 11:48:38 done 3000 sends, got 2994 results 2014/06/05 11:48:39 done 3000 sends, got 2994 results 2014/06/05 11:48:40 done 3000 sends, got 2994 results 2014/06/05 11:48:41 done 3000 sends, got 2994 results 2014/06/05 11:48:42 done 3000 sends, got 2994 results 2014/06/05 11:48:43 done 3000 sends, got 2994 results 2014/06/05 11:48:44 done 3000 sends, got 2994 results 2014/06/05 11:48:45 done 3000 sends, got 2994 results 2014/06/05 11:48:46 done 3000 sends, got 2994 results 2014/06/05 11:48:47 done 3000 sends, got 2994 results 2014/06/05 11:48:48 done 3000 sends, got 2994 results 2014/06/05 11:48:49 done 3000 sends, got 2994 results 2014/06/05 11:48:50 done 3000 sends, got 2994 results ...

We are interested probably in last case from 2.

After Ctrl-C I can see that some gorutines hangs during reading first packet from MySQL serever on net.runtime_pollWait:

goroutine 93 [IO wait]: net.runtime_pollWait(0x7f076134a640, 0x72, 0x0) /home/michal/P/go/goroot/src/pkg/runtime/netpoll.goc:116 +0x6a net.(_pollDesc).Wait(0xc2100a4990, 0x72, 0x7f0761346f88, 0xb) /home/michal/P/go/goroot/src/pkg/net/fd_poll_runtime.go:81 +0x34 net.(_pollDesc).WaitRead(0xc2100a4990, 0xb, 0x7f0761346f88) /home/michal/P/go/goroot/src/pkg/net/fd_poll_runtime.go:86 +0x30 net.(_netFD).Read(0xc2100a4930, 0xc210177000, 0x1000, 0x1000, 0x0, ...) /home/michal/P/go/goroot/src/pkg/net/fd_unix.go:204 +0x2a0 net.(_conn).Read(0xc210000de0, 0xc210177000, 0x1000, 0x1000, 0x87, ...) /home/michal/P/go/goroot/src/pkg/net/net.go:122 +0xc5 bufio.(_Reader).fill(0xc21014f780) /home/michal/P/go/goroot/src/pkg/bufio/bufio.go:91 +0x110 bufio.(_Reader).Read(0xc21014f780, 0xc210084651, 0x3, 0x3, 0x3, ...) /home/michal/P/go/goroot/src/pkg/bufio/bufio.go:159 +0x1a4 github.com/ziutek/mymysql/native.(_pktReader).readHeader(0xc210084630) /home/michal/P/go/src/github.com/ziutek/mymysql/native/packet.go:26 +0x91 github.com/ziutek/mymysql/native.(_pktReader).readByte(0xc210084630, 0xc210084630) /home/michal/P/go/src/github.com/ziutek/mymysql/native/packet.go:78 +0x78 github.com/ziutek/mymysql/native.(_Conn).init(0xc21008e5a0) /home/michal/P/go/src/github.com/ziutek/mymysql/native/init.go:15 +0x14a github.com/ziutek/mymysql/native.(_Conn).connect(0xc21008e5a0, 0x0, 0x0) /home/michal/P/go/src/github.com/ziutek/mymysql/native/mysql.go:200 +0x2a1 github.com/ziutek/mymysql/native.(*Conn).Reconnect(0xc21008e5a0, 0x0, 0x0) /home/michal/P/go/src/github.com/ziutek/mymysql/native/mysql.go:294 +0x62 main.func·001() /home/michal/P/go/src/github.com/ziutek/mymysql/examples/parallel.go:48 +0x181 created by main.main /home/michal/P/go/src/github.com/ziutek/mymysql/examples/parallel.go:68 +0x1c9

Additionally my MySQL host says in dmesg:

[89269.747493] TCP: Possible SYN flooding on port 3306. Sending cookies. Check SNMP counters. [89392.708404] TCP: Possible SYN flooding on port 3306. Sending cookies. Check SNMP counters.

After disabling SYN cookies on my MySQL server:

sysctl -w net.ipv4.tcp_syncookies=0

your test works without any problems.

$ go run parallel.go 2014/06/05 12:01:41 done 1960 sends, got 1860 results 2014/06/05 12:01:42 got all 3000 results

Please try fixed version: https://github.com/ziutek/mymysql/blob/master/examples/parallel.go and see how it works with enabled and disabled SYN cookies on your server.

ziutek avatar Jun 05 '14 10:06 ziutek

thanks, indeed the issue was making too many connections during a very short time, syncookies seem to be the immediate issue. but the underlying one is that syncookies are only sent when listen queue is overflowed, and default mysql server config has listen queue size = 50.

So the fix seems to be: either disable syncookies or increase listen backlog to larger value and make sure to not overflow that.

thanks for the help.

P.S. would love to have real connection timeouts (i.e. timeout not just on Dial, but on the whole Connect/Reconnect function), and query timeouts. What are your thoughts on that? (shall i create a new issue to discuss this idea?).

anton-povarov avatar Jun 05 '14 13:06 anton-povarov

In standard library we have only SetDeadline method. Probably there will be enough to modify native.pktReader and native.pktWriter to use SetDeadline before any read/write.

ziutek avatar Jun 05 '14 14:06 ziutek

i fix the problem by using function named SetDialer. the connection returned by default dail function doesn't set deadline to read and write operation.

vanillar7 avatar Jun 06 '14 03:06 vanillar7

So 2 cases here

  1. query timeouts - i've managed to get them to timeout fine with conn.NetConn().SetDeadline() - just need some way to actually force close the connection this query is executing in. Currently can't do it, since Close() checks that full result has been read - maybe just export a wrapper for closeConnection() ?
  2. connection timeouts - as @vanillar7 says, this can be achieved with custom Dialer that sets deadline on connection after it's been established.

anton-povarov avatar Jun 06 '14 09:06 anton-povarov