No Timeout Logic/Deadlock.
Hi, I've recently added the spdy library to reduce/speed up the number of open connections in our client when communicating with Google APIs.
Something I've noticed is the library (hash: 665a4b7cf850a772c4ee2078f11b63667ec32714) doesn't really handle timeout logic well. 1.) You can't use Client.Timeout because Client Transport of type *spdy.Transport doesn't support CancelRequest; Timeout not supported. 2.) When a hang does occur for whatever reason, the whole program just deadlocks forever.
Here is debug output (This is after several successful requests) (GET request data snipped as it contains sensitive information):
[2015-01-08T13:54:55.547-08:00] [download.go:307] [INFO] Making request...
(spdy debug) 2015/01/08 13:54:55 Requesting "https://www.googleapis.com:443" over SPDY.
(spdy debug) 2015/01/08 13:54:55 Sending SYN_STREAM:
(spdy debug) 2015/01/08 13:54:55 SYN_STREAM {
Version: 3
Flags: common.FLAG_FIN
Stream ID: 39
Associated Stream ID: 0
Priority: 2
Slot: 0
Header: http.Header{":scheme":[]string{"https"}, ":method":[]string{"GET"}, ":path":[]string{""}, ":version":[]string{"HTTP/1.1"}, ":host":[]string{"www.googleapis.com"}}
}
(spdy debug) 2015/01/08 13:54:57 Receiving SYN_REPLY:
(spdy debug) 2015/01/08 13:54:57 SYN_REPLY {
Version: 3
Flags: [NONE]
Stream ID: 39
Header: http.Header{"X-Xss-Protection":[]string{"1; mode=block"}, "Date":[]string{"Thu, 08 Jan 2015 21:54:56 GMT"}, "Content-Length":[]string{"5820"}, "Etag":[]string{"\"F9iA7pnxqNgrkOutjQAa9F2k8HY/uDgBEKga491tbdtqT08CJEKkW-0\""}, "X-Content-Type-Options":[]string{"nosniff"}, ":status":[]string{"200 OK"}, "Server":[]string{"GSE"}, "Vary":[]string{"Origin", "X-Origin"}, "X-Frame-Options":[]string{"SAMEORIGIN"}, "Expires":[]string{"Thu, 08 Jan 2015 21:54:56 GMT"}, "Alternate-Protocol":[]string{"443:quic,p=0.02"}, "Cache-Control":[]string{"private, max-age=300, must-revalidate, no-transform"}, "Content-Type":[]string{"application/json; charset=UTF-8"}, ":version":[]string{"HTTP/1.1"}}
}
(spdy debug) 2015/01/08 13:54:57 Receiving DATA:
(spdy debug) 2015/01/08 13:54:57 DATA {
Stream ID: 39
Flags: [NONE]
Length: 1270
Data: [7b 0a 20 22 6b 69 6e 64 22 ... 20 7b 0a 20 20 20 22 6b 69]
}
(spdy debug) 2015/01/08 14:00:27 Receiving RST_STREAM:
(spdy debug) 2015/01/08 14:00:27 RST_STREAM {
Version: 3
Stream ID: 39
Status code: INTERNAL_ERROR
}
(spdy) 2015/01/08 14:00:27 processing.go:45: Warning: Received INTERNAL_ERROR on stream 39. Closing connection.
(spdy debug) 2015/01/08 14:00:27 Sending GOAWAY:
(spdy debug) 2015/01/08 14:00:27 GOAWAY {
Version: 3
Last good stream ID: 0
Status code: (0)
}
However, even after the GOAWAY is sent, the actually library is deadlocked preventing any chance for a clean retry. Here is the stacktrace below
goroutine 326 [chan receive, 13 minutes]:
github.com/SlyMarbo/spdy/spdy3.(*RequestStream).Run(0xc2081eb950, 0x0, 0x0)
/Users/nimi/go/src/github.com/SlyMarbo/spdy/spdy3/request_stream.go:233 +0x63
github.com/SlyMarbo/spdy/spdy3.(*Conn).RequestResponse(0xc2080af800, 0xc2091969c0, 0x0, 0x0, 0x2, 0x1, 0x0, 0x0)
/Users/nimi/go/src/github.com/SlyMarbo/spdy/spdy3/requests.go:145 +0x12f
github.com/SlyMarbo/spdy.(*Transport).RoundTrip(0xc208048c80, 0xc2091969c0, 0x37e, 0x0, 0x0)
/Users/nimi/go/src/github.com/SlyMarbo/spdy/transport.go:310 +0xef3
net/http.send(0xc2091969c0, 0xb1f9b0, 0xc208048c80, 0xc209362000, 0x0, 0x0)
/usr/local/Cellar/go/1.3/libexec/src/pkg/net/http/client.go:195 +0x43d
net/http.(*Client).send(0xc2080b51a0, 0xc2091969c0, 0x32b, 0x0, 0x0)
/usr/local/Cellar/go/1.3/libexec/src/pkg/net/http/client.go:118 +0x15b
net/http.(*Client).doFollowingRedirects(0xc2080b51a0, 0xc2091969c0, 0x7f8e40, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.3/libexec/src/pkg/net/http/client.go:343 +0x97f
net/http.(*Client).Do(0xc2080b51a0, 0xc2091969c0, 0xc20934fc00, 0x0, 0x0)
/usr/local/Cellar/go/1.3/libexec/src/pkg/net/http/client.go:150 +0xa3
From what I can understand is the stream dies, but is never cleaned up.
This is curious. Immediately after the INTERNAL_ERROR warning (spdy3/processing.go:45), the connection is closed. One part of the clean-up code closes all active streams so the stream's Close method should be being called and closing the finished channel, which is what's being waited on in the stacktrace above. Clearly that's not happening for some reason.
My theory is that the order of action in the connection shutdown might be at fault. Try it with the change I just pushed and see whether that helps.
That seems to have worked - but I'm not sure if its correct. The stream times out (I'm guessing the server closes the connection), but Body is a partial response, and no error is returned (err is nil). I'm unsure if this is a problem with the library or the server (or anywhere inbetween), but it seems to me this should return an error at least (timed out, or incomplete).
Here is the new response:
[2015-01-08T14:40:45.407-08:00] [GEARD] [download.go:307] [INFO] Making request...
(spdy debug) 2015/01/08 14:40:45 Requesting "https://www.googleapis.com:443/" over SPDY.
(spdy debug) 2015/01/08 14:40:45 Sending SYN_STREAM:
(spdy debug) 2015/01/08 14:40:45 SYN_STREAM {
Version: 3
Flags: common.FLAG_FIN
Stream ID: 39
Associated Stream ID: 0
Priority: 2
Slot: 0
Header: http.Header{":method":[]string{"GET"}, ":path":[]string{""}, ":version":[]string{"HTTP/1.1"}, ":host":[]string{"www.googleapis.com"}, ":scheme":[]string{"https"}}
}
(spdy debug) 2015/01/08 14:40:46 Receiving SYN_REPLY:
(spdy debug) 2015/01/08 14:40:46 SYN_REPLY {
Version: 3
Flags: [NONE]
Stream ID: 39
Header: http.Header{"Cache-Control":[]string{"private, max-age=300, must-revalidate, no-transform"}, "Server":[]string{"GSE"}, "X-Xss-Protection":[]string{"1; mode=block"}, "Content-Type":[]string{"application/json; charset=UTF-8"}, "Expires":[]string{"Thu, 08 Jan 2015 22:40:45 GMT"}, "X-Content-Type-Options":[]string{"nosniff"}, "X-Frame-Options":[]string{"SAMEORIGIN"}, ":status":[]string{"200 OK"}, ":version":[]string{"HTTP/1.1"}, "Alternate-Protocol":[]string{"443:quic,p=0.02"}, "Vary":[]string{"Origin", "X-Origin"}, "Content-Length":[]string{"5820"}, "Date":[]string{"Thu, 08 Jan 2015 22:40:45 GMT"}, "Etag":[]string{"\"F9iA7pnxqNgrkOutjQAa9F2k8HY/uDgBEKga491tbdtqT08CJEKkW-0\""}}
}
(spdy debug) 2015/01/08 14:40:46 Receiving DATA:
(spdy debug) 2015/01/08 14:40:46 DATA {
Stream ID: 39
Flags: [NONE]
Length: 1270
Data: [7b 0a 20 22 6b 69 6e 64 22 ... 20 7b 0a 20 20 20 22 6b 69]
}
(spdy debug) 2015/01/08 14:46:16 Receiving RST_STREAM:
(spdy debug) 2015/01/08 14:46:16 RST_STREAM {
Version: 3
Stream ID: 39
Status code: INTERNAL_ERROR
}
(spdy) 2015/01/08 14:46:16 processing.go:45: Warning: Received INTERNAL_ERROR on stream 39. Closing connection.
(spdy debug) 2015/01/08 14:46:16 Sending GOAWAY:
(spdy debug) 2015/01/08 14:46:16 GOAWAY {
Version: 3
Last good stream ID: 0
Status code: (0)
}
(spdy debug) 2015/01/08 14:46:16 Sending RST_STREAM:
(spdy debug) 2015/01/08 14:46:16 RST_STREAM {
Version: 3
Stream ID: 39
Status code: CANCEL
}
[2015-01-08T14:46:16.261-08:00] [GEARD] [download.go:317] [ERROR] Failed to unmarshal: unexpected end of JSON input
[2015-01-08T14:46:16.262-08:00] [GEARD] [download.go:307] [INFO] Making request...
(next request succeeds)
From (spdy debug) 2015/01/08 14:40:46 Receiving DATA: to (spdy debug) 2015/01/08 14:46:16 Receiving RST_STREAM: theres a 5 minute gap, and in the DATA frame only 1270 bytes were received, while the server reported in the headers that the content size is 5820.
Does SPDY not have the capability to detect incomplete streams? I'm reading over the spec & implementation now.
You're right, I don't think I'm passing on the error correctly. I'll look into that. From what I remember it's an implementation factor rather than a factor of the protocol.
I'll add error passing tomorrow since it'll take more time than I have this evening.
Latest commit adds some error passing. Timeout logic is a todo.