gohive
gohive copied to clipboard
HTTP(S) plain/ldap authentication connection does not support cookies and issue with the cookie handling
Many thanks for this fantastic package! Below is an explanation of an issue we are having and the solution we used to fix it. Any thoughts if this would be ok to submit it as a pull request? Happy to provide more information.
In our environment we access Hive with HTTPS and plain authentication (username/password). With this configuration (auth==”NONE”) gohive does not currently add a cookie jar to the httpClient. It only adds the cookie jar when auth=”KERBEROS”. Adding the cookie jar also when auth==”NONE” is simple but then another issue surfaces.
Looks like this issue was reported in https://github.com/beltran/gohive/issues/183: “Async sends less queries to Hive, this code sends requests to Hive that I imagine are failing because the headers are too big (HTTP 431 error). I'm guessing returning jar.storage[u.Host] is making the header too big.”
Looking at the request headers this is indeed the issue. If the httpclient uses a cookiejar then the Cookie request header keeps increasing on each request as the cookies are appended to the cookie header on each request. Eventually the Cookie header grows too large and the request is rejected by the server with the 431 HTTP return code. Using the inMemoryCookieJar does not fix this issue.
The underlying problem seems to be somewhere in the Apache Thrift Go package. It seems the thrift client reuses the Cookie header from the previous request and then the Go httpclient adds the Cookies again from the cookiejar (as it should). Deleting the Cookie header with the thrift client before executing a new query seems to mitigate the problem but unfortunately it does not fully fix the underlying issue.
Following patch adds support for the cookie jar when Auth is NONE. Note that the standard Go cookiejar implementation is used. There is no need to use a custom implementation as the standard cookiejar is working as it should.
hive.go line 250
...
if auth == "NONE" {
httpClient, protocol, err := getHTTPClient(configuration)
if err != nil {
return nil, err
}
httpClient.Jar, err = cookiejar.New(&cookiejar.Options{PublicSuffixList: publicsuffix.List})
if err != nil {
return nil, err
}
httpOptions := thrift.THttpClientOptions{Client: httpClient}
transport, err = thrift.NewTHttpClientTransportFactoryWithOptions(fmt.Sprintf(protocol+"://%s:%s@%s:%d/"+configuration.HTTPPath, url.QueryEscape(configuration.Username), url.QueryEscape(configuration.Password), host, port), httpOptions).GetTransport(socket)
if err != nil {
return nil, err
}
...
Following patch mitigates the problem with the Cookie header by deleting the header from the Thirft client before running a new query. Httpclient will add the Cookie header from the Cookiejar so the copy in the Thirft client is redundant.
hive.go line 1165
func (c *Cursor) resetState() error {
// Remove the cookie header otherwise it gets appended on every call
t, ok := c.conn.transport.(*thrift.THttpClient)
if ok {
t.DelHeader("Cookie")
}
c.response = nil
c.Err = nil
...
Following patch closes the cursor if a default database was defined. This will also mitigate the issue as the Cookie header is cleaned when the cursor is closed.
hive.go line 360
if configuration.Database != "" {
cursor := connection.Cursor()
defer cursor.Close()
cursor.Exec(context.Background(), "USE "+configuration.Database)
if cursor.Err != nil {
return nil, cursor.Err
}
}
Hello, thank you for the extensive and detailed bug report.
If the httpclient uses a cookiejar then the Cookie request header keeps increasing on each request as the cookies are appended to the cookie header on each request.
I've been suspecting this could be consistently occurring for a while. This seems very similar to https://github.com/beltran/gohive/issues/183, which unfortunately I never got around fixing.
Following patch mitigates the problem with the Cookie header by deleting the header from the Thirft client before running a new query. Httpclient will add the Cookie header from the Cookiejar so the copy in the Thirft client is redundant.
Admittedly, introducing a dependency on what appears to be a bug makes me a bit nervous, but this bug - if indeed a bug - must have been around for years, and AFAIK there's no JIRA ticket for it.
Any thoughts if this would be ok to submit it as a pull request?
This would be great, the code you added looks good to me.