Avoid modifying WSS URLs since parameters are dropped through the detectURL method
Bug description
Ever since the chromedp v0.7.3, the detectURL method was implemented in util.go to make it easier to send the base URL of a WS to NewRemoteAllocator. The rest of the URL is built by fetching the webSocketDebuggerUrl from json/version and constructing the URL again; however if the websocket is passed on with parameters, such as the case of an authentication token to a WSS for a hosted browser service, then it is impacting connection, since the parameter is dropped and the connection isn't successful.
Steps to reproduce the problem:
- Use chromedp v0.7.3 or above
- Pass a WSS to the NewRemoteAllocator
This throws the following error
2021/12/03 14:42:06 Failed getting title of example.com: could not dial "ws://chrome.browserless.io": unexpected HTTP response status: 301
$ go list -m github.com/chromedp/chromedp
github.com/chromedp/chromedp v0.7.3
$ google-chrome --version
Google Chrome 96.0.4664.55
$ go version
go version go1.17.3 darwin/amd64
What was run to get this error message?
Run the Go script below
package main
import (
"context"
"flag"
"log"
"github.com/chromedp/chromedp"
)
func main() {
var devToolWsUrl string
var title string
flag.StringVar(&devToolWsUrl, "devtools-ws-url", "wss://chrome.browserless.io?token=my_token_here", "DevTools Websocket URL")
flag.Parse()
actxt, cancelActxt := chromedp.NewRemoteAllocator(context.Background(), devToolWsUrl)
defer cancelActxt()
ctxt, cancelCtxt := chromedp.NewContext(actxt) // create new tab
defer cancelCtxt() // close tab afterwards
if err := chromedp.Run(ctxt,
chromedp.Navigate("https://example.com"),
chromedp.Title(&title),
); err != nil {
log.Fatalf("Failed getting title of example.com: %v", err)
}
log.Println("Got title of:", title)
}
What did you expect to see?
I was expecting the WebSocket to connect properly and scrape the Title, which occurs when using v0.7.2.
$ go get -u github.com/chromedp/[email protected]
$ go run chrome.go
2021/12/03 14:42:35 Got title of: Example Domain
What did you see instead?
I saw the following error message, with an unexpected HTTP response due to the fact that the token was not provided when the URL was reformatted through the detectURL method in util.go
$ go get -u github.com/chromedp/[email protected]
$ go run chrome.go
2021/12/03 14:42:06 Failed getting title of example.com: could not dial "ws://chrome.browserless.io": unexpected HTTP response status: 301
exit status 1
Possible issue
This most likely has to do with PR #817
While trying to simplify the way the URL is passed on to NewRemoteAllocator, it is unfortunately removing the arguments passed after the hostname at the moment it reconstructs the URL with the /json/version payload. Important arguments are lost in the process such as the token that is used to authenticate to many hosted browser services.
Possible fixes and justification
If the detectURL method returns Secure WebSockets as is, then they won't be modified by this function. Why would this be important for WSS? Because most hosted browser services are running on WSS and are mainly the ones that require to authenticate through a token, which sometimes are passed through an argument such as this case and would make it important for it not be dropped from the URL.
Possible fixes
- Ignore urls containing "/devtools/browser/" and also the ones containing "wss" since these endpoints could use params as authentication methods to hosted services. Trusting that WSS will most likely be hosted services and are probably going to point exactly to the correct path anyways, and would not require the assistance that the PR #817 provides. Something like this could work for the util.go:
func detectURL(urlstr string) string {
if strings.Contains(urlstr, "/devtools/browser/") || strings.HasPrefix(urlstr, "wss") {
return urlstr
}
// replace the scheme and path to construct the URL like:
// http://127.0.0.1:9222/json/version
u, err := url.Parse(urlstr)
- Modify the detectURL method to extract the arguments provided after the hostname, so that they can be appended to the webSocketDebuggerUrl that is retrieved from
json/version.
Hey folks -- wanted to bump this up as we're more than happy to implement the work, but just wanted to get quick consensus that the approach is right. Please do let us know your thoughts and PR should follow up!
Hi @joelgriffith, thank you first!
I think it's better to make it explicit that we want to use the orignal URL. So my suggestion is to make NewRemoteAllocator accept options (just like NewExecAllocator), and add a pre-defined option NoDetectURL (maybe there is a better name). chromedp should not try to modify the URL when this option is present (including not calling forceIP which could modify the URL too).
@joelgriffith and @AlexLoyola what do you think?
Hi @ZekeLu, thank you for your feedback! @joelgriffith and I think it's a great approach - we'll start working on a PR based on that. We might take a few weeks though due to the holiday season, so expect some delay on this task.
Thanks again for weighing in!
any solution here? can't use browserless with token :)
any solution here? can't use browserless with token :)
Hey @kotchuprik , until we finish up this change, the workaround for now is to "pin" chromedp to an older version... Anything prior to "v0.7.3" should work. Please let me know if this workaround works for you in the meantime.
We've been working on a new product with browserless so we haven't got to working on this PR lately. We'll try to put in the time soon. Thanks for understanding.