How to get response headers on redirect?

Open icamys opened this issue 5 years ago • 11 comments

Hi! First of all I would like to thank you for this wonderful project!

Here's the question: is there any way to get the response headers that come with a redirect response from the server?

In the sources I see that I can control redirects using RedirectHandler. However, this callback has only two parameters: req *http.Request, via []*http.Request. From these I cannot get the headers that the server sends with a 301 response. Are there any hidden options to get those headers?

icamys avatar Mar 01 '19 19:03 icamys

The http.Client doesn't support this either, unfortunately. The best way I can think to do this is to add a custom collector.RedirectHandler, that returns http.ErrUseLastResponse. Also have an OnResponse callback to pull the header you want, and manually Request.Visit() the next Location.

Let me know if this works!

vosmith avatar Mar 18 '19 13:03 vosmith

This would be very helpful for me as well. I am looking to gather the entire redirect chain if possible. Currently I can get final destination from Response.Request.URL, but I would love to be able to record the series of URLs that redirected.

derricw avatar Mar 20 '19 00:03 derricw

@vosmith Unfortunately that's not a solution, because the redirect chain would not be kept in the request context and there is no way to find out the redirect chain for a particular initial URL. The goal is to get the redirect chain for a URL in one step, without making additional requests.

icamys avatar Mar 20 '19 11:03 icamys

@derricw @icamys For the sake of clarity, let me see if I understand both of your use cases correctly. You would like to capture the entire redirect chain, along with the response headers for each response in the chain?

vosmith avatar Mar 20 '19 14:03 vosmith

@vosmith Absolutely correct. At the moment, as a workaround, I collect all redirect chain URLs in a separate sync.Map, and in OnResponse I send requests to those URLs to recreate the whole chain and get the headers. That does not seem like the best solution, considering the fact that colly has already made those requests.

icamys avatar Mar 20 '19 15:03 icamys

It looks like it isn't directly supported by the golang http.Client, but I'll spend some time digging around to see if there is something that can be leveraged to make this work.

vosmith avatar Mar 20 '19 17:03 vosmith

@vosmith In my case I don't necessarily care about the response headers; I just want to know which URLs I'm redirected through. Currently, I can know the URL I requested and the URL that I ended at, but nothing about what happened in between.

derricw avatar Mar 20 '19 21:03 derricw

@vosmith Hey everyone, I'm looking into implementing this myself. I notice that http.Client has a redirect callback that gives the complete history as a slice in its second argument, via. Colly already uses this functionality here: https://github.com/gocolly/colly/blob/master/colly.go#L377.

I can see two avenues to pursue:

  1. We make OnRedirect a first-class callback like OnResponse and OnRequest.
  2. We add a field in colly.Response that stores the most up-to-date redirect slice and let users pull it out and use it if they want.

I will probably attempt option 2, since it seems like it would be the simplest.

derricw avatar Apr 08 '19 21:04 derricw

Ok after further inspection, here is the challenge (and probably the reason this isn't already implemented):

The http.Client is shared by all Requests, so a callback would have no reference to a colly.Request to place the data in. There will have to be some way to look up the correct Request. If you guys have any idea how you'd like that done before I just hamfist it, let me know.

derricw avatar Apr 08 '19 22:04 derricw

Try the OnError function:

c.OnError(func(response *colly.Response, err error) {
	if err != nil {
		log.Println("ERROR ", err.Error())
	}
})

tienn2t avatar Jul 31 '20 10:07 tienn2t

Here's my solution:

package main

import (
	"fmt"
	"net/http"

	"github.com/gocolly/colly/v2"
)

func main() {
	// Create the collector.
	c := colly.NewCollector()

	c.SetRedirectHandler(func(r *http.Request, via []*http.Request) error {
		fmt.Println("Upcoming redirect to", r.URL) // print the URL we are about to be redirected to
		return http.ErrUseLastResponse             // do not follow the redirect
		// return nil                           // follow the redirect
	})
	c.OnResponseHeaders(func(r *colly.Response) {
		fmt.Println("Response Code:\t", r.StatusCode)
		fmt.Println("Response Headers:")
		for k, v := range *r.Headers {
			fmt.Println("\t", k, ":", v)
		}
	})
	c.Visit("https://httpbin.org/redirect/6")
}
// Response Code:   302
// Response Headers:
//          Access-Control-Allow-Credentials : [true]
//          Date : [Fri, 08 Dec 2023 04:33:50 GMT]
//          Content-Type : [text/html; charset=utf-8]
//          Content-Length : [247]
//          Server : [gunicorn/19.9.0]
//          Location : [/relative-redirect/5]
//          Access-Control-Allow-Origin : [*]

coloraven avatar Dec 08 '23 04:12 coloraven