
Option not to pass Request Context to the Next Request

Open sundarv85 opened this issue 2 years ago • 0 comments

I'm using the Request Context to store information about the parsed body in various c.OnHTML callbacks.

When I follow hrefs with e.Request.Visit(), the request context is passed along to the next request as well. I wanted to avoid this, so instead of e.Request.Visit() I called c.Visit() directly. That made sure I got a new context for each request.

However, I would also like to use the MaxDepth option, and that only works if I use e.Request.Visit().

What I need is to use e.Request.Visit() but get a new context for each request. As far as I can tell, this is currently not possible. Is that correct?

If so, it would be great to have a configuration option that determines whether the request context is passed along to the next request.
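To make the requested behavior concrete, here is a minimal, self-contained sketch of the semantics. It is a simplified model and not colly's code: the `Context`, `Request`, and `Follow` names and the `freshCtx` flag are hypothetical stand-ins. It only shows that a followed request can keep the incremented depth (which MaxDepth relies on) while starting from an empty context.

```go
package main

import "fmt"

// Context is a hypothetical stand-in for a request context:
// a plain key/value store attached to a request.
type Context struct{ data map[string]string }

func NewContext() *Context { return &Context{data: map[string]string{}} }

func (c *Context) Put(k, v string)     { c.data[k] = v }
func (c *Context) Get(k string) string { return c.data[k] }

// Request is a simplified model of a crawl request, not colly's type.
type Request struct {
	URL   string
	Depth int
	Ctx   *Context
}

// Follow models following a link from r: the depth is always incremented,
// and the parent's context is shared with the child unless freshCtx is set.
// freshCtx is the hypothetical option this issue asks for.
func (r *Request) Follow(url string, freshCtx bool) *Request {
	ctx := r.Ctx
	if freshCtx {
		ctx = NewContext()
	}
	return &Request{URL: url, Depth: r.Depth + 1, Ctx: ctx}
}

func main() {
	root := &Request{URL: "https://example.com/", Depth: 1, Ctx: NewContext()}
	root.Ctx.Put("title", "home page")

	shared := root.Follow("/a", false) // current behavior: context is inherited
	fresh := root.Follow("/b", true)   // requested behavior: new context, same depth tracking

	fmt.Printf("shared: depth=%d title=%q\n", shared.Depth, shared.Ctx.Get("title"))
	fmt.Printf("fresh:  depth=%d title=%q\n", fresh.Depth, fresh.Ctx.Get("title"))
}
```

In this model both children are at depth 2, but only the one created with `freshCtx` set to false sees the parent's `title` value.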

For now, I have manually made this change in my vendored copy for local purposes:

```diff
index 6beef834..524bb77c 100644
--- a/vendor/github.com/gocolly/colly/v2/request.go
+++ b/vendor/github.com/gocolly/colly/v2/request.go
@@ -117,7 +117,7 @@ func (r *Request) AbsoluteURL(u string) string {
 // request and preserves the Context of the previous request.
 // Visit also calls the previously provided callbacks
 func (r *Request) Visit(URL string) error {
-	return r.collector.scrape(r.AbsoluteURL(URL), "GET", r.Depth+1, nil, r.Ctx, nil, true)
+	return r.collector.scrape(r.AbsoluteURL(URL), "GET", r.Depth+1, nil, nil, nil, true)
 }
 
 // HasVisited checks if the provided URL has been visited
```

sundarv85, Nov 30 '22 10:11