colly
Option not to pass Request Context to the Next Request
I'm using the request context to store information about the parsed body across various `c.OnHTML` callbacks.
If I use `e.Request.Visit()` to follow hrefs, the request context is passed along to the next request as well, which I want to avoid. So instead of `e.Request.Visit()` I called `c.Visit()` directly, which gives each request a new context.
However, I would also like to use the `MaxDepth` option, and that only works if I use `e.Request.Visit()`.
What would work for me is to use `e.Request.Visit()` but get a new context for each request. As far as I can tell, this is currently not possible. Is that correct?
If so, it would be great to have this as a configuration option that determines whether the request context is passed along.
For now, I have made the change manually for local purposes:
```diff
index 6beef834..524bb77c 100644
--- a/vendor/github.com/gocolly/colly/v2/request.go
+++ b/vendor/github.com/gocolly/colly/v2/request.go
@@ -117,7 +117,7 @@ func (r *Request) AbsoluteURL(u string) string {
 // request and preserves the Context of the previous request.
 // Visit also calls the previously provided callbacks
 func (r *Request) Visit(URL string) error {
-	return r.collector.scrape(r.AbsoluteURL(URL), "GET", r.Depth+1, nil, r.Ctx, nil, true)
+	return r.collector.scrape(r.AbsoluteURL(URL), "GET", r.Depth+1, nil, nil, nil, true)
 }
 // HasVisited checks if the provided URL has been visited
```
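
To make the requested behavior concrete, here is a minimal standalone sketch of the two variants. It does not use colly at all: `ctx`, `request`, `visitShared` and `visitFresh` are hypothetical stand-ins that I made up for illustration, and the sketch assumes that passing a nil context results in a fresh one being created, which is what my local patch relies on.

```go
package main

import "fmt"

// ctx is a hypothetical stand-in for a per-request key/value store.
// It is not colly's Context type.
type ctx struct{ data map[string]string }

func newCtx() *ctx { return &ctx{data: map[string]string{}} }

// request is a simplified stand-in for a crawler request.
type request struct {
	depth int
	ctx   *ctx
}

// visitShared models following a link while reusing the parent's
// context pointer, so parent and child see the same stored values.
func (r *request) visitShared() *request {
	return &request{depth: r.depth + 1, ctx: r.ctx}
}

// visitFresh models the requested option: depth is still tracked,
// but the child request starts with an empty context.
func (r *request) visitFresh() *request {
	return &request{depth: r.depth + 1, ctx: newCtx()}
}

func main() {
	parent := &request{depth: 1, ctx: newCtx()}
	parent.ctx.data["title"] = "parent page"

	shared := parent.visitShared()
	fresh := parent.visitFresh()

	// The shared child inherits the parent's stored value.
	fmt.Println(shared.depth, shared.ctx.data["title"] == "parent page") // 2 true
	// The fresh child keeps the depth but has nothing stored.
	fmt.Println(fresh.depth, len(fresh.ctx.data)) // 2 0
}
```

In both variants the depth increases by one, which is why a depth limit could keep working even when the context is not passed along.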