colly

Elegant Scraper and Crawler Framework for Golang

Results 155 colly issues

Hello there! I've come across a situation where I have to save a file with a "double" extension (`*.kepub.epub`), and the current implementation of (r *[Response](https://pkg.go.dev/github.com/gocolly/colly/v2#Response)) FileName() purposefully breaks that...
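One way to picture the double-extension problem is a small splitter that checks a caller-supplied list of compound extensions before falling back to the last dot-separated suffix. This is only a sketch: `splitExt` and the `compound` list are hypothetical names, not part of colly, and it says nothing about how `FileName()` is actually implemented.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// splitExt is a hypothetical helper, not part of colly. It splits name into a
// base and an extension, preferring any caller-supplied compound extension
// (such as ".kepub.epub") over the last dot-separated suffix.
func splitExt(name string, compound []string) (base, ext string) {
	for _, c := range compound {
		// Require a non-empty base and compare the suffix case-insensitively.
		if len(name) > len(c) && strings.EqualFold(name[len(name)-len(c):], c) {
			return name[:len(name)-len(c)], name[len(name)-len(c):]
		}
	}
	// Fall back to the standard single-extension behaviour.
	ext = filepath.Ext(name)
	return strings.TrimSuffix(name, ext), ext
}

func main() {
	compound := []string{".kepub.epub", ".tar.gz"} // assumed list, for illustration only
	for _, n := range []string{"book.kepub.epub", "archive.tar.gz", "page.html"} {
		base, ext := splitExt(n, compound)
		fmt.Printf("%q -> base=%q ext=%q\n", n, base, ext)
	}
}
```

With a split like this, only the base would need sanitizing, and the extension could be re-attached unchanged.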

When I am scraping data, a page returns HTTP status 404 but the result still has an HTML response. I want to get that response. But in colly, if OnError occurs then OnHTML does not occur....
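The underlying point, that a 404 response can still carry a readable HTML body, can be shown with the standard library alone. The sketch below does not use colly. `fetch` and `newNotFoundServer` are illustrative names, and the server is a local test server.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

const notFoundHTML = "<html><body><h1>not found</h1></body></html>"

// newNotFoundServer starts a local test server that always answers 404
// with an HTML body.
func newNotFoundServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/html; charset=utf-8")
		w.WriteHeader(http.StatusNotFound)
		io.WriteString(w, notFoundHTML)
	}))
}

// fetch returns the status code and body of url. A non-2xx status is
// treated as data, not as an error.
func fetch(url string) (int, string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return resp.StatusCode, "", err
	}
	return resp.StatusCode, string(body), nil
}

func main() {
	srv := newNotFoundServer()
	defer srv.Close()
	code, body, err := fetch(srv.URL)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(code, body)
}
```

In colly itself, the `ParseHTTPErrorResponse` setting on the Collector appears to be aimed at this situation, though that is worth checking against the installed version.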

Is there any way to handle the case where a selector could not be located? I would like to use a collector instance with an event or something similar. Did I miss something?

This PR addresses a critical issue encountered when scraping large websites with over 1 million pages. Previously, goroutines were being spawned without any limit, leading to significant memory bloat. This...
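A common way to cap goroutine growth is a buffered channel used as a counting semaphore. The sketch below illustrates that general pattern only. It is not the code from this PR, and `runBounded` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runBounded calls work once per item with at most limit calls running at
// the same time, and returns the peak concurrency it observed.
func runBounded(items []string, limit int, work func(string)) int {
	if limit < 1 {
		limit = 1 // an unbuffered semaphore would block forever
	}
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var running, peak int32

	for _, it := range items {
		sem <- struct{}{} // blocks while limit workers are busy
		wg.Add(1)
		go func(it string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot before wg.Done runs

			n := atomic.AddInt32(&running, 1)
			for { // record the highest concurrency seen so far
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			work(it)
			atomic.AddInt32(&running, -1)
		}(it)
	}
	wg.Wait()
	return int(peak)
}

func main() {
	urls := make([]string, 100) // placeholder work items
	peak := runBounded(urls, 8, func(string) {})
	fmt.Println("peak concurrency:", peak)
}
```

Because the producer loop blocks on the semaphore, the number of live goroutines stays bounded by `limit` regardless of how many items are queued.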

When a scrape is retried, requestData is lost in http.NewRequest, so Seek requestData before scrape.NewRequest. req.ContentLength is always 0, and `if req.GetBody != nil && req.ContentLength == 0 { req.Body = NoBody`...

I do not know where to ask this question, so I will ask it here. When will the next release be rolled out? Lots of changes have been done since...

See #745 for more information. Closes #745

### Description This pull request adds support for Depth in Queue and adds a panic when attempting to use Async with Queue, as they are incompatible. The changes ensure that...

The handleOnXML function attempts to parse responses with the content-type `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet`. This is because the function looks for any mention of [xml in the content type](https://github.com/gocolly/colly/blob/9401ae4acc5d2155e0ee09fa71eef4d09d2e412a/colly.go#L1186). This results in a...
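To illustrate why a substring match is too broad, the sketch below compares a naive "contains xml" test with a stricter check on the parsed media type. `isXMLContentType` and `naiveIsXML` are hypothetical functions. The strict rule (`text/xml`, `application/xml`, or a `+xml` suffix) is an assumption about what should count as XML, not the fix adopted by colly.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// naiveIsXML mirrors the kind of substring check described in the issue.
func naiveIsXML(header string) bool {
	return strings.Contains(strings.ToLower(header), "xml")
}

// isXMLContentType parses the header and accepts only media types that are
// XML by name or by the "+xml" structured-syntax suffix.
func isXMLContentType(header string) bool {
	mt, _, err := mime.ParseMediaType(header)
	if err != nil {
		return false
	}
	return mt == "text/xml" || mt == "application/xml" || strings.HasSuffix(mt, "+xml")
}

func main() {
	headers := []string{
		"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
		"application/rss+xml; charset=utf-8",
		"text/xml",
		"text/html",
	}
	for _, h := range headers {
		fmt.Printf("%-70s naive=%v strict=%v\n", h, naiveIsXML(h), isXMLContentType(h))
	}
}
```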

bug

Connected to issue #777 "HTML encoding is not autodetected properly". I removed the current gocolly encoding detection, which testing showed to be unreliable when detecting Cyrillic encodings, and in...