go_spider

[Crawler framework (golang)] An awesome concurrent crawler (spider) framework for Go. The crawler is flexible and modular. It can easily be extended into an individualized crawler, or you can use the default crawl com...

23 go_spider issues, sorted by most recently updated

https://github.com/Jack-Cherish/python-spider and https://github.com/GoJerry/JSCrack, especially the latter project.

How can I get the contents of the script tags in the HTML? After extracting them, I also want to filter them.
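A minimal sketch of "extract, then filter", using only the Go standard library so it runs on its own. It pulls the inner text of every `<script>` element with a regular expression and keeps the ones containing a keyword. The helper names `extractScripts` and `filterScripts` are hypothetical, not part of go_spider. A regex is a simplification that can be fooled by unusual markup; if your processor already holds a goquery document, `doc.Find("script")` selects the same elements more robustly.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// scriptRe matches <script ...>...</script>; (?i) makes it case-insensitive and
// (?s) lets . span newlines. The non-greedy (.*?) stops at the first closing tag.
var scriptRe = regexp.MustCompile(`(?is)<script\b[^>]*>(.*?)</script>`)

// extractScripts returns the trimmed inner text of every <script> element,
// including empty strings for external scripts such as <script src="...">.
func extractScripts(html string) []string {
	var out []string
	for _, m := range scriptRe.FindAllStringSubmatch(html, -1) {
		out = append(out, strings.TrimSpace(m[1]))
	}
	return out
}

// filterScripts keeps only the scripts whose text contains keyword.
func filterScripts(scripts []string, keyword string) []string {
	var out []string
	for _, s := range scripts {
		if strings.Contains(s, keyword) {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	page := `<html><head><script src="a.js"></script><script>var token = "abc";</script></head>
<body><SCRIPT type="text/javascript">console.log("hi")</SCRIPT></body></html>`
	all := extractScripts(page)
	fmt.Println(len(all))
	fmt.Println(filterScripts(all, "token"))
}
```

The filter step is just a predicate over strings, so it can be swapped for a regex or any other condition on the script text.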

The items in PageItems are a map[string]string, but we often want to save a struct instead, which is inconvenient.
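One possible workaround while items remain `map[string]string`: JSON-encode the struct into a single string value and decode it again in the pipeline. This is a sketch, not a go_spider API; `Article`, `putStruct`, `getStruct` and the key `"article"` are hypothetical names, and the map stands in for whatever string map the items expose.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Article is a hypothetical result type; replace it with your own struct.
type Article struct {
	Title string   `json:"title"`
	Tags  []string `json:"tags"`
}

// putStruct JSON-encodes v and stores the text under key in a string map.
func putStruct(items map[string]string, key string, v any) error {
	b, err := json.Marshal(v)
	if err != nil {
		return err
	}
	items[key] = string(b)
	return nil
}

// getStruct decodes the JSON stored under key into out, which must be a pointer.
func getStruct(items map[string]string, key string, out any) error {
	s, ok := items[key]
	if !ok {
		return fmt.Errorf("no item %q", key)
	}
	return json.Unmarshal([]byte(s), out)
}

func main() {
	items := map[string]string{}
	if err := putStruct(items, "article", Article{Title: "Hello", Tags: []string{"go"}}); err != nil {
		panic(err)
	}
	var a Article
	if err := getStruct(items, "article", &a); err != nil {
		panic(err)
	}
	fmt.Println(items["article"]) // {"title":"Hello","tags":["go"]}
	fmt.Println(a.Title, a.Tags)  // Hello [go]
}
```

The cost is an extra encode/decode per item; the benefit is that nested fields and slices survive without flattening them into many string keys.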

I use a custom processor for the crawl. If a field in my custom struct is subject to a data race, do I need to add a sync.Mutex and lock it?

I set the concurrency to 20, but found that the same request was crawled repeatedly and most requests were never processed. ![image](https://user-images.githubusercontent.com/19374680/44301505-af5fd200-a34a-11e8-9fac-247e6f8a6268.png) ![image](https://user-images.githubusercontent.com/19374680/44301501-93f4c700-a34a-11e8-91ad-38b7367d33c2.png)
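The cause of this report is not established here. One common source of duplicate fetches under concurrency is a check-then-act race ("have I seen this URL?" followed by a separate "mark it seen"). A defensive sketch of a concurrency-safe seen-set, assuming you deduplicate URLs yourself before enqueueing them; `seenSet` and `MarkNew` are hypothetical names:

```go
package main

import (
	"fmt"
	"sync"
)

// seenSet is a hypothetical concurrency-safe set of URLs. Its zero value is ready to use.
type seenSet struct {
	m sync.Map
}

// MarkNew records url and reports whether it was new. LoadOrStore makes the
// check and the insert one atomic step, so two goroutines cannot both win.
func (s *seenSet) MarkNew(url string) bool {
	_, loaded := s.m.LoadOrStore(url, struct{}{})
	return !loaded
}

func main() {
	var s seenSet
	var wg sync.WaitGroup
	var mu sync.Mutex
	accepted := 0
	for i := 0; i < 20; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if s.MarkNew("https://example.com/a") {
				mu.Lock()
				accepted++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println(accepted) // 1
}
```

A separate `Load` followed by `Store` would reintroduce the race, because several goroutines could all see "not present" before any of them stores.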

What is taskname used for?

Automatic transcoding sometimes goes wrong; see #30.

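For example, the URL `https://www.repian110.com/film6/48765/48765.js` should be a GBK-encoded file. It contains a dot character that becomes garbled after httpdownloader's automatic transcoding, whereas converting it from GBK to UTF-8 with iconv works correctly.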
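Before trusting automatic transcoding, it can help to check what the response actually declares. A stdlib-only sketch, with the hypothetical helper `detectCharset`: it prefers an explicit `charset` in the Content-Type header, falls back to "utf-8" only if the bytes are valid UTF-8, and otherwise returns "" to signal that the charset is unknown. Whether the server for the URL above sends a charset is not known here. Actual GBK decoding needs a decoder such as `simplifiedchinese.GBK` from `golang.org/x/text/encoding/simplifiedchinese`, left out so the sketch has no third-party dependency.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
	"unicode/utf8"
)

// detectCharset returns the charset declared in contentType, or "utf-8" if the
// body is valid UTF-8, or "" if the encoding cannot be determined this way.
func detectCharset(contentType string, body []byte) string {
	if _, params, err := mime.ParseMediaType(contentType); err == nil {
		if cs := strings.ToLower(params["charset"]); cs != "" {
			return cs
		}
	}
	if utf8.Valid(body) {
		return "utf-8"
	}
	return "" // unknown: hand off to a real detector/decoder instead of guessing
}

func main() {
	fmt.Println(detectCharset("text/html; charset=GBK", nil))
	// 0xA1 0xA4 is a two-byte GBK sequence and is not valid UTF-8.
	fmt.Printf("%q\n", detectCharset("application/javascript", []byte{0xA1, 0xA4}))
	fmt.Println(detectCharset("", []byte("hello")))
}
```

Returning "" instead of a guessed charset keeps a wrong guess from silently producing garbled text, which is the failure this issue describes.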