exotic-amazon
Unable to get data when using a proxy
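Without a proxy, the code on the main branch runs normally.
With a proxy, pages are never fetched correctly (even after running for a long time, no page is crawled successfully).
The crawl log always shows (💯 🔃 S for RR got 200 2.64 KiB <- 2.64 KiB):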
19:23:44.914 [r-worker-1] INFO a.p.p.c.component.LoadComponent.Task - 99. 💯 🔃 S for RR got 200 2.64 KiB <- 2.64 KiB [💿4.40 KiB] in 8.891s, last fetched 9s ago, fc:31 | 2/3/0/0/756 | nf:3/3/3 | 115.234.228.146 | 1IIdXw62 | file:///var/folders/vr/_8xgwfn14959gb617jpn7gv40000gp/T/ln/1f6ede83881b702b4a3c5ffa9b01ef51.htm | https://www.amazon.com/s?k=sport+shoes -parse -refresh
or (💔 🔃 S for RR got 1601 2.64 KiB [💿4.40 KiB]):
20:16:37.104 [r-worker-4] INFO a.p.p.c.component.LoadComponent.Task - 39. 💔 🔃 S for RR got 1601 2.64 KiB [💿4.40 KiB] in 1m8.373s, last fetched 1m9s ago, fc:1/42 Retry(1601) rs: Timeout to wait for document ready, rsp: CRAWL | 2/3/0/0/756 | nf:3/3/3 | 183.151.120.172 | 18qo0l66 | file:///var/folders/vr/_8xgwfn14959gb617jpn7gv40000gp/T/ln/1f6ede83881b702b4a3c5ffa9b01ef51.htm | https://www.amazon.com/s?k=sport+shoes -parse -refresh
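To check how often these two patterns occur in a longer log, a small script along these lines might help. It is only a sketch: the regular expression is inferred from the two sample lines above, not from any documented log format, and `parse_fetch_result`, `looks_suspicious`, and the `min_kib` threshold are hypothetical names and values chosen for illustration.

```python
import re

# Assumed pattern, inferred only from the two sample log lines:
# "... got <status> <size> <unit> ..."
LOG_PATTERN = re.compile(
    r"got (?P<status>\d+) (?P<size>[\d.]+) (?P<unit>B|KiB|MiB|GiB)"
)

# Conversion factors to KiB for the units the pattern accepts.
TO_KIB = {"B": 1 / 1024, "KiB": 1, "MiB": 1024, "GiB": 1024 ** 2}


def parse_fetch_result(line):
    """Return (status, size, unit) from a log line, or None if it does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return int(match.group("status")), float(match.group("size")), match.group("unit")


def looks_suspicious(line, min_kib=10.0):
    """Flag non-200 results and 200 results whose body is smaller than min_kib.

    min_kib is a hypothetical threshold; a real search results page is
    normally far larger than the 2.64 KiB seen in the logs.
    """
    parsed = parse_fetch_result(line)
    if parsed is None:
        return False
    status, size, unit = parsed
    if status != 200:
        return True
    return size * TO_KIB[unit] < min_kib
```

Both sample lines would be flagged: the first because a 200 response of 2.64 KiB is too small to be a real results page, the second because of the 1601 retry status.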
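Where:
URL crawled: https://www.amazon.com/s?k=sport+shoes
Arguments: -parse -refresh
Page fetched:
I tested this locally with the same URL, using a proxy in both cases: the old version crawls the page successfully, while the new version shows the behavior above.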