exotic-amazon

Unable to get data when using a proxy

Open · sskmtm opened this issue 1 year ago · 3 comments

Without a proxy, the code on the main branch runs normally.

With a proxy, the page is never fetched correctly (even after running for a long time, the page is still not crawled successfully).

The crawl log always shows either this (💯 🔃 S for RR got 200 2.64 KiB <- 2.64 KiB):

19:23:44.914 [r-worker-1] INFO  a.p.p.c.component.LoadComponent.Task -  99. 💯 🔃 S for RR got 200 2.64 KiB <- 2.64 KiB [💿4.40 KiB] in 8.891s, last fetched 9s ago, fc:31 | 2/3/0/0/756 | nf:3/3/3      | 115.234.228.146 | 1IIdXw62 | file:///var/folders/vr/_8xgwfn14959gb617jpn7gv40000gp/T/ln/1f6ede83881b702b4a3c5ffa9b01ef51.htm | https://www.amazon.com/s?k=sport+shoes -parse -refresh

or this (💔 🔃 S for RR got 1601 2.64 KiB [💿4.40 KiB]):

20:16:37.104 [r-worker-4] INFO  a.p.p.c.component.LoadComponent.Task -  39. 💔 🔃 S for RR got 1601 2.64 KiB [💿4.40 KiB] in 1m8.373s, last fetched 1m9s ago, fc:1/42 Retry(1601) rs: Timeout to wait for document ready, rsp: CRAWL | 2/3/0/0/756 | nf:3/3/3      | 183.151.120.172 | 18qo0l66 | file:///var/folders/vr/_8xgwfn14959gb617jpn7gv40000gp/T/ln/1f6ede83881b702b4a3c5ffa9b01ef51.htm | https://www.amazon.com/s?k=sport+shoes -parse -refresh

The crawled URL is https://www.amazon.com/s?k=sport+shoes, with the arguments -parse -refresh.

The crawled page: [screenshot not preserved]

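I tested this locally with the same URL, using a proxy in both cases: the old version crawls the page successfully, while the new version shows the behavior described above.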

sskmtm, Mar 30 '23 12:03