exotic-amazon
Cannot get data after multiple timeouts
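I found logs for several kinds of crawl failures. After each of the first two failures, the task is retried a few minutes later. After the third failure, it is not retried again, and it never enters this flow: isRelevant (true) -> onBeforeFilter -> onBeforeExtract -> extract -> onAfterExtract -> onAfterFilter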
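Questions:
1. Is giving up after three failures a built-in mechanism of the framework, or can it be changed through some setting?
2. Is there a way, through configuration or in code, to make a failed task still enter the flow isRelevant (true) -> onBeforeFilter -> onBeforeExtract -> extract -> onAfterExtract -> onAfterFilter? That would let me do some post-processing (cleanup) for failed pages.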
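To make the second question concrete, here is a minimal Java sketch of the behavior I am after: a bounded retry loop that calls a cleanup hook once the last attempt has failed. This is not the framework's API. `RetryWithCleanup`, `fetchWithRetry`, `fetch`, and `onGiveUp` are hypothetical names used only for illustration, and the delay between attempts (5m and 7m in the logs below) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical illustration only: none of these names come from the crawler framework.
public class RetryWithCleanup {

    /** Outcome of a bounded retry loop. */
    public static final class Result {
        public final boolean success;
        public final int attempts;

        Result(boolean success, int attempts) {
            this.success = success;
            this.attempts = attempts;
        }
    }

    /**
     * Calls fetch up to maxAttempts times and stops at the first success.
     * If every attempt fails, onGiveUp runs exactly once so the caller
     * can do its cleanup for the failed URL.
     */
    public static Result fetchWithRetry(String url,
                                        int maxAttempts,
                                        Predicate<String> fetch,
                                        Consumer<String> onGiveUp) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (fetch.test(url)) {
                return new Result(true, attempt);
            }
        }
        // All attempts failed: hand the URL to the cleanup hook.
        onGiveUp.accept(url);
        return new Result(false, maxAttempts);
    }

    public static void main(String[] args) {
        List<String> cleaned = new ArrayList<>();
        // A fetch that always fails, standing in for the timeouts in the logs.
        Result result = fetchWithRetry(
                "https://www.amazon.com/dp/B00HXGSBXC", 3, url -> false, cleaned::add);
        System.out.println(result.success + " " + result.attempts + " " + cleaned);
    }
}
```

In my case the cleanup hook would do the work I currently do in onAfterExtract / onAfterFilter, which is why I would like failed tasks to reach those callbacks as well.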
First failure:
Timeout to wait for document ready after 60 round, retry is supposed
⚠ Privacy leak warning
U for N got 1601 0 <- 0 in 1m8.826s
Trying 2th 5m later
22:19:32.142 [-worker-14] WARN a.p.p.p.b.emulator.BrowserEmulator - Timeout to wait for document ready after 60 round, retry is supposed | https://www.amazon.com/dp/B00HXGSBXC
22:19:32.294 [-worker-14] INFO a.p.p.p.b.e.c.MultiPrivacyContextManager - ⚠ Privacy leak warning 1/8 | 15#15GsQSA107 | 2787. Retry(1601) rs: Timeout to wait for document ready, rsp: PRIVACY
22:19:32.301 [5-thread-1] INFO a.p.p.p.b.e.c.MultiPrivacyContextManager - Privacy context is inactive, closing it | 32m58s | 103wEu5102 |
22:19:32.303 [5-thread-1] INFO a.p.p.p.b.e.c.BrowserPrivacyContext - Privacy context #103wEu5102 has lived for 32m58s | success: 70(0.04 pages/s) | small: 0(0.0%) | traffic: 0 B(0 B/s) | tasks: 70 total run: 70 | [106.32.14.101:4283 => 106.32.14.101](0/70/0s)[retired idle] (st, 2), (pg, 70)
22:19:32.303 [5-thread-1] INFO a.p.p.p.b.e.context.WebDriverContext - All tasks return in 0 seconds | 103wEu51021
22:19:32.304 [5-thread-1] INFO a.p.p.p.b.d.BrowserAccompaniedDriverPoolCloser - Closing browser & driver pool with HEADLESS mode | {pulsar_chrome, 106.32.14.101:4283 | /var/folders/vr/_8xgwfn14959gb617jpn7gv40000gp/T/pulsar-kust/context/cx.103wEu5102}
22:19:32.326 [-worker-14] INFO a.p.p.c.component.LoadComponent.Task - 2787. 💔 ⚡ U for N got 1601 0 <- 0 in 1m8.826s, fc:1/1 Retry(1601) rs: Timeout to wait for document ready, rsp: CRAWL | 15GsQSA107 | https://www.amazon.com/dp/B00HXGSBXC -parse -refresh
22:19:32.398 [-worker-14] INFO a.p.p.c.impl.StreamingCrawler.Task - 2787. 🤺 Trying 2th 5m later | U for N got 1601 0 <- 0 in 1m8.826s, fc:1/1 Retry(1601) rs: Timeout to wait for document ready, rsp: CRAWL | 15GsQSA107 | https://www.amazon.com/dp/B00HXGSBXC
Second failure:
Page is ROBOT_CHECK
⚠ Privacy leak warning
U for RT got 1601 0 <- 0 in 10.709s
Trying 3th 7m later
22:24:44.338 [-worker-12] WARN a.p.p.p.b.e.i.BrowserEmulatorImplBase - 2790. Page is ROBOT_CHECK(10.98 KiB) with [122.232.253.12:4245 => 122.232.253.12](0/0/24m18s)[ready] in amazon.com(0) | file:///var/folders/vr/_8xgwfn14959gb617jpn7gv40000gp/T/ln/1d326bbcba3ed428a4a1afd8dcd488fd.htm
22:24:44.446 [-worker-12] INFO a.p.p.p.b.e.c.MultiPrivacyContextManager - ⚠ Privacy leak warning 1/8 | 16#16n2ckM108 | 2790. Retry(1601) rs: ROBOT_CHECK, rsp: PRIVACY
22:24:44.481 [-worker-12] INFO a.p.p.c.component.LoadComponent.Task - 2790. 💔 🔃 U for RT got 1601 0 <- 0 in 10.709s, last fetched 5m12s ago, fc:2/2 Retry(1601) rs: ROBOT_CHECK, rsp: CRAWL | 16n2ckM108 | https://www.amazon.com/dp/B00HXGSBXC -parse
22:24:44.483 [-worker-12] INFO a.p.p.c.impl.StreamingCrawler.Task - 2790. 🤺 Trying 3th 7m later | U for RT got 1601 0 <- 0 in 10.709s, last fetched 5m12s ago, fc:2/2 Retry(1601) rs: ROBOT_CHECK, rsp: CRAWL | 16n2ckM108 | https://www.amazon.com/dp/B00HXGSBXC
Third failure:
Timeout to wait for document ready after 60 round, retry is supposed
⚠ Privacy leak warning
U for RT got 1601 0 <- 0 in 1m0.988s
Gone (unexpected)
22:32:46.141 [-worker-12] WARN a.p.p.p.b.emulator.BrowserEmulator - Timeout to wait for document ready after 60 round, retry is supposed | https://www.amazon.com/dp/B00HXGSBXC
22:32:46.265 [-worker-12] INFO a.p.p.p.b.e.c.MultiPrivacyContextManager - ⚠ Privacy leak warning 2/8 | 15#15GsQSA107 | 2793. Retry(1601) rs: Timeout to wait for document ready, rsp: PRIVACY
22:32:46.266 [-worker-12] INFO a.p.p.c.component.LoadComponent.Task - 2793. 💔 🔃 U for RT got 1601 0 <- 0 in 1m0.988s, last fetched 8m1s ago, fc:3/3 Retry(1601) rs: Timeout to wait for document ready, rsp: CRAWL | 15GsQSA107 | https://www.amazon.com/dp/B00HXGSBXC -parse
22:32:46.267 [-worker-12] INFO a.p.p.c.impl.StreamingCrawler.Task - 2793. Gone (unexpected) U for RT got 1601 0 <- 0 in 1m0.988s, last fetched 8m1s ago, fc:3/3 Retry(1601) rs: Timeout to wait for document ready, rsp: CRAWL | 15GsQSA107 | https://www.amazon.com/dp/B00HXGSBXC