sskmtm issues

Results: 5 issues of sskmtm

Cannot get data after multiple timeouts

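I found logs for several kinds of crawl failures. After each of the first two failures, the crawl is retried a few minutes later. After the third failure, it is not retried again, and it never enters the flow: isRelevant (true) -> onBeforeFilter -> onBeforeExtract -> extract -> onAfterExtract -> onAfterFilter. Questions:
1. Is failing outright after three failures a built-in mechanism of the framework, or can it be changed through some setting?
2. Is there a setting, or a way in code, to make a failed crawl still enter the flow isRelevant (true) -> onBeforeFilter -> onBeforeExtract -> extract -> onAfterExtract -> onAfterFilter? That would make it possible to do some post-processing (cleanup).
First failure: `Timeout...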
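The cleanup concern in question 2 can be illustrated independently of the framework. The following is a minimal sketch in plain Python, not the framework's API: `fetch_with_retries`, `on_finished`, and the default of three attempts are hypothetical names and values chosen only to mirror the behaviour described above. It shows one way to guarantee that a cleanup callback runs exactly once, whether the fetch eventually succeeds or every attempt times out.

```python
from typing import Callable, Optional


def fetch_with_retries(
    fetch: Callable[[], str],
    on_finished: Callable[[Optional[str], Optional[Exception]], None],
    max_attempts: int = 3,  # assumption: mirrors the three failures in the report
) -> Optional[str]:
    """Call fetch() up to max_attempts times, then always run on_finished once."""
    result: Optional[str] = None
    last_error: Optional[Exception] = None
    try:
        for _ in range(max_attempts):
            try:
                result = fetch()
                last_error = None
                break  # success: stop retrying
            except TimeoutError as exc:
                last_error = exc  # remember the failure and try again
    finally:
        # Runs on success and on final failure alike, so cleanup is never skipped.
        on_finished(result, last_error)
    return result


def always_times_out() -> str:
    """Hypothetical fetcher that fails on every attempt."""
    raise TimeoutError("page load timed out")


cleanup_calls = []
outcome = fetch_with_retries(
    always_times_out,
    lambda page, error: cleanup_calls.append((page, error)),
)
```

Here `outcome` is `None` and `cleanup_calls` holds a single entry carrying the last `TimeoutError`, which is the point where a real crawler could release resources or mark the task as failed.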

good first issue

wontfix

Unable to get data when using a proxy

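Without a proxy, the code on the `main` branch runs normally. With a proxy, pages are never fetched correctly (for a long time, no page is crawled successfully). The crawl log always shows ( 💯 🔃 S for RR got 200 2.64 KiB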

How to match the extraction configuration when the web page is redirected

If a web page is redirected after it is fetched, is there a way to configure the pattern in extract-config.json so that it matches the redirected URL?
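The idea behind this question can be sketched outside the framework: resolve the URL the page finally landed on, then match the rules against that final URL instead of the requested one. This is only an illustration in plain Python. The rule shape (`name`, `pattern`), the `redirects` mapping, and the example.com URLs are assumptions, not the actual schema of extract-config.json or any API of the crawler.

```python
import re
from typing import Dict, List, Optional


def resolve_final_url(url: str, redirects: Dict[str, str], max_hops: int = 10) -> str:
    """Follow recorded redirects (source -> target) until no further hop is known."""
    hops = 0
    while url in redirects:
        if hops >= max_hops:
            raise ValueError("too many redirects")
        url = redirects[url]
        hops += 1
    return url


def find_rule(url: str, rules: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the first rule whose regex pattern matches the URL, or None."""
    for rule in rules:
        if re.search(rule["pattern"], url):
            return rule
    return None


# Hypothetical data: one recorded redirect and one extraction rule.
redirects = {"https://example.com/old": "https://example.com/item/42"}
rules = [{"name": "item-page", "pattern": r"/item/\d+"}]

final_url = resolve_final_url("https://example.com/old", redirects)
matched = find_rule(final_url, rules)
```

Matching against `final_url` selects the `item-page` rule, whereas the originally requested URL would match nothing.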

good first issue

Chrome DevTools driver reports 'typeError'

The error log is as follows:

```plain
/usr/bin/google-chrome-stable --proxy-server=1.84.252.243:4231 --headless --disable-gpu --hide-scrollbars --remote-debugging-port=0 --no-default-browser-check --no-first-run --no-startup-window --mute-audio --disable-background-networking --disable-background-timer-throttling --disable-client-side-phishing-detection --disable-hang-monitor --disable-popup-blocking --disable-prompt-on-repost --disable-sync --disable-translate --disable-blink-features=AutomationControlled --metrics-recording-only --safebrowsing-disable-auto-update --no-sandbox --ignore-certificate-errors --window-size=1920,1080 --pageLoadStrategy=none --throwExceptionOnScriptError=true --user-data-dir=/tmp/pulsar-root/context/browser/br.66b305
21:17:11.427...
```

good first issue

wontfix

Crawling product reviews in a loop fails

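When crawling /prudct-reviews/... pages in a loop, the first page is crawled normally, but a problem occurs on the second page. The content of the crawled file is as follows: How should this situation be resolved (no proxy is used)?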

good first issue

wontfix