exotic-amazon
A complete solution for crawling Amazon at scale, completely and accurately.
java.nio.file.FileSystemNotFoundException: null
    at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.getFileSystem(ZipFileSystemProvider.java:169)
    at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.getPath(ZipFileSystemProvider.java:155)
    at java.base/java.nio.file.Path.of(Path.java:208)
    at java.base/java.nio.file.Paths.get(Paths.java:97)
    at ai.platon.exotic.amazon.crawl.boot.component.AmazonGenerator.getPeriodicalSeedDirectories(AmazonGenerator.kt:61)
    at ai.platon.exotic.amazon.crawl.boot.component.AmazonGenerator.generateLoadingTasks(AmazonGenerator.kt:111)
    at ai.platon.exotic.amazon.crawl.boot.component.AmazonGenerator.generateStartupTasks(AmazonGenerator.kt:85)
    at ai.platon.exotic.amazon.crawl.boot.component.AmazonCrawler.generate(AmazonCrawler.kt:53)
    at ai.platon.scent.crawl.AbstractRunnableCrawler.run0(AbstractRunnableCrawler.kt:49)
    at ai.platon.scent.crawl.AbstractRunnableCrawler.run$suspendImpl(AbstractRunnableCrawler.kt:29)
    at ai.platon.scent.crawl.AbstractRunnableCrawler.run(AbstractRunnableCrawler.kt)
    at ai.platon.scent.crawl.AbstractRunnableStreamingCrawler.run$suspendImpl(AbstractRunnableStreamingCrawler.kt:24)
    at ai.platon.scent.crawl.AbstractRunnableStreamingCrawler.run(AbstractRunnableStreamingCrawler.kt)
    at ai.platon.scent.crawl.AbstractRunnableCrawler$run$1$1.invokeSuspend(AbstractRunnableCrawler.kt:22)
    ...
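The stack trace shows `AmazonGenerator.getPeriodicalSeedDirectories` resolving a classpath resource with `Paths.get`. When the application runs from a packaged jar, that resource URI uses the `jar:` scheme, and `Paths.get` throws `FileSystemNotFoundException` unless a zip file system has already been opened for the jar. Below is a minimal, generic java.nio sketch of one way to handle this; the helper name and resource lookup are illustrative assumptions, not the project's actual code.

```kotlin
import java.net.URI
import java.nio.file.FileSystemNotFoundException
import java.nio.file.FileSystems
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical helper: resolve a classpath resource directory to a Path,
// opening the backing zip file system first when the resource is packaged
// inside the application jar.
fun resourceDirectory(resource: String): Path {
    val uri: URI = Thread.currentThread().contextClassLoader.getResource(resource)?.toURI()
        ?: throw IllegalArgumentException("Resource not found: $resource")
    return if (uri.scheme == "jar") {
        // Paths.get(uri) fails with FileSystemNotFoundException for a jar: URI
        // unless the zip file system for that jar has already been created.
        val fs = try {
            FileSystems.getFileSystem(uri)
        } catch (e: FileSystemNotFoundException) {
            FileSystems.newFileSystem(uri, emptyMap<String, Any>())
        }
        // The entry path inside the jar is the part after the "!" separator.
        fs.getPath(uri.toString().substringAfter("!"))
    } else {
        Paths.get(uri)
    }
}
```

An alternative is to read the seed lists through `ClassLoader.getResourceAsStream` instead of the file-system API, which sidesteps the jar/zip file-system issue entirely.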
Without a proxy, the `main` branch code runs fine. With a proxy enabled, pages are never fetched correctly (the crawl runs for a long time without retrieving any page properly), and the crawl log keeps printing lines like: 💯 🔃 S for RR got 200 2.64 KiB
If a page is redirected after it is fetched, is there any way to configure the pattern in extract-config.json so that it matches the redirected URL?

Here is the error log:
```plain
/usr/bin/google-chrome-stable --proxy-server=1.84.252.243:4231 --headless --disable-gpu --hide-scrollbars --remote-debugging-port=0 --no-default-browser-check --no-first-run --no-startup-window --mute-audio --disable-background-networking --disable-background-timer-throttling --disable-client-side-phishing-detection --disable-hang-monitor --disable-popup-blocking --disable-prompt-on-repost --disable-sync --disable-translate --disable-blink-features=AutomationControlled --metrics-recording-only --safebrowsing-disable-auto-update --no-sandbox --ignore-certificate-errors --window-size=1920,1080 --pageLoadStrategy=none --throwExceptionOnScriptError=true --user-data-dir=/tmp/pulsar-root/context/browser/br.66b305
21:17:11.427...
```
I am crawling the /prudct-reviews/... pages in a loop. The first page is fetched normally, but things go wrong from the second page onward; the content of the fetched file is as follows. How should this be resolved (no proxy is used)?
If I want to search for "iPad" on Amazon and crawl all of the search results, what should I do?
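One possible approach, sketched with the underlying PulsarRPA session API: load the search-result portal page for the keyword, then load the product pages it links to. The package path, CSS selector, and load arguments below are assumptions and may differ between versions; this is a sketch, not the project's documented workflow.

```kotlin
import ai.platon.pulsar.context.PulsarContexts

fun main() {
    // Minimal sketch: crawl the "iPad" search results and the product pages they link to.
    val session = PulsarContexts.createSession()
    val portalUrl = "https://www.amazon.com/s?k=iPad"

    // "-outLink a[href~=/dp/]" selects the product detail links on the result page;
    // "-expires 1d" re-fetches a page only if the cached copy is older than one day.
    val pages = session.loadOutPages(portalUrl, "-outLink a[href~=/dp/] -expires 1d")
    println("Loaded ${pages.size} product pages from the first result page")

    // Covering all results would also require following the pagination ("next page")
    // links and submitting those URLs back to the session or crawl queue.
}
```

For the full pipeline, the extraction rules for the search and product URL patterns would still need to be present in the project's extract configuration.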
The download count stops growing at around 2k, but Amazon has hundreds of millions of products.
Can this crawler crawl all consumer reviews? I only see the top-review folder, not the folder with all the reviews. 
How do I know how long the download will take?