
A complete solution to crawl Amazon at scale, completely and accurately.

30 exotic-amazon issues

FileBackendStorage should be used to run demo tasks.

We need a better deployment experience:
1. Create a local directory that contains all the files necessary to deploy.
2. Test the program in the local deploy directory.
3. ...

Hello, could you briefly explain the parent/child relationships between the crawl tasks in extract-config? After I adjusted the parent/child/grandchild relationships among the "listing page", "product detail page" and "product reviews" tasks, I found that whether or not a parent-child relationship exists, AmazonJdbcSinkSQLExtractor.isRelevant is instantiated and invoked repeatedly to check the same target URL. Worse, when a parent-child relationship is in place, some URLs are missed entirely, and the grandchild-level check is never used to match URLs.

Hello, when I ran the code today, code that used to work now fails with "Failed to create chrome devtools driver", and the program cannot start Chrome to fetch pages. Log excerpt: `21:49:40.923 [r-worker-2] WARN a.p.p.p.b.e.context.WebDriverContext - 3. Retry task 1 in crawl scope | caused by: [Unexpected] Failed to create chrome devtools driver 21:49:41.057...`

14:29:19.835 [r-worker-9] INFO a.p.p.c.component.LoadComponent.Task - 29745. 💔 ⚡ U for N got 1462 0

good first issue
wontfix

```
10:43:27.719 [r-worker-1] WARN a.p.e.a.c.b.c.AmazonGenerator - Unexpected exception
java.nio.file.FileSystemNotFoundException: null
	at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.getFileSystem(ZipFileSystemProvider.java:169)
	at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.getPath(ZipFileSystemProvider.java:155)
	at java.base/java.nio.file.Path.of(Path.java:208)
	at java.base/java.nio.file.Paths.get(Paths.java:97)
	at ai.platon.exotic.amazon.crawl.boot.component.AmazonGenerator.getPeriodicalSeedDirectories(AmazonGenerator.kt:61)
	at ai.platon.exotic.amazon.crawl.boot.component.AmazonGenerator.generateLoadingTasks(AmazonGenerator.kt:111)
	at ai.platon.exotic.amazon.crawl.boot.component.AmazonGenerator.generateStartupTasks(AmazonGenerator.kt:85)
	at ai.platon.exotic.amazon.crawl.boot.component.AmazonCrawler.generate(AmazonCrawler.kt:53)
	at ai.platon.scent.crawl.AbstractRunnableCrawler.run0(AbstractRunnableCrawler.kt:49)
	at ai.platon.scent.crawl.AbstractRunnableCrawler.run$suspendImpl(AbstractRunnableCrawler.kt:29)
	at ...
```
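A `FileSystemNotFoundException` from `ZipFileSystemProvider.getFileSystem` usually means `Paths.get(uri)` was called with a `jar:` URI (e.g. a seed directory packaged inside the application jar) before a zip `FileSystem` was opened for that jar. A minimal sketch of the standard workaround, using a hypothetical helper rather than the project's actual code:

```java
import java.net.URI;
import java.nio.file.FileSystemNotFoundException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;

public class JarPathResolver {
    /**
     * Resolve a URI to a Path. Plain Paths.get(uri) throws
     * FileSystemNotFoundException for a jar: URI whose zip file system
     * has not been created yet, so open it on demand first.
     */
    public static Path resolve(URI uri) throws Exception {
        if ("jar".equals(uri.getScheme())) {
            try {
                // Is a file system for this jar already open?
                FileSystems.getFileSystem(uri);
            } catch (FileSystemNotFoundException e) {
                // No: create it, so Paths.get(uri) can find it below.
                FileSystems.newFileSystem(uri, Collections.emptyMap());
            }
        }
        return Paths.get(uri);
    }
}
```

The same pattern would apply inside `getPeriodicalSeedDirectories` when the seed files are loaded from classpath resources that end up inside a jar at deploy time.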

I have found several kinds of logs when data crawling fails. The first two failures are each retried a few minutes later, but after the third failure there are no further retries, and the task no longer enters the pipeline isRelevant (true) -> onBeforeFilter -> onBeforeExtract -> extract -> onAfterExtract -> onAfterFilter. Two questions:
1. Is giving up after three failures a framework mechanism, or can it be changed through some setting?
2. Is there a setting, or a way in code, to let a task that has failed still enter that pipeline, so that some post-processing (cleanup) can be done?
First failure: `Timeout...`
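Independent of the framework's own retry policy, the behavior asked for in question 2 is a general pattern: a bounded retry that guarantees a cleanup hook runs even after the final failure. A sketch under that assumption (the class and method names here are hypothetical, not PulsarRR API):

```java
import java.util.concurrent.Callable;

public class BoundedRetry {
    /**
     * Run the task up to maxAttempts times. The cleanup hook always runs
     * afterwards, on success and after the final failure alike, so
     * post-processing (e.g. clearing state) is never skipped.
     */
    public static <T> T run(Callable<T> task, int maxAttempts, Runnable cleanup) throws Exception {
        Exception last = null;
        try {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return task.call();
                } catch (Exception e) {
                    last = e; // a real implementation would log and back off here
                }
            }
            throw last; // all attempts exhausted
        } finally {
            cleanup.run(); // runs regardless of outcome
        }
    }
}
```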


https://www.yuque.com/g/kuloudadi/acseen/bl28so6x51ntz4lm/collaborator/join?token=NMF1sHp4XPlcpnB3# invites you to co-edit the document 《柏拉图ai学习》 (Plato AI Learning)


The web page is stuck, and the terminal occasionally prints debug messages: **_DEBUG a.p.s.r.a.schedule.ScentRestMonitor - Try executing top N tasks ..._** ![f50a9223a68608e876a32ebc4f0e67e](https://user-images.githubusercontent.com/39584730/221103552-74cc88eb-70ff-44ea-bbfc-0f5212068e9a.png)


Hi, I've set up jdbcommitter as required and commented out the configuration code that loads MongoDB by default, but the default mode still remains once the service is started.
