exotic-amazon
A complete solution for crawling Amazon at scale, completely and accurately.
FileBackendStorage should be used to run demo tasks.
We need a better deployment experience.
1. Create a local directory that contains all necessary files to deploy.
2. We should test the program in the local deploy directory.
3. ...
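Hello, could you roughly explain how the parent-child relationships between crawl tasks work in extract-config? After adjusting the parent/child/grandchild relationships among the "list page", "product detail page" and "product reviews" tasks, I found that, whether or not a parent-child relationship exists, AmazonJdbcSinkSQLExtractor.isRelevant is created and evaluated against the target URL multiple times. When a parent-child relationship exists, however, some URLs are missed: the grandchild's check is never used to match URLs.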
Hello, when I ran the code today, code that previously worked now fails with the error Failed to create chrome devtools driver, and the program cannot launch Chrome to fetch pages. Here is the log: `21:49:40.923 [r-worker-2] WARN a.p.p.p.b.e.context.WebDriverContext - 3. Retry task 1 in crawl scope | caused by: [Unexpected] Failed to create chrome devtools driver 21:49:41.057...
14:29:19.835 [r-worker-9] INFO a.p.p.c.component.LoadComponent.Task - 29745. 💔 ⚡ U for N got 1462 0
10:43:27.719 [r-worker-1] WARN a.p.e.a.c.b.c.AmazonGenerator - Unexpected exception
java.nio.file.FileSystemNotFoundException: null
    at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.getFileSystem(ZipFileSystemProvider.java:169)
    at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.getPath(ZipFileSystemProvider.java:155)
    at java.base/java.nio.file.Path.of(Path.java:208)
    at java.base/java.nio.file.Paths.get(Paths.java:97)
    at ai.platon.exotic.amazon.crawl.boot.component.AmazonGenerator.getPeriodicalSeedDirectories(AmazonGenerator.kt:61)
    at ai.platon.exotic.amazon.crawl.boot.component.AmazonGenerator.generateLoadingTasks(AmazonGenerator.kt:111)
    at ai.platon.exotic.amazon.crawl.boot.component.AmazonGenerator.generateStartupTasks(AmazonGenerator.kt:85)
    at ai.platon.exotic.amazon.crawl.boot.component.AmazonCrawler.generate(AmazonCrawler.kt:53)
    at ai.platon.scent.crawl.AbstractRunnableCrawler.run0(AbstractRunnableCrawler.kt:49)
    at ai.platon.scent.crawl.AbstractRunnableCrawler.run$suspendImpl(AbstractRunnableCrawler.kt:29)
    at...
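The trace shows `AmazonGenerator.getPeriodicalSeedDirectories` calling `Paths.get`, which was routed to `ZipFileSystemProvider.getPath`. That route is taken when `Paths.get(URI)` receives a `jar:` URI (for example, a resource directory packaged inside the application jar), and `FileSystemNotFoundException` is thrown when no zip file system for that jar has been opened yet. Assuming that is what happens here (the log does not show the actual URI), the Java sketch below opens the jar file system on demand before resolving the path. `JarResourceDirs`, `listEntries` and `createDemoJar` are illustrative names, not part of exotic-amazon.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystemNotFoundException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class JarResourceDirs {

    /**
     * Lists the names directly under a directory URI that is either a plain
     * "file:" URI or a "jar:file:...!/dir" URI pointing inside a jar.
     * (Hypothetical helper for illustration.)
     */
    public static List<String> listEntries(URI dirUri) throws IOException {
        if (!"jar".equalsIgnoreCase(dirUri.getScheme())) {
            // Ordinary file: URIs resolve through the default file system.
            return listNames(Paths.get(dirUri));
        }
        try {
            // Succeeds only if a zip file system for this jar is already open.
            return listNames(Paths.get(dirUri));
        } catch (FileSystemNotFoundException e) {
            // Open the jar as a file system, resolve the path inside it, close it when done.
            try (FileSystem ignored = FileSystems.newFileSystem(dirUri, Map.of())) {
                return listNames(Paths.get(dirUri));
            }
        }
    }

    private static List<String> listNames(Path dir) throws IOException {
        try (Stream<Path> children = Files.list(dir)) {
            return children.map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Demo helper: writes a small jar; names ending in "/" become directory entries. */
    public static void createDemoJar(Path jar, String... entryNames) throws IOException {
        try (OutputStream out = Files.newOutputStream(jar);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String name : entryNames) {
                zip.putNextEntry(new ZipEntry(name));
                if (!name.endsWith("/")) {
                    zip.write(name.getBytes(StandardCharsets.UTF_8));
                }
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path jar = Files.createTempDirectory("seeds").resolve("demo.jar");
        createDemoJar(jar, "seeds/", "seeds/a.txt", "seeds/b.txt");
        URI dirUri = URI.create("jar:" + jar.toUri() + "!/seeds");
        System.out.println(listEntries(dirUri)); // [a.txt, b.txt]
    }
}
```

Closing the file system right after listing keeps the helper side-effect free. A long-running crawler that reads the same jar repeatedly may prefer to open it once and keep it open.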
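I found logs from several kinds of crawl failures. After each of the first two failures the task is retried a few minutes later, but after the third failure it is never retried again, and it never enters the flow isRelevant (true) -> onBeforeFilter -> onBeforeExtract -> extract -> onAfterExtract -> onAfterFilter.
Questions:
1. Is giving up after three failures a mechanism of the framework, or can it be changed with some setting?
2. Is there a setting, or a way to do it in code, so that failed pages still go through the isRelevant (true) -> onBeforeFilter -> onBeforeExtract -> extract -> onAfterExtract -> onAfterFilter flow? That would allow some post-processing (cleanup).
First failure: `Timeout...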
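The behavior described above is a bounded retry policy: up to three attempts, after which the task is dropped without reaching the extract pipeline. Whether that limit is configurable in the framework is not stated here. The generic Java sketch below only illustrates the pattern question 2 asks for, a retry loop that hands the final outcome to a cleanup hook whether the task succeeded or gave up. `BoundedRetry`, `Attempt`, `Outcome` and `runWithRetries` are hypothetical names, not exotic-amazon or framework APIs.

```java
import java.util.function.Consumer;

/**
 * Hypothetical sketch, not a framework API: retry a task up to maxAttempts
 * times, then pass the final outcome (success or last failure) to a cleanup
 * hook exactly once.
 */
public final class BoundedRetry {

    /** One attempt; returns normally on success and throws on failure. */
    @FunctionalInterface
    public interface Attempt {
        void run(int attemptNo) throws Exception;
    }

    /** Final result of the whole retry sequence. */
    public static final class Outcome {
        public final boolean succeeded;
        public final int attempts;
        public final Exception lastError; // null when succeeded

        Outcome(boolean succeeded, int attempts, Exception lastError) {
            this.succeeded = succeeded;
            this.attempts = attempts;
            this.lastError = lastError;
        }
    }

    public static Outcome runWithRetries(int maxAttempts, Attempt attempt, Consumer<Outcome> cleanup) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        int attempts = 0;
        Exception lastError = null;
        boolean succeeded = false;
        while (!succeeded && attempts < maxAttempts) {
            attempts++;
            try {
                attempt.run(attempts);
                succeeded = true;
            } catch (Exception e) {
                lastError = e; // a real crawler would wait or back off here before retrying
            }
        }
        Outcome outcome = new Outcome(succeeded, attempts, succeeded ? null : lastError);
        cleanup.accept(outcome); // called once, for success and final failure alike
        return outcome;
    }

    public static void main(String[] args) {
        Outcome result = runWithRetries(3,
                n -> { throw new Exception("Timeout on attempt " + n); },
                o -> System.out.println("cleanup: succeeded=" + o.succeeded + ", attempts=" + o.attempts));
        System.out.println(result.lastError.getMessage()); // Timeout on attempt 3
    }
}
```

Calling the hook once after the loop, rather than inside each `catch`, guarantees that cleanup runs exactly once per task, including when the task finally gives up.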
Let's learn together
https://www.yuque.com/g/kuloudadi/acseen/bl28so6x51ntz4lm/collaborator/join?token=NMF1sHp4XPlcpnB3# invites you to co-edit the document "柏拉图ai学习" (Platon AI Learning)
The web page is stuck, and the terminal occasionally prints debug messages: **_DEBUG a.p.s.r.a.schedule.ScentRestMonitor - Try executing top N tasks ..._**
Hi, I've set up jdbcommitter as required and commented out the configuration code that loads MongoDB by default, but the service still runs in the default mode after it starts.