PulsarRPA
Automate webpages at scale, scrape web data completely and accurately with high performance, distributed RPA.
Questions: - Is deployment on a Linux server supported for scraping? - If so, how should Chrome be configured? I could not find a tutorial. Thanks.
Too many warning logs after MongoDB crashes: 10:56:03.617 [r-worker-5] WARN a.p.p.c.i.StreamingCrawler - [Unexpected] ai.platon.shaded.com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches ai.platon.shaded.com.mongodb.client.internal.MongoClientDelegate$1@7335a5ec. Client view...
18:56:56.985 [main] WARN a.p.pulsar.dom.select.PowerSelector - Failed to parse css query | #productDescription, h2:contains(Product Description) --x-- div | https://www.amazon.com/dp/B07V2CLJLV | Could not parse query '--x--': unexpected token at '--x--'
For example, we construct a URL for a page that does not exist: https://www.amazon.com/dp/006323047_404. PulsarR has to handle such pages properly: 1. handle the status code correctly 2. do not retry
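The two requirements above (respect the status code, do not retry a missing page) can be illustrated with a minimal retry-decision sketch. The class `RetryPolicy`, the method `shouldRetry`, and the set of status codes treated as transient are all hypothetical names and choices made for illustration; they are not PulsarRPA's API and do not describe how the project actually handles this case.

```java
import java.util.Set;

public final class RetryPolicy {
    // Status codes treated as transient in this sketch.
    // This list is an assumption, not PulsarRPA's actual configuration.
    private static final Set<Integer> TRANSIENT = Set.of(408, 429, 500, 502, 503, 504);

    private RetryPolicy() {}

    /**
     * Returns true only when the failure looks temporary and attempts remain.
     * A 404 is not in the transient set, so it is never retried.
     */
    public static boolean shouldRetry(int statusCode, int attempt, int maxAttempts) {
        if (attempt >= maxAttempts) {
            return false;
        }
        return TRANSIENT.contains(statusCode);
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(404, 1, 3)); // false: the page does not exist
        System.out.println(shouldRetry(503, 1, 3)); // true: the server may recover
    }
}
```

Separating the decision from the fetch loop keeps the "do not retry" rule easy to test on its own.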
Original report: The proxy is expired (xxx), context reset will be triggered https://github.com/platonai/exotic-amazon/issues/19 This bug has already been fixed in 1.10.x.
bin/build-run.sh // OK 2023-02-16 22:12:09.736 INFO [main] a.p.p.a.m.PulsarMasterKt - Starting PulsarMasterKt v1.10.10-SNAPSHOT using Java 17.0.5 on regulus with PID 21576 (/home/vincent/workspace/pulsar-1.10.x/pulsar-app/pulsar-master/target/pulsar-master-1.10.10-SNAPSHOT.jar started by vincent in /home/vincent/workspace/pulsar-1.10.x) And then we issue...
aliyunmaven central Aliyun public repository https://maven.aliyun.com/repository/public
spring central Spring public repository https://maven.aliyun.com/repository/spring
repo central Human Readable Name for this Mirror. https://repo.maven.apache.org/maven2/
repo2 central Human Readable Name for this Mirror. https://oss.sonatype.org/#stagingRepositories
repo3 central Human Readable...
With the latest Google Chrome:
With the google-chrome browser in normal (headed) mode: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36
With the google-chrome-headless browser: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/95.0.4638.69 Safari/537.36
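The only difference between the two user agent strings is the `HeadlessChrome/` product token. A minimal sketch of detecting and normalizing that token follows; the class `UserAgents` and its methods are hypothetical helpers for illustration, not part of PulsarRPA, and the issue text does not say how the project itself addresses the difference.

```java
public final class UserAgents {
    // Product token that headless Chrome reports in place of "Chrome/".
    private static final String HEADLESS_MARKER = "HeadlessChrome/";

    private UserAgents() {}

    /** Returns true when the user agent carries the headless product token. */
    public static boolean isHeadless(String userAgent) {
        return userAgent.contains(HEADLESS_MARKER);
    }

    /** Rewrites the headless token so the string matches a headed browser. */
    public static String normalize(String userAgent) {
        return userAgent.replace(HEADLESS_MARKER, "Chrome/");
    }

    public static void main(String[] args) {
        String headless = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) HeadlessChrome/95.0.4638.69 Safari/537.36";
        System.out.println(isHeadless(headless)); // true
        System.out.println(normalize(headless));
    }
}
```

The normalized string would then have to be applied through whatever user agent override the browser driver in use provides.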
Reading from FileBackendPageStore failed.
Exception in thread "main" java.nio.file.FileSystemException: C:\Users\Vincent Zhang\.pulsar\data\store\nbzfcg-cn\nbzfcg-cn-5fb8f1e5b8322a31bb42dbdcee9d256f.avro: The process cannot access the file because it is being used by another process.
    at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
    at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
    at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:108)
    at java.base/sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:274)
    at java.base/sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:110)
    at java.base/java.nio.file.Files.deleteIfExists(Files.java:1185)
    at ai.platon.pulsar.persist.gora.FileBackendPageStore.readAvro(FileBackendPageStore.kt:98)
    at ai.platon.pulsar.persist.gora.FileBackendPageStore.get(FileBackendPageStore.kt:41)
    at ai.platon.pulsar.persist.gora.FileBackendPageStore.get(FileBackendPageStore.kt:30)...
2022-10-15 14:44:12.146 WARN [-worker-12] a.p.p.p.b.e.i.BrowserEmulatorImplBase - java.nio.file.FileSystemException: C:\Users\VINCEN~1\AppData\Local\Temp\ln\5a6caaaaa8aaf6e230182a2bbad7c43c.htm: A required privilege is not held by the client. Environment: OS: Windows 11 JDK: Java 11 Command line: "C:\Program Files\Java\jdk-11.0.2\bin\java.exe" "-javaagent:D:\Program Files\JetBrains\IntelliJ IDEA 2022.1.3\lib\idea_rt.jar=61275:D:\Program Files\JetBrains\IntelliJ IDEA 2022.1.3\bin" -Dfile.encoding=UTF-8 -classpath "..."...