PulsarRPA

Automate webpages at scale, scrape web data completely and accurately with high performance, distributed RPA.

47 PulsarRPA issues, sorted by recently updated.

Questions:
- Is deployment on a Linux server supported for scraping?
- If so, how should Chrome be configured? I couldn't find a tutorial.
Thanks.

Labels: good first issue, wontfix
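A minimal sketch of one way to run Chrome without a display on a Linux server, written in Java because the project runs on the JVM. It assumes Chrome or Chromium is installed at one of a few common paths. The `HeadlessChromeLauncher` class and its path list are hypothetical, not PulsarRPA's configuration; `--headless`, `--no-sandbox`, `--disable-gpu` and `--remote-debugging-port` are standard Chrome command-line switches.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of PulsarRPA: locates a Chrome binary and builds
// a headless launch command using generic Chrome switches.
class HeadlessChromeLauncher {
    // Common install locations on Debian/Ubuntu-style distributions (assumption).
    static final List<String> CANDIDATES = List.of(
            "/usr/bin/google-chrome",
            "/usr/bin/google-chrome-stable",
            "/usr/bin/chromium",
            "/usr/bin/chromium-browser");

    // Returns the first candidate path that exists and is executable, if any.
    static Optional<Path> findChrome() {
        for (String candidate : CANDIDATES) {
            Path path = Path.of(candidate);
            if (Files.isExecutable(path)) {
                return Optional.of(path);
            }
        }
        return Optional.empty();
    }

    // Builds the argument list for a headless Chrome process.
    static List<String> buildCommand(Path chrome, int debugPort) {
        List<String> cmd = new ArrayList<>();
        cmd.add(chrome.toString());
        cmd.add("--headless");      // no X server / display required
        cmd.add("--no-sandbox");    // Chrome refuses to run as root without this
        cmd.add("--disable-gpu");   // servers usually have no usable GPU
        cmd.add("--remote-debugging-port=" + debugPort); // DevTools protocol endpoint
        return cmd;
    }
}
```

To launch, pass the list to `new ProcessBuilder(cmd).start()`. Because `--no-sandbox` disables Chrome's sandbox, running the browser as a non-root user is preferable where possible.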

Too many warning logs after MongoDB crashes:
10:56:03.617 [r-worker-5] WARN a.p.p.c.i.StreamingCrawler - [Unexpected] ai.platon.shaded.com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches ai.platon.shaded.com.mongodb.client.internal.MongoClientDelegate$1@7335a5ec. Client view...
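One common way to keep a crashed backend from flooding the log is to throttle repeated warnings. The sketch below is an assumption, not PulsarRPA's code: the hypothetical `WarningThrottle` lets each distinct key (for example the exception class name `MongoTimeoutException`) through at most once per interval and reports how many repeats were suppressed in between. The clock is injected so the behavior can be tested.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical rate limiter for repeated warnings, keyed by an arbitrary string.
class WarningThrottle {
    private final long intervalMillis;
    private final LongSupplier clock;
    // Per key: [0] = time the warning was last let through, [1] = repeats suppressed since then.
    private final ConcurrentHashMap<String, long[]> state = new ConcurrentHashMap<>();

    WarningThrottle(long intervalMillis, LongSupplier clock) {
        this.intervalMillis = intervalMillis;
        this.clock = clock;
    }

    // Returns the number of suppressed repeats if the warning should be logged now,
    // or -1 if it should be suppressed.
    long tryAcquire(String key) {
        long now = clock.getAsLong();
        long[] result = {-1};
        state.compute(key, (k, s) -> {
            if (s == null) {                       // first occurrence: log it
                result[0] = 0;
                return new long[]{now, 0};
            }
            if (now - s[0] >= intervalMillis) {    // interval elapsed: log with repeat count
                result[0] = s[1];
                return new long[]{now, 0};
            }
            s[1]++;                                // still inside the interval: suppress
            return s;
        });
        return result[0];
    }
}
```

A caller would log only when `tryAcquire` returns a non-negative value, appending something like "(N similar warnings suppressed)" when N > 0.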

18:56:56.985 [main] WARN a.p.pulsar.dom.select.PowerSelector - Failed to parse css query | #productDescription, h2:contains(Product Description) --x-- div | https://www.amazon.com/dp/B07V2CLJLV | Could not parse query '--x--': unexpected token at '--x--'
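The failing query is a comma-separated selector group in which only the second alternative contains the unrecognized `--x--` token. A minimal sketch, assuming a selector engine could evaluate each alternative separately: split the group at top-level commas (ignoring commas inside parentheses or quotes) so a parse failure can be isolated to one alternative. The `SelectorGroups` class is hypothetical and does not reflect how PowerSelector actually parses queries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a CSS selector group into its comma-separated alternatives.
class SelectorGroups {
    static List<String> split(String query) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;   // nesting level of parentheses, e.g. inside :contains(...)
        char quote = 0;  // active quote character, or 0 when outside quotes
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (quote != 0) {
                if (c == quote) quote = 0;           // closing quote (escapes not handled)
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth = Math.max(0, depth - 1);
            } else if (c == ',' && depth == 0) {
                addPart(parts, current);             // top-level comma ends an alternative
                continue;
            }
            current.append(c);
        }
        addPart(parts, current);
        return parts;
    }

    private static void addPart(List<String> parts, StringBuilder current) {
        String part = current.toString().trim();
        if (!part.isEmpty()) parts.add(part);
        current.setLength(0);
    }
}
```

Applied to the query in the log, this yields `#productDescription` and `h2:contains(Product Description) --x-- div`, so a caller could still use the first alternative after logging the failure of the second.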

For example, we construct a URL for a page that does not exist: https://www.amazon.com/dp/006323047_404. PulsarR has to handle such pages properly:
1. handle the status code correctly
2. do not retry
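The two requirements above amount to classifying the response status before deciding whether to retry. A sketch under stated assumptions: the `StatusPolicy` class, the `FetchOutcome` enum and the exact status groupings are hypothetical illustrations, not PulsarR's actual behavior. A missing page (404 or 410) is recorded as gone and never retried, while timeouts, rate limiting and server errors are treated as transient.

```java
// Hypothetical classification of fetch results by HTTP status code.
enum FetchOutcome { SUCCESS, GONE, RETRY, FAIL }

class StatusPolicy {
    static FetchOutcome classify(int status) {
        if (status >= 200 && status < 300) return FetchOutcome.SUCCESS;
        // The page does not exist: record it, never retry.
        if (status == 404 || status == 410) return FetchOutcome.GONE;
        // Request timeout, rate limiting and server errors are usually transient.
        if (status == 408 || status == 429 || (status >= 500 && status < 600)) return FetchOutcome.RETRY;
        // Anything else (other 4xx, unexpected codes) is treated as a permanent failure.
        return FetchOutcome.FAIL;
    }

    // Retry only transient failures, and only while attempts remain.
    static boolean shouldRetry(int status, int attempt, int maxAttempts) {
        return classify(status) == FetchOutcome.RETRY && attempt < maxAttempts;
    }
}
```

With this policy, the constructed `_404` URL above would be classified as `GONE` on its first fetch and dropped from the retry queue.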

Original report: "The proxy is expired (xxx), context reset will be triggered" (https://github.com/platonai/exotic-amazon/issues/19). This bug has already been fixed in 1.10.x.

bin/build-run.sh // OK
2023-02-16 22:12:09.736 INFO [main] a.p.p.a.m.PulsarMasterKt - Starting PulsarMasterKt v1.10.10-SNAPSHOT using Java 17.0.5 on regulus with PID 21576 (/home/vincent/workspace/pulsar-1.10.x/pulsar-app/pulsar-master/target/pulsar-master-1.10.10-SNAPSHOT.jar started by vincent in /home/vincent/workspace/pulsar-1.10.x)
And then we issue...

Maven mirror settings (id, mirrorOf, name, url):
aliyunmaven, central, 阿里云公共仓库 (Aliyun public repository), https://maven.aliyun.com/repository/public
spring, central, spring公共仓库 (Spring public repository), https://maven.aliyun.com/repository/spring
repo, central, Human Readable Name for this Mirror., https://repo.maven.apache.org/maven2/
repo2, central, Human Readable Name for this Mirror., https://oss.sonatype.org/#stagingRepositories
repo3, central, Human Readable...

Labels: good first issue, wontfix

When using the latest Google Chrome:
With the regular (headed) google-chrome browser:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36
With the google-chrome-headless browser:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/95.0.4638.69 Safari/537.36
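The only difference between the two user-agent strings is the `HeadlessChrome/` token, which sites can use to detect headless browsing. A minimal sketch of detecting it and rewriting it to the headed form; the `UserAgents` class is hypothetical. Applying the result is out of scope here, though Chrome does accept a `--user-agent=<string>` command-line switch.

```java
// Hypothetical helpers for the headless/headed user-agent difference.
class UserAgents {
    // Headless Chrome advertises "HeadlessChrome/<version>" instead of "Chrome/<version>".
    static boolean isHeadless(String userAgent) {
        return userAgent.contains("HeadlessChrome/");
    }

    // Rewrites the headless token so the string matches what headed Chrome sends.
    static String toHeadedForm(String userAgent) {
        return userAgent.replace("HeadlessChrome/", "Chrome/");
    }
}
```

For the two strings above, `toHeadedForm` turns the headless user agent into exactly the headed one and leaves the headed one unchanged.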

Reading from FileBackendPageStore failed.
Exception in thread "main" java.nio.file.FileSystemException: C:\Users\Vincent Zhang\.pulsar\data\store\nbzfcg-cn\nbzfcg-cn-5fb8f1e5b8322a31bb42dbdcee9d256f.avro: The process cannot access the file because it is being used by another process.
    at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
    at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
    at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:108)
    at java.base/sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:274)
    at java.base/sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:110)
    at java.base/java.nio.file.Files.deleteIfExists(Files.java:1185)
    at ai.platon.pulsar.persist.gora.FileBackendPageStore.readAvro(FileBackendPageStore.kt:98)
    at ai.platon.pulsar.persist.gora.FileBackendPageStore.get(FileBackendPageStore.kt:41)
    at ai.platon.pulsar.persist.gora.FileBackendPageStore.get(FileBackendPageStore.kt:30)...
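The trace shows `readAvro` calling `Files.deleteIfExists`, which fails on Windows while another process still holds the file open, and the exception escapes to `main`. A sketch of one defensive pattern, assuming a short retry is acceptable: retry the delete a few times and report failure instead of throwing. The `RetryingDelete` class, the attempt count and the delay are hypothetical, not the project's fix.

```java
import java.io.IOException;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: deletes a file, retrying while it is locked by another process.
class RetryingDelete {
    // Returns true if the file no longer exists (deleted now or already absent),
    // false if it could not be deleted within maxAttempts.
    static boolean deleteIfExistsWithRetry(Path path, int maxAttempts, long delayMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                Files.deleteIfExists(path);
                return true;
            } catch (FileSystemException e) {
                // On Windows this covers "file is being used by another process".
                if (attempt == maxAttempts) return false;
                Thread.sleep(delayMillis);
            } catch (IOException e) {
                return false; // other I/O errors are not worth retrying
            }
        }
        return false;
    }
}
```

A caller such as a page store could then log a warning and skip the stale file rather than aborting the whole read.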

2022-10-15 14:44:12.146 WARN [-worker-12] a.p.p.p.b.e.i.BrowserEmulatorImplBase - java.nio.file.FileSystemException: C:\Users\VINCEN~1\AppData\Local\Temp\ln\5a6caaaaa8aaf6e230182a2bbad7c43c.htm: A required privilege is not held by the client.
Environment:
OS: Windows 11
JDK: Java 11
Command line: "C:\Program Files\Java\jdk-11.0.2\bin\java.exe" "-javaagent:D:\Program Files\JetBrains\IntelliJ IDEA 2022.1.3\lib\idea_rt.jar=61275:D:\Program Files\JetBrains\IntelliJ IDEA 2022.1.3\bin" -Dfile.encoding=UTF-8 -classpath "..."...
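"A required privilege is not held by the client" is the Windows error commonly returned when a process without the symbolic-link privilege tries to create a symlink. The `ln` directory in the path hints at link creation, but the report does not confirm which operation failed. Under that assumption, a sketch of falling back to a plain copy; the `LinkOrCopy` class is hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, assuming the failing operation is symbolic-link creation.
class LinkOrCopy {
    // Returns true if a symbolic link was created, false if the file was copied instead.
    static boolean linkOrCopy(Path target, Path link) throws IOException {
        Files.deleteIfExists(link); // avoid FileAlreadyExistsException on re-runs
        try {
            Files.createSymbolicLink(link, target);
            return true;
        } catch (FileSystemException | UnsupportedOperationException e) {
            // Missing privilege (Windows) or no symlink support: copy the file instead.
            Files.copy(target, link, StandardCopyOption.REPLACE_EXISTING);
            return false;
        }
    }
}
```

Either way the caller can read the page from `link`; the copy only costs extra disk space for temporary files.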