
Harvest failure caused by non-standard CSS

Open platonai opened this issue 10 months ago • 0 comments

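The harvester should handle `SelectorParseException` correctly. In the run below, the selector `#p_n_feature_seven_browse-bin/23991554011`, which contains a `/`, fails to parse in jsoup, and the unhandled exception aborts the whole harvest.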

2025-02-09 15:21:39.747 INFO [main] a.p.s.a.c.AnalysablePageCorpus - Arrangement finished - G1 Eden Space: init = 25165824(24576K) used = 67108864(65536K) committed = 197132288(192512K) max = -1(-1K), G1 Old Gen: init = 511705088(499712K) used = 566052440(552785K) committed = 834666496(815104K) max = 8543797248(8343552K), G1 Survivor Space: init = 0(0K) used = 33554432(32768K) committed = 33554432(32768K) max = -1(-1K)
2025-02-09 15:21:39.754 INFO [main] a.p.s.a.AutoMiner - Round #1 find 40/40/229 documents/urls/anchors in group #a18ad07fe4905bdfe88215ee9b53c246[a > https://www.amazon.com] with score <29,9,1000,-1,0,0,-229,229,888,20,0,666>
2025-02-09 15:21:39.755 WARN [main] a.p.s.a.AutoMiner - Harvest task takes long time(PT4M27.9892403S) | https://www.amazon.com/b?node=1292115011 -diagnose -nJitRetry 1 -refresh -topLinks 40
2025-02-09 15:21:39.756 INFO [main] a.p.s.a.c.AnalysablePageCorpus - Analysis start - G1 Eden Space: init = 25165824(24576K) used = 67108864(65536K) committed = 197132288(192512K) max = -1(-1K), G1 Old Gen: init = 511705088(499712K) used = 566052440(552785K) committed = 834666496(815104K) max = 8543797248(8343552K), G1 Survivor Space: init = 0(0K) used = 33554432(32768K) committed = 33554432(32768K) max = -1(-1K)
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:118)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:108)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
    at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:88)
Caused by: org.jsoup.select.Selector$SelectorParseException: Could not parse query '#p_n_feature_seven_browse-bin/23991554011': unexpected token at '/23991554011'
    at org.jsoup.select.QueryParser.findElements(QueryParser.java:218)
    at org.jsoup.select.QueryParser.parse(QueryParser.java:74)
    at org.jsoup.select.QueryParser.parse(QueryParser.java:45)
    at org.jsoup.select.Selector.selectFirst(Selector.java:162)
    at org.jsoup.nodes.Element.selectFirst(Element.java:445)
    at ai.platon.scent.analysis.corpus.AnalysablePageCorpus.supplementPartition(AnalysablePageCorpus.kt:906)
    at ai.platon.scent.analysis.corpus.AnalysablePageCorpus.partition(AnalysablePageCorpus.kt:842)
    at ai.platon.scent.analysis.corpus.AnalysablePageCorpus.calculateAdvancedFeatures(AnalysablePageCorpus.kt:285)
    at ai.platon.scent.analysis.corpus.AnalysablePageCorpus.calculateAdvancedFeatures$default(AnalysablePageCorpus.kt:259)
    at ai.platon.scent.analysis.corpus.AnalysablePageCorpus.analyse(AnalysablePageCorpus.kt:210)
    at ai.platon.scent.analysis.AutoMiner.mine(AutoMiner.kt:278)
    at ai.platon.scent.dm.HarvestRunner.harvest(HarvestRunner.kt:162)
    at ai.platon.scent.dm.HarvestRunner$harvest$3.invokeSuspend(HarvestRunner.kt)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:274)
    at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:85)
    at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
    at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
    at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
    at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
    at ai.platon.exotic.standalone.common.VerboseHarvester.harvest(VerboseHarvester.kt:110)
    at ai.platon.exotic.standalone.common.VerboseHarvester.harvest(VerboseHarvester.kt:98)
    at ai.platon.exotic.standalone.common.VerboseHarvester.harvest(VerboseHarvester.kt:96)
    at ai.platon.exotic.standalone.starter.ExoticExecutor$harvest$1.invokeSuspend(ExoticExecutor.kt:232)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:274)
    at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:85)
    at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
    at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
    at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
    at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
    at ai.platon.exotic.standalone.starter.ExoticExecutor.harvest$exotic_standalone(ExoticExecutor.kt:231)
    at ai.platon.exotic.standalone.starter.ExoticExecutor.execute(ExoticExecutor.kt:87)
    at ai.platon.exotic.standalone.starter.ExoticStandaloneStarterKt.main(ExoticStandaloneStarter.kt:10)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    ... 5 more
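
One possible direction for a fix, sketched in plain Java with no jsoup dependency so it can be run standalone. It is not the project's actual code: the class `SafeSelectors` and both helpers are hypothetical names. The sketch relies on two assumptions: that jsoup's `SelectorParseException` extends `IllegalStateException`, and that jsoup accepts a quoted attribute selector such as `[id="..."]` where the `#id` form fails on `/`. Looking the element up with `Element.getElementById(id)` may be another way to avoid selector parsing altogether.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class SafeSelectors {
    private SafeSelectors() {}

    /**
     * Builds a quoted attribute selector for a raw id value, so characters
     * such as '/' are never parsed as part of a '#id' token.
     * This sketch does not handle ids containing a double quote or backslash.
     */
    static String idAttributeSelector(String id) {
        if (id.indexOf('"') >= 0 || id.indexOf('\\') >= 0) {
            throw new IllegalArgumentException("unsupported id: " + id);
        }
        return "[id=\"" + id + "\"]";
    }

    /**
     * Runs a query and turns a parse failure into an empty result instead of
     * letting it abort the caller. Assumption: the selector parse exception
     * is a subclass of IllegalStateException.
     */
    static <T> Optional<T> tryQuery(Supplier<T> query) {
        try {
            return Optional.ofNullable(query.get());
        } catch (IllegalStateException e) {
            // A real harvester would probably log the bad selector here.
            return Optional.empty();
        }
    }
}
```

With jsoup on the classpath, a call site could then look like `SafeSelectors.tryQuery(() -> element.selectFirst(SafeSelectors.idAttributeSelector(id)))`, so that a single malformed id is skipped rather than ending the harvest.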

XXXX@DESKTOP-1HFTEKE MINGW64 ~/Desktop/bolatu

platonai avatar Feb 09 '25 08:02 platonai