No automatic Grab on macOS

Open RockNHawk opened this issue 7 years ago • 5 comments

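Hello,

After some compatibility fixes to the code (pull request already submitted), data can now be fetched successfully on macOS by clicking the Grab Now button under Actions.

However, automatic Grab does not seem to run: GrabResult is empty, Status is ON, and Log is empty as well. Where should I start investigating this?

I am running a single standalone node.

Any pointers would be appreciated. Thanks!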

RockNHawk avatar Nov 21 '18 12:11 RockNHawk

The log is available now.

2018-11-21 20:49:47,809 [1] INFO - 127.0.0.1:36000 feed scheduler starting
2018-11-21 20:49:47,819 [1] INFO - 127.0.0.1:36000 feed scheduler started
2018-11-21 20:49:47,821 [1] INFO - Start WebApiServer At http://127.0.0.1:36000 with STANDALONE node
2018-11-21 20:49:48,983 [4] INFO - 127.0.0.1:36000 add job with feed id 5
2018-11-21 20:49:48,996 [4] INFO - 127.0.0.1:36000 add job with feed id 3
2018-11-21 20:49:49,006 [4] INFO - 127.0.0.1:36000 add job with feed id 11
2018-11-21 20:49:49,013 [4] INFO - 127.0.0.1:36000 add job with feed id 1
2018-11-21 20:49:49,023 [4] INFO - 127.0.0.1:36000 add job with feed id 2
2018-11-21 20:49:49,032 [4] INFO - 127.0.0.1:36000 add job with feed id 4
2018-11-21 20:49:49,040 [4] INFO - 127.0.0.1:36000 add job with feed id 12
2018-11-21 20:49:49,040 [4] INFO - 127.0.0.1:36000 sync feed and add feed jobs:7
2018-11-21 20:49:49,042 [4] INFO - 127.0.0.1:36000 add extract job
2018-11-21 20:50:00,069 [14] INFO - feed job feed127.0.0.1:36000.3 add to feed crawl queue
2018-11-21 20:50:00,069 [13] INFO - feed job feed127.0.0.1:36000.12 add to feed crawl queue
2018-11-21 20:50:00,069 [15] INFO - feed job feed127.0.0.1:36000.2 add to feed crawl queue
2018-11-21 20:50:00,069 [5] INFO - feed job feed127.0.0.1:36000.11 add to feed crawl queue
2018-11-21 20:50:00,069 [9] INFO - feed job feed127.0.0.1:36000.1 add to feed crawl queue
2018-11-21 20:50:00,069 [4] INFO - feed job feed127.0.0.1:36000.5 add to feed crawl queue
2018-11-21 20:50:00,096 [4] INFO - feed job http://www.jiuxian.com/goods-55611.html?source=92 starting
2018-11-21 20:50:00,096 [15] INFO - feed job https://www.kuaidaili.com/free/inha/1/ starting
2018-11-21 20:50:00,096 [9] INFO - feed job https://www.oschina.net/blog starting
2018-11-21 20:50:00,096 [5] INFO - feed job http://www.ruijihg.com/爬虫 starting
2018-11-21 20:50:00,098 [4] INFO - do task -> request address http://www.jiuxian.com/goods-55611.html?source=92
2018-11-21 20:50:00,098 [15] INFO - do task -> request address https://www.kuaidaili.com/free/inha/1/
2018-11-21 20:50:00,098 [9] INFO - do task -> request address https://www.oschina.net/blog
2018-11-21 20:50:00,098 [5] INFO - do task -> request address http://www.ruijihg.com/爬虫
2018-11-21 20:50:00,098 [13] INFO - begin move delay feed
2018-11-21 20:50:00,102 [13] INFO - get snapshot feed count:0
2018-11-21 20:50:00,104 [13] INFO - feed job http://press.gapp.gov.cn:8088/press_search/pages/query/queryAction!findmediaPaging.action starting
2018-11-21 20:50:00,104 [13] INFO - do task -> request address http://press.gapp.gov.cn:8088/press_search/pages/query/queryAction!findmediaPaging.action
2018-11-21 20:50:00,169 [4] INFO - request http://www.jiuxian.com/goods-55611.html?source=92 response code is BadRequest
2018-11-21 20:50:01,052 [16] INFO - feed job http://app.cannews.com.cn/roll.php?do=query&callback=jsonp1475197217819&=1542804600157&date=2018-11-21&size=20&page=1 starting
2018-11-21 20:50:01,053 [16] INFO - do task -> request address http://app.cannews.com.cn/roll.php?do=query&callback=jsonp1475197217819&=1542804600157&date=2018-11-21&size=20&page=1
2018-11-21 20:50:04,157 [5] INFO - request http://www.ruijihg.com/爬虫 response code is OK
2018-11-21 20:50:04,170 [5] INFO - http://www.ruijihg.com/爬虫 response save to /Users/user1/git/RuiJi.Net/RuiJi.Net.Cmd/bin/Debug/netcoreapp2.1/snapshot/1_636784302041708630.json
2018-11-21 20:50:04,262 [19] INFO - feed job http://app.cannews.com.cn/roll.php?do=query&callback=jsonp1475197217819&=1542804600157&date=2018-11-21&size=20&page=2 starting
2018-11-21 20:50:04,263 [19] INFO - do task -> request address http://app.cannews.com.cn/roll.php?do=query&callback=jsonp1475197217819&=1542804600157&date=2018-11-21&size=20&page=2
2018-11-21 20:50:04,461 [9] INFO - request https://www.oschina.net/blog response code is OK
2018-11-21 20:50:04,463 [9] INFO - https://www.oschina.net/blog response save to /Users/user/git/RuiJi.Net/RuiJi.Net.Cmd/bin/Debug/netcoreapp2.1/snapshot/5_636784302044636140.json
2018-11-21 20:50:04,832 [15] INFO - request https://www.kuaidaili.com/free/inha/1/ response code is OK
2018-11-21 20:50:04,833 [15] INFO - https://www.kuaidaili.com/free/inha/1/ response save to /Users/user1/git/RuiJi.Net/RuiJi.Net.Cmd/bin/Debug/netcoreapp2.1/snapshot/2_636784302048338500.json
2018-11-21 20:50:04,864 [16] INFO - request http://app.cannews.com.cn/roll.php?do=query&callback=jsonp1475197217819&=1542804600157&date=2018-11-21&size=20&page=1 response code is OK
2018-11-21 20:50:04,865 [16] INFO - http://app.cannews.com.cn/roll.php?do=query&callback=jsonp1475197217819&=1542804600157&date=2018-11-21&size=20&page=2 response save to /Users/user1/git/RuiJi.Net/RuiJi.Net.Cmd/bin/Debug/netcoreapp2.1/snapshot/3_636784302048652700.json
2018-11-21 20:50:04,946 [19] INFO - request http://app.cannews.com.cn/roll.php?do=query&callback=jsonp1475197217819&=1542804600157&date=2018-11-21&size=20&page=2 response code is OK
2018-11-21 20:50:04,947 [19] INFO - http://app.cannews.com.cn/roll.php?do=query&callback=jsonp1475197217819&=1542804600157&date=2018-11-21&size=20&page=2 response save to /Users/user1/git/RuiJi.Net/RuiJi.Net.Cmd/bin/Debug/netcoreapp2.1/snapshot/3_636784302049472390.json
2018-11-21 20:50:06,134 [13] INFO - request http://press.gapp.gov.cn:8088/press_search/pages/query/queryAction!findmediaPaging.action response code is OK
2018-11-21 20:50:06,143 [13] INFO - http://press.gapp.gov.cn:8088/press_search/pages/query/queryAction!findmediaPaging.action response save to /Users/user1/git/RuiJi.Net/RuiJi.Net.Cmd/bin/Debug/netcoreapp2.1/snapshot/11_636784302061430920.json

2018-11-21 20:51:07,090 [13] INFO - extract job http://www.cannews.com.cn/2018/1121/185471.shtml save result False
2018-11-21 20:51:07,090 [16] INFO - extract job http://www.cannews.com.cn/2018/1121/185469.shtml save result False

....

2018-11-21 20:52:00,005 [13] INFO - feed extract job execute
2018-11-21 20:52:00,006 [13] INFO - extract job started
2018-11-21 20:52:00,006 [13] INFO - begin move delay feed
2018-11-21 20:52:00,007 [9] INFO - get snapshot feed count:0
2018-11-21 20:53:00,004 [27] INFO - feed extract job execute
2018-11-21 20:53:00,005 [27] INFO - extract job started
2018-11-21 20:53:00,005 [27] INFO - begin move delay feed
2018-11-21 20:53:00,005 [13] INFO - get snapshot feed count:0
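
A log of this size is easier to reason about once it is tallied. The following is a minimal Python sketch, not part of RuiJi.Net, that counts the response codes and the extract-job save results. The function name `summarize` and the two regular expressions are assumptions based only on the line shapes visible in the log above.

```python
import re
from collections import Counter

# Assumed line shapes, taken from the log excerpt in this thread:
#   ... INFO - request <url> response code is OK
#   ... INFO - extract job <url> save result False
RESPONSE_RE = re.compile(r"INFO - request (\S+) response code is (\w+)")
EXTRACT_RE = re.compile(r"INFO - extract job (\S+) save result (True|False)")


def summarize(log_text):
    """Count response codes and extract-job save results found in log_text."""
    codes = Counter(m.group(2) for m in RESPONSE_RE.finditer(log_text))
    saves = Counter(m.group(2) for m in EXTRACT_RE.finditer(log_text))
    return codes, saves


sample = (
    "2018-11-21 20:50:00,169 [4] INFO - request "
    "http://www.jiuxian.com/goods-55611.html?source=92 response code is BadRequest\n"
    "2018-11-21 20:50:04,157 [5] INFO - request "
    "http://www.ruijihg.com/爬虫 response code is OK\n"
    "2018-11-21 20:51:07,090 [13] INFO - extract job "
    "http://www.cannews.com.cn/2018/1121/185471.shtml save result False\n"
)

codes, saves = summarize(sample)
print(dict(codes))
print(dict(saves))
```

If most requests return OK and snapshots are saved while every extract job reports `save result False`, the extraction step is probably a better place to look than the download step.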

RockNHawk avatar Nov 21 '18 12:11 RockNHawk

This is the error log. It contains no stack traces, so where should I start investigating?

2018-11-21 19:45:00,037 [20] ERROR - https://www.oschina.net/blog response error is Specified value has invalid Control characters. Parameter name: value
2018-11-21 19:45:00,037 [17] ERROR - http://www.ruijihg.com/爬虫 response error is Specified value has invalid Control characters. Parameter name: value
2018-11-21 20:22:16,762 [39] ERROR - http://www.cannews.com.cn/2018/1121/185448.shtml response error is A task may only be disposed if it is in a completion state (RanToCompletion, Faulted or Canceled).
2018-11-21 20:32:15,484 [43] ERROR - http://www.cannews.com.cn/2018/1121/185460.shtml response error is A task may only be disposed if it is in a completion state (RanToCompletion, Faulted or Canceled).
2018-11-21 20:37:33,384 [15] ERROR - http://www.jiuxian.com/goods-55611.html?source=92 response error is One or more errors occurred. (Failed to launch chrome! path to executable does not exist)
2018-11-21 20:45:00,070 [22] ERROR - http://www.jiuxian.com/goods-55611.html?source=92 response error is One or more errors occurred. (Failed to launch chrome! path to executable does not exist)
2018-11-21 20:50:00,166 [4] ERROR - http://www.jiuxian.com/goods-55611.html?source=92 response error is One or more errors occurred. (Failed to launch chrome! path to executable does not exist)
2018-11-21 20:55:00,020 [28] ERROR - http://www.jiuxian.com/goods-55611.html?source=92 response error is One or more errors occurred. (Failed to launch chrome! path to executable does not exist)
2018-11-21 21:00:00,215 [24] ERROR - http://www.jiuxian.com/goods-55611.html?source=92 response error is One or more errors occurred. (Failed to launch chrome! path to executable does not exist)
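
Since there are no stack traces, grouping the entries by error message at least shows how many distinct problems there are and which URLs each one affects. Below is a minimal Python sketch under that assumption. `group_errors` is a hypothetical helper, and the timestamp pattern is inferred from the entries above. It also copes with entries pasted on a single line.

```python
import re
from collections import defaultdict

# Each entry starts with a timestamp and a thread id, e.g.
# "2018-11-21 20:37:33,384 [15]" (shape inferred from the log above).
ENTRY_START = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \[\d+\]")
ERROR_RE = re.compile(r"ERROR - (\S+) response error is (.*)", re.S)


def group_errors(log_text):
    """Map each distinct error message to the set of URLs that produced it."""
    starts = [m.start() for m in ENTRY_START.finditer(log_text)]
    starts.append(len(log_text))
    groups = defaultdict(set)
    for begin, end in zip(starts, starts[1:]):
        match = ERROR_RE.search(log_text[begin:end])
        if match:
            # Collapse whitespace so wrapped and unwrapped entries compare equal.
            message = " ".join(match.group(2).split())
            groups[message].add(match.group(1))
    return groups


sample = (
    "2018-11-21 19:45:00,037 [20] ERROR - https://www.oschina.net/blog "
    "response error is Specified value has invalid Control characters. "
    "Parameter name: value "
    "2018-11-21 20:37:33,384 [15] ERROR - "
    "http://www.jiuxian.com/goods-55611.html?source=92 response error is "
    "One or more errors occurred. (Failed to launch chrome! path to "
    "executable does not exist) "
    "2018-11-21 20:45:00,070 [22] ERROR - "
    "http://www.jiuxian.com/goods-55611.html?source=92 response error is "
    "One or more errors occurred. (Failed to launch chrome! path to "
    "executable does not exist)"
)

groups = group_errors(sample)
for message, urls in groups.items():
    print(len(urls), "URL(s):", message)
```

In the full log above this yields three distinct messages, each of which can then be investigated separately.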

RockNHawk avatar Nov 21 '18 13:11 RockNHawk

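Hello. We cannot test on macOS because we do not have the hardware. The error "Failed to launch chrome! path to executable does not exist" means that the rule uses RunJs but the headless browser has not been set up.

If you need to run the JavaScript on a page, you need to install the headless Chromium browser. It can be downloaded from https://pan.baidu.com/s/1rsyCNnXxbobCBLZuPTiJHQ (access code cr3e). Download the Chromium zip package that matches the operating system RuiJi.Net is deployed on, then extract the executable files into the chromium folder under the RuiJi.Net run root directory. RunJs will then work.

For how to configure Chromium on macOS specifically, please consult the relevant documentation. The Linux fix is as follows. On Linux, install the Chromium libraries with yum install chromium-libs.x86_64, then grant full permissions on the chromium folder with chmod -R 777 chromium.

After these two steps, headless Chromium runs normally on Linux. This is for reference only. See also the Chinese documentation: https://gitee.com/zhupingqi/RuiJi.Net/wikis/%E5%85%B6%E4%BB%96?sort_id=580719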
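
Because the error says the executable path does not exist, a quick first check is whether the chromium folder is present and contains anything executable. The following Python sketch does only that. The folder name `chromium` comes from the reply above. The helper `find_executables` is hypothetical, and the name of the executable RuiJi.Net actually looks for is not stated in this thread, so the sketch lists every file with the execute permission instead of guessing one.

```python
import os


def find_executables(run_root, folder="chromium"):
    """Return sorted paths of executable files under <run_root>/<folder>.

    Returns None when the folder itself is missing.
    """
    base = os.path.join(run_root, folder)
    if not os.path.isdir(base):
        return None
    found = []
    for dirpath, _dirnames, filenames in os.walk(base):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # os.access checks the execute permission for the current user.
            if os.access(path, os.X_OK):
                found.append(path)
    return sorted(found)


# Assumption: the script is started from the RuiJi.Net run root directory.
result = find_executables(os.getcwd())
if result is None:
    print("chromium folder not found")
elif not result:
    print("chromium folder exists but contains no executable files")
else:
    print("executable files found:", len(result))
```

A missing folder points to the unzip location, while an existing folder with no executable files points to permissions, which is what the chmod step on Linux addresses.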

githublixiang avatar Nov 22 '18 12:11 githublixiang

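Thanks for the reply! I tested with the sample data bundled with the project. It includes feeds that do not need RunJs, and those produce no GrabResult either.

So for errors like:

2018-11-21 20:22:16,762 [39] ERROR - http://www.cannews.com.cn/2018/1121/185448.shtml response error is A task may only be disposed if it is in a completion state (RanToCompletion, Faulted or

where should I start investigating?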

RockNHawk avatar Nov 22 '18 12:11 RockNHawk

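Hello. This log entry indicates a response error, and opening the link shows that it is no longer valid. Please check that the Feed and Rule you want to extract are configured correctly. For reference, see the OSChina blog example with FeedId 5 on the test server: http://118.31.61.230:36000/#feed/feeds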

githublixiang avatar Nov 22 '18 12:11 githublixiang