
Workflow designed on Mac scrapes data successfully; trying to deploy and run it with Docker on a Linux server (multi-task mode)

Open JasonChen-ecnu opened this issue 1 year ago • 2 comments

The task uses saved user information, and the scraped data is written to MySQL. Here is what I did:

1. Downloaded EasySpider_0.6.2_Linux_x64_with_docker_support.tar.xz and extracted it on the server, which produced EasySpider_Linux_x64.
2. Uploaded config.json, mysql_config.json, and the user_data folder from the Mac to the EasySpider_Linux_x64 folder on the server.
3. On the server, edited config.json:
   "absolute_user_data_folder":"/export/EasySpider_Linux_x64/user_data"
   "sys_arch":"amd64"
4. Uploaded 1.json from the tasks folder on the Mac to /export/EasySpider_Linux_x64/tasks on the server.
5. Uploaded 55.json from the execution_instances folder on the Mac to /export/EasySpider_Linux_x64/execution_instances on the server.
6. On the server, ran:
   sudo docker network create grid
   sudo docker run -d -p 4442-4444:4442-4444 --net grid --name selenium-hub selenium/hub
   sudo docker run -d --net grid -e SE_EVENT_BUS_HOST=selenium-hub \
     --shm-size="4g" \
     -e SE_EVENT_BUS_PUBLISH_PORT=4442 \
     -e SE_EVENT_BUS_SUBSCRIBE_PORT=4443 \
     selenium/node-chrome
   [Screenshot: 截屏2024-08-23 16 10 19]
7. In the EasySpider_Linux_x64 directory on the server, ran:
   ./EasySpider/resources/app/chrome_linux64/easyspider_executestage --ids [55] --docker_driver http://localhost:4444/wd/hub --user_data 1 --server_address http://localhost:8074 --config_folder "/export/EasySpider_Linux_x64" --headless 1 --read_type local --config_file_name config.json --saved_file_name
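The hand edit in step 3 could also be scripted, which helps when the same config.json is moved between machines repeatedly. Below is a minimal sketch, assuming config.json is a flat JSON object; the two key names are taken from step 3, while the function name `patch_config` is hypothetical and not part of EasySpider.

```python
import json
from pathlib import Path


def patch_config(config_path, user_data_folder, sys_arch="amd64"):
    """Rewrite the two machine-specific keys in an EasySpider config.json.

    Hypothetical helper: only the key names come from the issue text.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # These are the two keys edited by hand in step 3; all other keys are kept.
    config["absolute_user_data_folder"] = str(user_data_folder)
    config["sys_arch"] = sys_arch
    path.write_text(
        json.dumps(config, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return config


if __name__ == "__main__":
    # Paths as given in the issue; adjust for your own server.
    patch_config(
        "/export/EasySpider_Linux_x64/config.json",
        "/export/EasySpider_Linux_x64/user_data",
    )
```

Keeping every other key untouched means the copy from the Mac stays the single source of truth for the rest of the settings.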

Error:

id: 55
local
Task Name: 巨量引擎工作台直播列表
任务名称: 巨量引擎工作台直播列表
文件下载路径|File Download path: /export/EasySpider_Linux_x64/Data/Task_55/files
Using remote driver
Headless mode
Traceback (most recent call last):
  File "easyspider_executestage.py", line 2458, in <module>
  File "myChrome.py", line 30, in __init__
  File "selenium/webdriver/remote/webdriver.py", line 209, in __init__
  File "selenium/webdriver/remote/webdriver.py", line 293, in start_session
  File "selenium/webdriver/remote/webdriver.py", line 348, in execute
  File "selenium/webdriver/remote/errorhandler.py", line 229, in check_response
selenium.common.exceptions.SessionNotCreatedException: Message: Could not start a new session. Could not start a new session. Error while creating session with the driver service. Stopping driver service: Could not start a new session. Response code 500. Message: session not created from unknown error: cannot create default profile directory
Host info: host: '19fb995e6e76', ip: '172.18.0.3'
Build info: version: '4.23.1', revision: '656257d8e9'
System info: os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-147.el8.x86_64', java.version: '17.0.12'
Driver info: driver.version: unknown
Build info: version: '4.23.1', revision: '656257d8e9'
System info: os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-147.el8.x86_64', java.version: '17.0.12'
Driver info: driver.version: unknown
Build info: version: '4.23.1', revision: '656257d8e9'
System info: os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-147.el8.x86_64', java.version: '17.0.12'
Driver info: driver.version: unknown
Stacktrace:
    at org.openqa.selenium.grid.node.remote.RemoteNode.newSession (RemoteNode.java:157)
    at org.openqa.selenium.grid.distributor.local.LocalDistributor.startSession (LocalDistributor.java:654)
    at org.openqa.selenium.grid.distributor.local.LocalDistributor.newSession (LocalDistributor.java:573)
    at org.openqa.selenium.grid.distributor.local.LocalDistributor$NewSessionRunnable.handleNewSessionRequest (LocalDistributor.java:836)
    at org.openqa.selenium.grid.distributor.local.LocalDistributor$NewSessionRunnable.lambda$run$1 (LocalDistributor.java:793)
    at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1136)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:635)
    at java.lang.Thread.run (Thread.java:840)
[7389] Failed to execute script 'easyspider_executestage' due to unhandled exception!
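One possible reading of `cannot create default profile directory`, offered only as a guess, is that Chrome inside the selenium/node-chrome container is being asked to use the host path from `absolute_user_data_folder`, which does not exist inside the container. The thread does not confirm this, nor that EasySpider forwards that path to the remote driver. If that is the cause, bind-mounting the folder into the node container might help. The sketch below builds the step 6 node command with one extra `-v` mount; the function name `node_chrome_command` is hypothetical.

```python
import shlex


def node_chrome_command(user_data_dir, shm_size="4g"):
    """Build the selenium/node-chrome `docker run` argv from step 6,
    plus a bind mount for the user_data folder (the only addition)."""
    return [
        "docker", "run", "-d", "--net", "grid",
        "-e", "SE_EVENT_BUS_HOST=selenium-hub",
        f"--shm-size={shm_size}",
        "-e", "SE_EVENT_BUS_PUBLISH_PORT=4442",
        "-e", "SE_EVENT_BUS_SUBSCRIBE_PORT=4443",
        # Assumption, not confirmed in the thread: mounting the folder at the
        # same path lets Chrome in the container resolve the host path that
        # config.json stores in absolute_user_data_folder.
        "-v", f"{user_data_dir}:{user_data_dir}",
        "selenium/node-chrome",
    ]


if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(shlex.join(node_chrome_command("/export/EasySpider_Linux_x64/user_data")))
```

The user that runs Chrome inside the container would also need write access to the mounted folder, so file ownership on the host may need adjusting.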

JasonChen-ecnu, Aug 23 '24 08:08

I installed it directly on Ubuntu following the steps, but it fails to run:

2024-10-24 15:45:08.858595276 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1983 CreateInferencePybindStateModule] Init provider bridge failed.

Configurations:
+------------------+------+------------------------------+
| Key              | Type | Value                        |
+------------------+------+------------------------------+
| ids              | list | [0]                          |
| saved_file_name  | str  |                              |
| user_data        | bool | False                        |
| config_folder    | str  | ./                           |
| config_file_name | str  | config.json                  |
| read_type        | str  | local                        |
| headless         | bool | True                         |
| server_address   | str  | http://localhost:8074        |
| keyboard         | bool | True                         |
| pause_key        | str  | p                            |
| version          | str  | 0.6.2                        |
| docker_driver    | str  | http://localhost:4444/wd/hub |
+------------------+------+------------------------------+

linux ('64bit', 'ELF')
Finding chromedriver in EasySpider /home/qinxizhou/firecrawl/EasySpider_Linux_x64/EasySpider
Chrome location: EasySpider/resources/app/chrome_linux64/chrome
Chromedriver location: EasySpider/resources/app/chrome_linux64/chromedriver_linux64
Headless mode
无头模式
如果报错Selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally,说明有之前运行的Chrome实例没有正常关闭,请关闭之前打开的所有Chrome实例后再运行程序即可。
If you get an error Selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally, it means that there is a Chrome instance that was not closed properly before, please close all Chrome instances that were opened before running the program.
id: 0
local
Task Name: Electronics, Cars, Fashion, Collectibles & More | eBay
任务名称: Electronics, Cars, Fashion, Collectibles & More | eBay
文件下载路径|File Download path: /home/qinxizhou/firecrawl/EasySpider_Linux_x64/Data/Task_0/files
Using remote driver
Headless mode
Traceback (most recent call last):
  File "urllib3/connection.py", line 203, in _new_conn
  File "urllib3/util/connection.py", line 85, in create_connection
  File "urllib3/util/connection.py", line 73, in create_connection
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "urllib3/connectionpool.py", line 790, in urlopen
  File "urllib3/connectionpool.py", line 496, in _make_request
  File "urllib3/connection.py", line 395, in request
  File "http/client.py", line 1281, in endheaders
  File "http/client.py", line 1041, in _send_output
  File "http/client.py", line 979, in send
  File "urllib3/connection.py", line 243, in connect
  File "urllib3/connection.py", line 218, in _new_conn
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f95c804be10>: Failed to establish a new connection: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "easyspider_executestage.py", line 2458, in <module>
  File "myChrome.py", line 30, in __init__
  File "selenium/webdriver/remote/webdriver.py", line 209, in __init__
  File "selenium/webdriver/remote/webdriver.py", line 293, in start_session
  File "selenium/webdriver/remote/webdriver.py", line 346, in execute
  File "selenium/webdriver/remote/remote_connection.py", line 300, in execute
  File "selenium/webdriver/remote/remote_connection.py", line 321, in _request
  File "urllib3/_request_methods.py", line 118, in request
  File "urllib3/_request_methods.py", line 217, in request_encode_body
  File "urllib3/poolmanager.py", line 444, in urlopen
  File "urllib3/connectionpool.py", line 874, in urlopen
  File "urllib3/connectionpool.py", line 874, in urlopen
  File "urllib3/connectionpool.py", line 874, in urlopen
  File "urllib3/connectionpool.py", line 844, in urlopen
  File "urllib3/util/retry.py", line 515, in increment
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=4444): Max retries exceeded with url: /wd/hub/session (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f95c804be10>: Failed to establish a new connection: [Errno 111] Connection refused'))
[35299] Failed to execute script 'easyspider_executestage' due to unhandled exception!
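The final `MaxRetryError` says that nothing accepted the connection on localhost:4444, the port the `docker_driver` setting points at. Before launching easyspider_executestage it may be worth probing the hub first. The sketch below uses only the standard library and assumes a Selenium Grid 4 hub, whose `/status` endpoint returns JSON with a `value.ready` flag; the function names `parse_ready` and `grid_ready` are hypothetical.

```python
import json
import urllib.request


def parse_ready(payload):
    """Return True if a Grid /status response body reports value.ready as true."""
    try:
        return bool(json.loads(payload)["value"]["ready"])
    except (ValueError, KeyError, TypeError):
        # Not JSON, or not shaped like a Grid status response.
        return False


def grid_ready(base_url="http://localhost:4444", timeout=3.0):
    """Probe the Selenium Grid status endpoint; False on any connection failure."""
    try:
        with urllib.request.urlopen(f"{base_url}/status", timeout=timeout) as resp:
            return parse_ready(resp.read())
    except OSError:
        # Covers URLError, refused connections (as in the log above) and timeouts.
        return False


if __name__ == "__main__":
    print("grid ready:", grid_ready())
```

If this prints `grid ready: False`, the hub and node containers are probably not running or port 4444 is not published, and `sudo docker ps` would be the next thing to check.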

zhukefucn, Oct 24 '24 07:10

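[image]

Single-task deployment in a Docker environment also fails for me. The message says that a previously started Chrome instance was not closed properly and that all previously opened Chrome instances should be closed before running the program again. Running docker restart selenium-chrome does not help either.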

gdyxml2000, Nov 14 '24 03:11