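I designed the workflow on a Mac and scraped the data successfully, then tried to deploy and run it with Docker on a Linux server (in multi-task mode).
The task carries user information, and the scraped data is written to MySQL. Here is what I did: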
1. Downloaded EasySpider_0.6.2_Linux_x64_with_docker_support.tar.xz and extracted it on the server, which produced EasySpider_Linux_x64.
2. Uploaded config.json, mysql_config.json, and the user_data folder from the Mac to the EasySpider_Linux_x64 folder on the server.
3. Edited config.json on the server:
"absolute_user_data_folder":"/export/EasySpider_Linux_x64/user_data"
"sys_arch":"amd64"
4. Uploaded the file 1.json from the tasks folder on the Mac to /export/EasySpider_Linux_x64/tasks on the server.
5. Uploaded the file 55.json from the execution_instances folder on the Mac to /export/EasySpider_Linux_x64/execution_instances on the server.
6. On the server, ran:
sudo docker network create grid
sudo docker run -d -p 4442-4444:4442-4444 --net grid --name selenium-hub selenium/hub
sudo docker run -d --net grid -e SE_EVENT_BUS_HOST=selenium-hub \
--shm-size="4g" \
-e SE_EVENT_BUS_PUBLISH_PORT=4442 \
-e SE_EVENT_BUS_SUBSCRIBE_PORT=4443 \
selenium/node-chrome
7. In the EasySpider_Linux_x64 directory on the server, ran: ./EasySpider/resources/app/chrome_linux64/easyspider_executestage --ids [55] --docker_driver http://localhost:4444/wd/hub --user_data 1 --server_address http://localhost:8074 --config_folder "/export/EasySpider_Linux_x64" --headless 1 --read_type local --config_file_name config.json --saved_file_name
Error output:
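Before running the command in step 7, it can help to confirm that the hub started in step 6 is reachable and reports itself ready. The following is only a minimal sketch, assuming the hub is a Selenium Grid 4 instance that serves its status endpoint at http://localhost:4444/status on the server. The names fetch_grid_status and grid_is_ready are hypothetical helpers and are not part of EasySpider.

```python
import http.client
import json
import urllib.request


def fetch_grid_status(url="http://localhost:4444/status", timeout=3.0):
    """Return the parsed JSON from the Grid status endpoint, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, refused connections and timeouts;
        # ValueError covers a response body that is not valid JSON.
        return None


def grid_is_ready(status):
    """True only if the status document contains value.ready == true."""
    if not isinstance(status, dict):
        return False
    value = status.get("value")
    return isinstance(value, dict) and value.get("ready") is True


if __name__ == "__main__":
    status = fetch_grid_status()
    if status is None:
        print("hub not reachable at localhost:4444")
    elif grid_is_ready(status):
        print("grid reports ready")
    else:
        print("hub is up but reports not ready (for example, no node registered yet)")
```

If the script reports that the hub is not reachable, the easyspider_executestage command would be expected to fail when it tries to open a remote session, so the containers from step 6 are worth checking first (for example with sudo docker ps).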
id: 55
local
Task Name: 巨量引擎工作台直播列表
任务名称: 巨量引擎工作台直播列表
文件下载路径|File Download path: /export/EasySpider_Linux_x64/Data/Task_55/files
Using remote driver
Headless mode
Traceback (most recent call last):
File "easyspider_executestage.py", line 2458, in
I installed directly on Ubuntu following the steps, but it will not run:
2024-10-24 15:45:08.858595276 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1983 CreateInferencePybindStateModule] Init provider bridge failed.
Configurations:
+------------------+------+------------------------------+
| Key              | Type | Value                        |
+------------------+------+------------------------------+
| ids              | list | [0]                          |
| saved_file_name  | str  |                              |
| user_data        | bool | False                        |
| config_folder    | str  | ./                           |
| config_file_name | str  | config.json                  |
| read_type        | str  | local                        |
| headless         | bool | True                         |
| server_address   | str  | http://localhost:8074        |
| keyboard         | bool | True                         |
| pause_key        | str  | p                            |
| version          | str  | 0.6.2                        |
| docker_driver    | str  | http://localhost:4444/wd/hub |
+------------------+------+------------------------------+
linux ('64bit', 'ELF')
Finding chromedriver in EasySpider /home/qinxizhou/firecrawl/EasySpider_Linux_x64/EasySpider
Chrome location: EasySpider/resources/app/chrome_linux64/chrome
Chromedriver location: EasySpider/resources/app/chrome_linux64/chromedriver_linux64
Headless mode
无头模式
如果报错Selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally,说明有之前运行的Chrome实例没有正常关闭,请关闭之前打开的所有Chrome实例后再运行程序即可。
If you get an error Selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally, it means that there is a Chrome instance that was not closed properly before, please close all Chrome instances that were opened before running the program.
id: 0
local
Task Name: Electronics, Cars, Fashion, Collectibles & More | eBay
任务名称: Electronics, Cars, Fashion, Collectibles & More | eBay
文件下载路径|File Download path: /home/qinxizhou/firecrawl/EasySpider_Linux_x64/Data/Task_0/files
Using remote driver
Headless mode
Traceback (most recent call last):
File "urllib3/connection.py", line 203, in _new_conn
File "urllib3/util/connection.py", line 85, in create_connection
File "urllib3/util/connection.py", line 73, in create_connection
ConnectionRefusedError: [Errno 111] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "urllib3/connectionpool.py", line 790, in urlopen
File "urllib3/connectionpool.py", line 496, in _make_request
File "urllib3/connection.py", line 395, in request
File "http/client.py", line 1281, in endheaders
File "http/client.py", line 1041, in _send_output
File "http/client.py", line 979, in send
File "urllib3/connection.py", line 243, in connect
File "urllib3/connection.py", line 218, in _new_conn
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f95c804be10>: Failed to establish a new connection: [Errno 111] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "easyspider_executestage.py", line 2458, in
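The innermost frames of this traceback show urllib3 failing inside create_connection with [Errno 111] Connection refused, which usually means that nothing was listening at the address the program tried to reach. With the remote driver in use, that is most likely the docker_driver value, http://localhost:4444/wd/hub. The sketch below reproduces just that TCP step so the address can be tested without launching EasySpider. It is only an illustration: can_connect is a hypothetical helper name, and the fallback ports 80 and 443 are assumptions for URLs that omit a port.

```python
import socket
from urllib.parse import urlsplit


def can_connect(driver_url, timeout=3.0):
    """Try a plain TCP connection to the host and port named in driver_url."""
    parts = urlsplit(driver_url)
    host = parts.hostname
    # Assumption: fall back to the scheme's default port when none is given.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Includes ConnectionRefusedError (Errno 111 on Linux) and timeouts.
        return False


if __name__ == "__main__":
    url = "http://localhost:4444/wd/hub"
    print(url, "reachable" if can_connect(url) else "not reachable")
```

A result of "not reachable" on the machine that runs easyspider_executestage would suggest that the selenium-hub container is not running there, or that port 4444 is not published to the host.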
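A single-task deployment under Docker also fails for me. The message says that a previously running Chrome instance was not closed properly and that all previously opened Chrome instances should be closed before running the program again. Running docker restart selenium-chrome does not help either.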