About crawlab.json
When I was working with the SDK, I found it inconvenient for scheduling and deploying multiple spiders, so I wondered whether a project could be laid out like this:
.
├── packages
│   ├── js_spiders
│   │   ├── js_spider_1
│   │   │   └── index.js
│   │   ├── js_spider_2
│   │   │   └── index.js
│   │   ├── package.json
│   │   └── .....
│   └── py_spiders
│       ├── py_spider_1
│       │   └── main.py
│       ├── py_spider_2
│       │   └── main.py
│       ├── setup.py
│       └── .....
├── crawlab.json
└── makefile
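With a layout like this, the "command" entries for each package could be derived from the folder structure instead of being typed by hand. The sketch below assumes, as in the tree above, that a JS spider's entry point is index.js and a Python spider's is main.py; the function name discover_spiders and the default exclude list are hypothetical and not part of Crawlab or its SDK.

```python
from pathlib import Path

# Hypothetical convention read off the tree above: a JS spider's entry point is
# index.js and a Python spider's entry point is main.py.
ENTRY_POINTS = {"index.js": "node", "main.py": "python"}


def discover_spiders(packages_dir, exclude=("node_modules", ".venv")):
    """Map each package folder to the spider commands found inside it.

    Commands are relative to the package folder, in the same form as the
    "command" values in the proposed crawlab.json ("node js_spider_1/index.js").
    """
    result = {}
    for package in sorted(p for p in Path(packages_dir).iterdir() if p.is_dir()):
        commands = []
        for spider in sorted(p for p in package.iterdir() if p.is_dir()):
            if spider.name in exclude:
                continue  # skip dependency folders, similar to "exclude_path"
            for entry, interpreter in ENTRY_POINTS.items():
                if (spider / entry).is_file():
                    commands.append(f"{interpreter} {spider.name}/{entry}")
        result[package.name] = commands
    return result
```

Calling discover_spiders("packages") on the tree above would map "js_spiders" to the two node commands and "py_spiders" to the two python commands, while files such as package.json and setup.py are ignored because only subfolders are inspected.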
crawlab.json:
{
  "spiders": [
    {
      "path": "packages/js_spiders",
      "exclude_path": "node_modules",
      "name": "js spiders",
      "description": "js spiders",
      "cmd": "node",
      "schedules": [
        {
          "name": "js spider 1 cron",
          "cron": "* 1 * * *",
          "command": "node js_spider_1/index.js",
          "param": "",
          "mode": "random",
          "description": "js spider 1 cron",
          "enabled": true
        },
        {
          "name": "js spider 2 cron",
          "cron": "* 2 * * *",
          "command": "node js_spider_2/index.js",
          "param": "",
          "mode": "random",
          "description": "js spider 2 cron",
          "enabled": true
        }
      ]
    },
    {
      "path": "packages/py_spiders",
      "exclude_path": ".venv",
      "name": "py spiders",
      "description": "py spiders",
      "cmd": "python",
      "schedules": [
        {
          "name": "py spider 1 cron",
          "cron": "* 1 * * *",
          "command": "python py_spider_1/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 1 cron",
          "enabled": true
        },
        {
          "name": "py spider 2 cron",
          "cron": "* 2 * * *",
          "command": "python py_spider_2/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 2 cron",
          "enabled": true
        }
      ]
    }
  ]
}
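Because this file would be edited by hand and deployed from CI, a quick sanity check before upload could catch mistakes early. The sketch below is a hypothetical pre-deploy linter, not a Crawlab feature: it treats every key shown in the schedule entries above as required (an assumption), checks for the standard five cron fields, and flags expressions such as "* 1 * * *", which under standard cron semantics fire every minute from 01:00 to 01:59 rather than once at 01:00 ("0 1 * * *" would fire once a day).

```python
import json

# Keys each schedule entry carries in the proposed crawlab.json above.
# Treating all of them as required is an assumption of this sketch.
REQUIRED_SCHEDULE_KEYS = {"name", "cron", "command", "param", "mode",
                          "description", "enabled"}


def check_config(text):
    """Return a list of human-readable problems found in a crawlab.json string."""
    problems = []
    config = json.loads(text)
    for spider in config.get("spiders", []):
        spider_name = spider.get("name", "<unnamed spider>")
        for schedule in spider.get("schedules", []):
            label = f"{spider_name} / {schedule.get('name', '<unnamed schedule>')}"
            missing = REQUIRED_SCHEDULE_KEYS - schedule.keys()
            if missing:
                problems.append(f"{label}: missing keys {sorted(missing)}")
            fields = schedule.get("cron", "").split()
            if len(fields) != 5:
                problems.append(f"{label}: expected 5 cron fields, got {len(fields)}")
            elif fields[0] == "*" and fields[1] != "*":
                # A wildcard minute with a fixed hour matches every minute of
                # that hour, which is rarely what a daily job intends.
                problems.append(
                    f"{label}: cron {schedule['cron']!r} fires every minute "
                    "of the matching hour(s)")
            if not isinstance(schedule.get("enabled"), bool):
                problems.append(f"{label}: 'enabled' should be true or false")
    return problems
```

In a CI job, a non-empty list from check_config(open("crawlab.json").read()) could fail the build before anything is uploaded.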
I can help implement this if you think it is feasible, @tikazyq.
Multi-spider support is on the way. Please follow this issue https://github.com/crawlab-team/crawlab/issues/1190
Will schedule deployment also be included?
Would you elaborate a bit?
In practice, whenever I add a new spider I also need to create dozens of cron jobs for it. Spiders can already be uploaded from the command line; could cron jobs be created that way too? Then I could put these commands into my CI/CD pipeline.
So I would like to add a new "schedules" field to crawlab.json to publish and manage cron jobs, like this:
{
  "path": "packages/py_spiders",
  "exclude_path": ".venv",
  "name": "py spiders",
  "description": "py spiders",
  "cmd": "python",
  "schedules": [
    {
      "name": "py spider 1 cron",
      "cron": "* 1 * * *",
      "command": "python py_spider_1/main.py",
      "param": "",
      "mode": "random",
      "description": "py spider 1 cron",
      "enabled": true
    },
    {
      "name": "py spider 2 cron",
      "cron": "* 2 * * *",
      "command": "python py_spider_2/main.py",
      "param": "",
      "mode": "random",
      "description": "py spider 2 cron",
      "enabled": true
    },
    ...
  ]
}
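For "publish and manage" to be safe in CI/CD, running the same deployment twice should not create duplicate cron jobs. One way to get that is to diff the schedules declared in crawlab.json against the ones already on the server and only apply the differences. The sketch below shows that diff under two assumptions: schedule names are unique and serve as the matching key, and only a subset of the proposed fields is compared. The Schedule class and plan_sync function are hypothetical; how existing schedules are fetched and how the changes are applied to Crawlab is deliberately left out, since no API for it is specified in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    # Subset of the fields proposed for crawlab.json; the rest are omitted here.
    name: str
    cron: str
    command: str
    enabled: bool = True


def plan_sync(desired, existing):
    """Compare schedules declared in crawlab.json (desired) with those already
    deployed (existing), matching by name, and return the schedules a CI job
    would need to create, update and delete."""
    desired_by_name = {s.name: s for s in desired}
    existing_by_name = {s.name: s for s in existing}
    create = [s for n, s in desired_by_name.items() if n not in existing_by_name]
    update = [s for n, s in desired_by_name.items()
              if n in existing_by_name and existing_by_name[n] != s]
    delete = [s for n, s in existing_by_name.items() if n not in desired_by_name]
    return create, update, delete
```

Because unchanged schedules fall into none of the three lists, a second run against an already synced server yields empty lists, which is what makes the deployment step repeatable.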
What do you think of these ideas? Or do you have a better suggestion?
@tikazyq What do you think of the proposal above?
I think that's a good idea, but it might take some time to implement. Let's create a new enhancement issue in the main repo: https://github.com/crawlab-team/crawlab