crawlab-sdk

About crawlab.json

Open ma-pony opened this issue 3 years ago • 6 comments

While working with the SDK, I found it inconvenient for scheduling and deploying multiple spiders, so I wondered whether it could support a project layout like the following:

.
├── packages
│   ├── js_spiders
│   │   ├── js_spider_1
│   │   │   └── index.js
│   │   ├── js_spider_2
│   │   │   └── index.js
│   │   ├── package.json
│   │   └── .....
│   └── py_spiders
│       ├── py_spider_1
│       │   └── main.py
│       ├── py_spider_2
│       │   └── main.py
│       ├── setup.py
│       └── .....
├── crawlab.json
└── makefile

crawlab.json

{
  "spiders": [
    {
      "path": "packages/js_spiders",
      "exclude_path": "node_modules",
      "name": "js spiders",
      "description": "js spiders",
      "cmd": "node",
      "schedules": [
        {
          "name": "js spider 1 cron",
          "cron": "* 1 * * *",
          "command": "node js_spider_1/index.js",
          "param": "",
          "mode": "random",
          "description": "js spider 1 cron",
          "enabled": true
        },
        {
          "name": "js spider 2 cron",
          "cron": "* 2 * * *",
          "command": "node js_spider_2/index.js",
          "param": "",
          "mode": "random",
          "description": "js spider 2 cron",
          "enabled": true
        }
      ]
    },
    {
      "path": "packages/py_spiders",
      "exclude_path": ".venv",
      "name": "py spiders",
      "description": "py spiders",
      "cmd": "python",
      "schedules": [
        {
          "name": "py spider 1 cron",
          "cron": "* 1 * * *",
          "command": "python py_spider_1/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 1 cron",
          "enabled": true
        },
        {
          "name": "py spider 2 cron",
          "cron": "* 2 * * *",
          "command": "python py_spider_2/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 2 cron",
          "enabled": true
        }
      ]
    }
  ]
}
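
To make the proposed layout concrete, here is a minimal Python sketch of how a tool might read such a file and flatten it into (spider, schedule) pairs. It is only an illustration of the proposal above: the function name `iter_schedules`, the set of required keys, and the five-field cron check are assumptions, not part of the existing SDK.

```python
import json

# Assumption: these keys are treated as mandatory for every schedule entry.
REQUIRED_SCHEDULE_KEYS = {"name", "cron", "command"}


def iter_schedules(config):
    """Yield (spider_name, schedule) pairs from a parsed crawlab.json dict."""
    for spider in config.get("spiders", []):
        spider_name = spider.get("name")
        for schedule in spider.get("schedules", []):
            missing = REQUIRED_SCHEDULE_KEYS - schedule.keys()
            if missing:
                raise ValueError(
                    f"schedule in spider {spider_name!r} is missing {sorted(missing)}"
                )
            # Only a shape check: a standard cron expression has five fields.
            if len(schedule["cron"].split()) != 5:
                raise ValueError(
                    f"schedule {schedule['name']!r} has a malformed cron expression"
                )
            yield spider_name, schedule


# A trimmed copy of the proposed file, inlined so the sketch is self-contained.
EXAMPLE = json.loads("""
{
  "spiders": [
    {
      "path": "packages/py_spiders",
      "name": "py spiders",
      "cmd": "python",
      "schedules": [
        {"name": "py spider 1 cron", "cron": "* 1 * * *",
         "command": "python py_spider_1/main.py", "enabled": true},
        {"name": "py spider 2 cron", "cron": "* 2 * * *",
         "command": "python py_spider_2/main.py", "enabled": true}
      ]
    }
  ]
}
""")

if __name__ == "__main__":
    for spider_name, schedule in iter_schedules(EXAMPLE):
        print(spider_name, "|", schedule["name"], "|", schedule["cron"])
```

Validating the file before anything is uploaded would let a CI job fail early on a typo instead of publishing a broken schedule.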

I can help implement this if you think it is possible @tikazyq

ma-pony avatar Oct 31 '22 02:10 ma-pony

Multi-spider support is on the way. Please follow this issue https://github.com/crawlab-team/crawlab/issues/1190

tikazyq avatar Oct 31 '22 03:10 tikazyq

> Multi-spider support is on the way. Please follow this issue crawlab-team/crawlab#1190

Will deploying schedules also be included?

ma-pony avatar Oct 31 '22 05:10 ma-pony

Would you elaborate a bit?

tikazyq avatar Oct 31 '22 11:10 tikazyq

> Would you elaborate a bit?

In practice, I need to create dozens of new cron jobs whenever I add a new spider. Uploading a spider can already be done from the command line, so can cron jobs be created that way too? Then I could put these commands into a CI/CD pipeline.

So I would like to add a new `schedules` parameter to crawlab.json for publishing and managing cron jobs, like this:

    {
      "path": "packages/py_spiders",
      "exclude_path": ".venv",
      "name": "py spiders",
      "description": "py spiders",
      "cmd": "python",
      "schedules": [
        {
          "name": "py spider 1 cron",
          "cron": "* 1 * * *",
          "command": "python py_spider_1/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 1 cron",
          "enabled": true
        },
        {
          "name": "py spider 2 cron",
          "cron": "* 2 * * *",
          "command": "python py_spider_2/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 2 cron",
          "enabled": true
        },
       ...
      ]
    }
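
For the CI/CD use case described above, one possible approach is to treat the `schedules` list as the desired state and compare it with whatever schedules already exist, so that each pipeline run only creates, updates, or deletes what changed. The sketch below shows just that comparison step. It assumes schedules are matched by `name`, the function `plan_schedule_changes` is hypothetical, and fetching the existing schedules from Crawlab is deliberately left out because no API for that is specified here.

```python
def plan_schedule_changes(desired, existing):
    """Compare two lists of schedule dicts by name.

    Returns the schedule names to create, update, and delete so that
    `existing` would end up matching `desired`.
    """
    desired_by_name = {s["name"]: s for s in desired}
    existing_by_name = {s["name"]: s for s in existing}

    to_create = sorted(desired_by_name.keys() - existing_by_name.keys())
    to_delete = sorted(existing_by_name.keys() - desired_by_name.keys())
    # A schedule needs an update when both sides have it but any field differs.
    to_update = sorted(
        name
        for name in desired_by_name.keys() & existing_by_name.keys()
        if desired_by_name[name] != existing_by_name[name]
    )
    return {"create": to_create, "update": to_update, "delete": to_delete}


if __name__ == "__main__":
    # Desired state, as it would be read from crawlab.json.
    desired = [
        {"name": "py spider 1 cron", "cron": "* 1 * * *",
         "command": "python py_spider_1/main.py"},
        {"name": "py spider 2 cron", "cron": "* 2 * * *",
         "command": "python py_spider_2/main.py"},
    ]
    # Hypothetical current state on the server.
    existing = [
        {"name": "py spider 1 cron", "cron": "* 3 * * *",
         "command": "python py_spider_1/main.py"},
        {"name": "old cron", "cron": "* 4 * * *",
         "command": "python old/main.py"},
    ]
    print(plan_schedule_changes(desired, existing))
```

Matching by name keeps the file readable, but it also means renaming a schedule shows up as one delete plus one create. A stable ID field would avoid that at the cost of a noisier config.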

What do you think of this idea? Do you have any better suggestions?

ma-pony avatar Nov 01 '22 06:11 ma-pony

> Would you elaborate a bit?

@tikazyq What do you think about the above?

ma-pony avatar Nov 08 '22 05:11 ma-pony

I think that's a good idea, but it might take some time to implement. Let's create a new enhancement issue in the main repo https://github.com/crawlab-team/crawlab

tikazyq avatar Nov 09 '22 07:11 tikazyq