
Prepare scrapix for production use

Open bidoubiwa opened this issue 1 year ago • 0 comments

Context

In its current state, the project scrapes as intended. Nonetheless, some essential parts are still missing before this repository can become a tool that we advertise and are comfortable using in production.

First step: MVP

Code maintenance

  • [x] Use typescript #15
  • [x] Rename some wrongly named variables (headless) (WIP)
  • [x] Add a linter (eslint, yamllint) #19
  • [x] Add a playground #17
  • [ ] Add basic tests #16
  • [ ] Fix the trailing slash bug (WIP)
  • [ ] Remove unnecessary features (WIP)
  • [x] Remove unfinished custom scraper #27
  • [ ] Create an issue on the custom scraper as a possible future enhancement #18

Features

  • [ ] User-agent definition "Meilisearch JS (v.X.X.X); Scrapix (vX.X.X)" #20
  • [ ] Signal State (Webhook, Websocket, Logs, progress route... whatever) #5
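
To make the User-agent item (#20) concrete, here is a minimal sketch of how such a string could be assembled. The helper name `buildUserAgent` and the version numbers are hypothetical and do not come from the Scrapix codebase. The sketch also assumes both versions use a plain `v` prefix, whereas the item above writes `v.X.X.X` for Meilisearch JS.

```typescript
// Hypothetical helper: neither the name nor the signature comes from the Scrapix codebase.
function buildUserAgent(meilisearchJsVersion: string, scrapixVersion: string): string {
  // Strip a leading "v" or "v." so callers can pass "1.2.3", "v1.2.3" or "v.1.2.3".
  const clean = (version: string): string => version.replace(/^v\.?/, "");
  // Assumption: both parts use a plain "v" prefix.
  return `Meilisearch JS (v${clean(meilisearchJsVersion)}); Scrapix (v${clean(scrapixVersion)})`;
}

// Placeholder versions, for illustration only.
const userAgent: string = buildUserAgent("0.33.0", "v0.1.0");
console.log(userAgent);
```

Building the string in one place would let the crawler and any HTTP client share the same value, but where it gets injected is left open here.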

Repository

  • [ ] Change the name (think about it)
  • [ ] More intuitive explanation on usage in the README.md #21
  • [ ] Simple explanation of how to run the project and launch its tests in the contributing guide #22
  • [ ] Adding the required files and tools we usually have in our repos:
    • [ ] CI (tests, releases,...) #23
    • [ ] LICENSE file #24
    • [ ] dependabot #25
  • [ ] docker + docker compose

Second step

  • [ ] Finish README and CONTRIBUTING to our standards

Future possibilities

  • Run multiple parsers at the same time
  • Use browserless
  • Custom scraper
  • Create a strategy for product-focused scraping
  • Push a Docker image to run the scraper
  • Create a web page to use the scraper without having to call the routes directly #4
  • Scrape code samples

bidoubiwa • Jun 19 '23 14:06