Prepare scrapix for production use

Open: bidoubiwa opened this issue 1 year ago · 0 comments

Context

The scraper currently works and does its job as intended. However, several essential pieces are still missing before this repository can become a tool that we advertise and are comfortable using in production.

First step: MVP

Code maintenance

[x] Use typescript #15
[x] Rename misnamed variables such as headless (WIP)
[x] Add a linter (eslint, yamllint) #19
[x] Add a playground #17
[ ] Add basic tests #16
[ ] Fix the trailing slash bug (WIP)
[ ] Remove unnecessary features (WIP)
[x] Remove unfinished custom scraper #27
[ ] Create an issue for the custom scraper as a possible future enhancement #18
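The issue does not describe the trailing-slash bug. Assuming it means that URLs such as https://example.com/docs and https://example.com/docs/ are crawled as two separate pages, a minimal sketch of a URL normalizer (a natural candidate for the basic tests in #16) could look like this. normalizeUrl and dedupeUrls are hypothetical helpers, not existing Scrapix functions; the code only relies on the standard WHATWG URL class available in Node.js.

```typescript
// Hypothetical helper: assumes the trailing-slash bug means the same page
// is visited twice, once with and once without a trailing slash.
function normalizeUrl(raw: string): string {
  const url = new URL(raw);
  // Fragments never change the fetched document, so drop them.
  url.hash = "";
  // Strip trailing slashes from the path, but keep the root path "/".
  if (url.pathname.length > 1) {
    url.pathname = url.pathname.replace(/\/+$/, "");
  }
  return url.toString();
}

// Hypothetical helper: deduplicates discovered links by their normalized form,
// keeping the first occurrence and returning the normalized URLs.
function dedupeUrls(urls: string[]): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const raw of urls) {
    const key = normalizeUrl(raw);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(key);
    }
  }
  return result;
}
```

Whether query strings or letter case should also be normalized is a separate decision that this sketch leaves open.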

Features

[ ] User-Agent definition: "Meilisearch JS (vX.X.X); Scrapix (vX.X.X)" #20
[ ] Signal the crawl state (webhook, WebSocket, logs, progress route, or similar) #5
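A minimal sketch of both features, assuming the two version numbers are known at runtime and that progress is reported as a plain JSON-serializable snapshot. buildUserAgent, CrawlState, ProgressSnapshot and CrawlProgress are hypothetical names, and the set of states is an assumption; none of this is an existing Scrapix API.

```typescript
// Hypothetical helper: builds the User-Agent string proposed in #20.
// How the versions are obtained (e.g. from package.json) is left open.
function buildUserAgent(meilisearchJsVersion: string, scrapixVersion: string): string {
  return `Meilisearch JS (v${meilisearchJsVersion}); Scrapix (v${scrapixVersion})`;
}

// Hypothetical crawl states for #5; the real set is still to be decided.
type CrawlState = "pending" | "running" | "succeeded" | "failed";

interface ProgressSnapshot {
  state: CrawlState;
  pagesCrawled: number;
  documentsIndexed: number;
  error?: string;
}

// Hypothetical tracker: holds the current snapshot and notifies listeners
// (a webhook sender, a WebSocket broadcaster, a logger...) on every change.
class CrawlProgress {
  private snapshot: ProgressSnapshot = { state: "pending", pagesCrawled: 0, documentsIndexed: 0 };
  private listeners: Array<(s: ProgressSnapshot) => void> = [];

  onChange(listener: (s: ProgressSnapshot) => void): void {
    this.listeners.push(listener);
  }

  start(): void {
    this.update({ state: "running" });
  }

  // Called once per crawled page with the number of documents it produced.
  pageCrawled(documents: number): void {
    this.update({
      pagesCrawled: this.snapshot.pagesCrawled + 1,
      documentsIndexed: this.snapshot.documentsIndexed + documents,
    });
  }

  finish(error?: Error): void {
    this.update(error ? { state: "failed", error: error.message } : { state: "succeeded" });
  }

  // Returns a copy so callers cannot mutate the internal snapshot.
  current(): ProgressSnapshot {
    return { ...this.snapshot };
  }

  private update(patch: Partial<ProgressSnapshot>): void {
    this.snapshot = { ...this.snapshot, ...patch };
    for (const listener of this.listeners) listener(this.current());
  }
}
```

Because CrawlProgress only calls listeners, the same object could feed whichever channel #5 settles on: a webhook sender, a WebSocket broadcaster, a log line, or a progress route that returns current().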

Repository

[ ] Consider changing the name
[ ] More intuitive usage explanation in the README.md #21
[ ] Simple explanation in the contributing guide of how to run the project and its tests #22
[ ] Add the files and tools we usually have in our repositories:
- [ ] CI (tests, releases, ...) #23
- [ ] LICENSE file #24
- [ ] dependabot #25
[ ] Docker and Docker Compose setup

Second step

[ ] Bring the README and CONTRIBUTING guide up to our standards

Future possibilities

Run multiple parsers at the same time
Use Browserless
Custom scraper
Create a strategy for product-focused scraping
Push a Docker image to run the scraper
Create a web page to use the scraper without having to call the routes directly #4
Scrape code samples
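For the first idea, one possible approach, assuming each parser is an async function that turns a page's HTML into documents, is to run every parser on the same page concurrently with Promise.allSettled, so that one failing parser does not discard the others' results. Parser, ScrapedDocument and runParsers are hypothetical names; Promise.allSettled needs the ES2020 library (or later) enabled in tsconfig.

```typescript
// Hypothetical document shape produced by a parser.
interface ScrapedDocument {
  url: string;
  content: string;
  parser: string;
}

// Hypothetical parser contract: a name plus an async parse function.
interface Parser {
  name: string;
  parse(url: string, html: string): Promise<ScrapedDocument[]>;
}

interface ParserRunResult {
  documents: ScrapedDocument[];
  failures: { parser: string; reason: string }[];
}

// Runs every parser on the same page concurrently. allSettled (rather than
// Promise.all) keeps successful results even if some parsers reject.
async function runParsers(parsers: Parser[], url: string, html: string): Promise<ParserRunResult> {
  const settled = await Promise.allSettled(parsers.map((p) => p.parse(url, html)));
  const result: ParserRunResult = { documents: [], failures: [] };
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      result.documents.push(...outcome.value);
    } else {
      result.failures.push({ parser: parsers[i].name, reason: String(outcome.reason) });
    }
  });
  return result;
}
```

Collecting failures per parser, instead of aborting the page, would let the crawl state report which parser broke on which URL.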

Jun 19 '23 14:06 bidoubiwa