scrapix
Prepare scrapix for production use
Context
In its current state, the scraper works as intended. Nonetheless, some essential parts are missing before this repository can become a tool that we advertise and are comfortable using in production.
First step: MVP
Code maintenance
- [x] Use TypeScript #15
- [x] Rename some wrongly named variables (headless) (WIP)
- [x] Add a linter (eslint, yamllint) #19
- [x] Add a playground #17
- [ ] Add basic tests #16
- [ ] Fix the trailing slash bug (WIP)
- [ ] Remove unnecessary features (WIP)
- [x] Remove unfinished custom scraper #27
- [ ] Create an issue for the custom scraper as a possible future enhancement #18
Features
- [ ] User-agent definition "Meilisearch JS (vX.X.X); Scrapix (vX.X.X)" #20
- [ ] Signal the crawl state (webhook, WebSocket, logs, progress route, or similar) #5
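As a rough illustration of the User-agent item above, the string could be assembled from a list of client names and versions. This is only a sketch: `ClientInfo`, `buildUserAgent`, and the version values are hypothetical and do not come from the Scrapix codebase, and the uniform `v` prefix is an assumption about the intended format.

```typescript
// Hypothetical shape for one entry in the User-agent string.
interface ClientInfo {
  name: string
  version: string
}

// Joins entries as "Name (vX.X.X); Name (vX.X.X)".
// The separator and the "v" prefix follow the checklist item above;
// the exact final format is still to be decided in the issue.
function buildUserAgent(clients: ClientInfo[]): string {
  return clients
    .map(({ name, version }) => `${name} (v${version})`)
    .join('; ')
}

// Placeholder versions; in practice these would likely be read from
// each package's package.json (an assumption).
const userAgent = buildUserAgent([
  { name: 'Meilisearch JS', version: '0.0.0' },
  { name: 'Scrapix', version: '0.0.0' },
])
```

The resulting value would then be passed wherever the crawler and the Meilisearch client set their request headers.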
Repository
- [ ] Change the name (think about it)
- [ ] More intuitive explanation on usage in the README.md #21
- [ ] Simple explanation of how to run the project and launch its tests in the contributing guide #22
- [ ] Add the required files and tools we usually have in our repos:
  - [ ] CI (tests, releases, ...) #23
  - [ ] LICENSE file #24
  - [ ] Dependabot #25
  - [ ] Docker + Docker Compose
Second step
- [ ] Finish README and CONTRIBUTING to our standards
Future possibilities
- Run multiple parsers at the same time
- Use browserless
- Custom scraper
- Create a strategy for product-focused scraping
- Push a Docker image to run the scraper
- Create a web page to use the scraper without having to call the routes directly #4
- Scrape code samples