fastcrawler
Modern, fast (high-performance) asynchronous scraping framework based on standard Python type hints and Pydantic.
- [x] #3
- [x] #7
- [ ] #9
- [x] #10
- [ ] #11
- [ ] #12
- [ ] #13
- [x] #14
- [x]...
Improve the dependency manager. One of the team members has a plan for this; please describe in your PR how you managed to do it.
- [ ] Define a new abstraction so that the process is unaware of which component is the Spider
- [ ] As a result, we should be able to add Parser, Saver...
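One way to picture the abstraction described in this task is a process that only knows a shared component interface. The following is a minimal sketch, not fastcrawler's actual API: `Component`, `Process`, `run`, `add`, `start`, and the three toy components are all hypothetical names chosen for illustration, and the sketch is synchronous for brevity even though the framework itself is asynchronous.

```python
from typing import Any, List, Protocol


class Component(Protocol):
    """Hypothetical interface: anything with a run() method is a component."""

    def run(self, data: Any) -> Any: ...


class Process:
    """Runs components in order without knowing what kind each one is."""

    def __init__(self) -> None:
        self._components: List[Component] = []

    def add(self, component: Component) -> "Process":
        self._components.append(component)
        return self  # allow chained add() calls

    def start(self, data: Any = None) -> Any:
        # Feed the output of each component into the next one.
        for component in self._components:
            data = component.run(data)
        return data


class Spider:
    def run(self, data: Any) -> List[str]:
        # Stand-in for real crawling: return some raw items.
        return ["  item 1 ", " item 2"]


class Parser:
    def run(self, data: List[str]) -> List[str]:
        return [raw.strip() for raw in data]


class Saver:
    def __init__(self) -> None:
        self.saved: List[str] = []

    def run(self, data: List[str]) -> List[str]:
        self.saved.extend(data)
        return data
```

With this shape, `Process().add(Spider()).add(Parser()).add(Saver()).start()` treats the Spider like any other component, so adding a new kind of component does not require changes to `Process`.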
I have added Persian and English MkDocs documentation to the project, but it still needs to be cleaned up and improved.
- Fix the performance issue with fastcrawler
- The Parser is very slow compared to Scrapy
Decouple batching from performing the requests by introducing a new class called Batcher:

```python
def get_batches(self) -> Iterable[Batch]:
    return list(Batch(engine=self.engine, setting=self.setting, ...))

for batch in self.get_batches():
    batch_results = await batch.perform()
    parsed_result...
```
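The split between batching and request execution proposed here could look roughly like the sketch below. It is an illustration under assumptions, not the project's implementation: the constructor arguments (`urls`, `fetch`, `batch_size`), the `crawl` helper, and `fake_fetch` are hypothetical, and the `engine` and `setting` objects from the issue are replaced by a plain async callable.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Iterator, List, Sequence

# Hypothetical stand-in for whatever performs a single request.
Fetch = Callable[[str], Awaitable[str]]


@dataclass
class Batch:
    """A group of URLs that are requested together."""

    urls: Sequence[str]
    fetch: Fetch

    async def perform(self) -> List[str]:
        # Run every request in this batch concurrently; gather keeps input order.
        return list(await asyncio.gather(*(self.fetch(url) for url in self.urls)))


class Batcher:
    """Splits URLs into batches; it does not perform any request itself."""

    def __init__(self, urls: Sequence[str], fetch: Fetch, batch_size: int) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.urls = urls
        self.fetch = fetch
        self.batch_size = batch_size

    def get_batches(self) -> Iterator[Batch]:
        for start in range(0, len(self.urls), self.batch_size):
            yield Batch(self.urls[start:start + self.batch_size], self.fetch)


async def crawl(batcher: Batcher) -> List[str]:
    # Batches run one after another; requests inside a batch run concurrently.
    results: List[str] = []
    for batch in batcher.get_batches():
        results.extend(await batch.perform())
    return results


async def fake_fetch(url: str) -> str:
    # Placeholder for a real HTTP request.
    await asyncio.sleep(0)
    return f"body of {url}"
```

For example, `asyncio.run(crawl(Batcher(["a", "b", "c"], fake_fetch, batch_size=2)))` performs two batches in sequence. Keeping `Batcher` free of request logic means batch sizing can be changed or tested without touching the code that talks to the network.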