tiny-web-crawler
A tiny web crawler in Python
This is a placeholder issue for the first major release [v1.0.0](https://github.com/indrajithi/tiny-web-crawler/milestones). **Please feel free to create issues from this list** ## Scope and Features: First major version v1.0.0 ###...
Description: Enhance the existing web crawler to support crawling and extracting content from websites that rely heavily on JavaScript for rendering their content. This feature will involve integrating a headless...
- Option to control how many links will be crawled from the same domain
- Accept an argument from the user, something like `url_list`. - Crawl only the URLs the user provides in that argument and nothing else.
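A minimal sketch of how the two options above might behave, assuming a hypothetical helper `select_links` with hypothetical parameters `max_per_domain` and `url_list` (only the name `url_list` comes from the issue text; none of this is the project's actual API):

```python
from collections import Counter
from typing import Iterable, List, Optional
from urllib.parse import urlparse


def select_links(
    links: Iterable[str],
    max_per_domain: Optional[int] = None,  # hypothetical option name
    url_list: Optional[List[str]] = None,  # name suggested in the issue
) -> List[str]:
    """Pick the links to visit under the two proposed options (illustrative only)."""
    if url_list is not None:
        # url_list mode: visit exactly the user-supplied URLs (deduplicated,
        # order preserved) and ignore every discovered link.
        return list(dict.fromkeys(url_list))

    seen = set()
    per_domain: Counter = Counter()
    selected: List[str] = []
    for link in links:
        if link in seen:
            continue  # skip duplicates
        seen.add(link)
        domain = urlparse(link).netloc
        # Enforce the per-domain cap, if one was given.
        if max_per_domain is not None and per_domain[domain] >= max_per_domain:
            continue
        per_domain[domain] += 1
        selected.append(link)
    return selected
```

In this sketch a domain is taken to be the URL's `netloc`, so `a.com` and `www.a.com` count separately; whether the real feature should group subdomains is left open.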
Running `poetry install --with dev` doesn't install the pre-commit hooks; for now they have to be installed manually with `pre-commit install`.
Because of #19, the type hint for `Spider.crawl_result` broke and was temporarily replaced with `Dict[str, Dict[str, Any]]`. This should be fixed to actually reflect the contents of `crawl_result`,...
- Auto-generate documentation using Sphinx or another suitable tool