tiny-web-crawler

A tiny web crawler in Python

7 tiny-web-crawler issues

This is a placeholder issue for the first major release, [v1.0.0](https://github.com/indrajithi/tiny-web-crawler/milestones). **Please feel free to create issues from this list.** ## Scope and Features: First major version v1.0.0 ###...

good first issue
placeholder
release

Description: Enhance the existing web crawler to support crawling and extracting content from websites that rely heavily on JavaScript for rendering their content. This feature will involve integrating a headless...

enhancement

- Add an option to control how many links are crawled from the same domain.

enhancement
good first issue
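One possible shape for the per-domain option, sketched under assumptions: before queueing a URL, the crawler asks a small counter keyed by host whether that domain is still under its cap. `DomainLimiter`, `allow`, and `max_links_per_domain` are hypothetical names for illustration, not part of tiny-web-crawler's API.

```python
from __future__ import annotations

from collections import Counter
from urllib.parse import urlparse


class DomainLimiter:
    """Hypothetical helper that caps how many links are crawled per domain."""

    def __init__(self, max_links_per_domain: int) -> None:
        self.max_links_per_domain = max_links_per_domain
        self._counts: Counter[str] = Counter()

    def allow(self, url: str) -> bool:
        """Record the URL and return True if its domain is still under the cap."""
        domain = urlparse(url).netloc
        if self._counts[domain] >= self.max_links_per_domain:
            return False
        self._counts[domain] += 1
        return True


limiter = DomainLimiter(max_links_per_domain=2)
urls = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
    "https://example.org/",
]
print([u for u in urls if limiter.allow(u)])
# ['https://example.com/a', 'https://example.com/b', 'https://example.org/']
```

Keying on `urlparse(url).netloc` treats `example.com`, `www.example.com`, and `example.com:8080` as separate domains; a real implementation might normalize hosts first.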

- Accept an argument from the user, something like `url_list`.
- Crawl only the URLs the user provides in that argument and nothing else.

enhancement
good first issue
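A minimal sketch of the `url_list` behaviour, assuming a breadth-first crawl loop and a caller-supplied `fetch_links` function standing in for the crawler's own fetching and link extraction (both hypothetical, not the project's actual code): when `url_list` is given, only those URLs are visited, and links found on them are recorded but never followed.

```python
from collections import deque
from typing import Callable, Dict, List, Optional


def crawl(
    root_url: str,
    fetch_links: Callable[[str], List[str]],
    url_list: Optional[List[str]] = None,
    max_links: int = 10,
) -> Dict[str, List[str]]:
    """Return {visited_url: links_found_on_that_page}.

    If url_list is given, root_url is ignored, only the URLs in url_list are
    visited, and links discovered on them are recorded but never followed.
    """
    follow_links = url_list is None
    queue = deque([root_url]) if url_list is None else deque(url_list)
    result: Dict[str, List[str]] = {}
    while queue and len(result) < max_links:
        url = queue.popleft()
        if url in result:  # skip URLs already visited
            continue
        links = fetch_links(url)
        result[url] = links
        if follow_links:
            queue.extend(link for link in links if link not in result)
    return result


# Usage with an in-memory link graph standing in for real HTTP fetches.
site = {
    "https://a.test/": ["https://a.test/1", "https://a.test/2"],
    "https://a.test/1": ["https://a.test/3"],
    "https://a.test/2": [],
    "https://a.test/3": [],
}


def fetch(url: str) -> List[str]:
    return site.get(url, [])


print(len(crawl("https://a.test/", fetch)))  # 4: every reachable page
print(crawl("https://a.test/", fetch, url_list=["https://a.test/1"]))
# {'https://a.test/1': ['https://a.test/3']}
```

Recording the discovered links while skipping the enqueue step keeps the result format identical in both modes, so downstream code does not need to know which mode was used.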

Running `poetry install --with dev` doesn't install the pre-commit hooks; for now, they have to be installed manually with `pre-commit install`.

bug
good first issue

Because of #19, the type hint for `Spider.crawl_result` broke, and it was temporarily replaced with `Dict[str, Dict[str, Any]]`. This should be fixed to actually reflect the contents of `crawl_result`,...

good first issue
house keeping
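One way to make the annotation precise is a `TypedDict`, which lets type checkers know each entry's keys. The issue text is truncated before it lists what `crawl_result` holds, so the single `urls` key below is an assumption for illustration only; the real fields should be taken from the `Spider` code.

```python
from typing import Dict, List, TypedDict


class PageResult(TypedDict):
    """Hypothetical shape of one crawl_result entry; the real keys may differ."""

    urls: List[str]  # assumed: links discovered on the page


# Would replace the temporary Dict[str, Dict[str, Any]] annotation.
CrawlResult = Dict[str, PageResult]

example: CrawlResult = {
    "https://example.com/": {"urls": ["https://example.com/about"]},
}
```

At runtime a `TypedDict` is a plain `dict`, so this change affects only static checking, not the crawler's behaviour.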

- Auto-generate documentation using Sphinx or another suitable tool.

good first issue
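As a possible starting point, a minimal Sphinx `conf.py` using the bundled `autodoc` and `napoleon` extensions might look like the sketch below. The `docs/` location and the path insertion are assumptions about the repository layout. `sphinx-quickstart` can generate a fuller `conf.py`, and `sphinx-apidoc` can generate `.rst` stubs for each module.

```python
# docs/conf.py: minimal Sphinx configuration sketch (assumed layout)
import os
import sys

# Make the package importable so autodoc can read its docstrings.
# Assumes docs/ sits next to the package source; adjust the path as needed.
sys.path.insert(0, os.path.abspath(".."))

project = "tiny-web-crawler"

extensions = [
    "sphinx.ext.autodoc",   # pull API docs from docstrings
    "sphinx.ext.napoleon",  # parse Google/NumPy-style docstrings
    "sphinx.ext.viewcode",  # link rendered docs to highlighted source
]

# Show type hints in the parameter descriptions instead of the signature.
autodoc_typehints = "description"
```

The HTML docs could then be built with `sphinx-build -b html docs docs/_build/html`, for example from a CI job so they stay in sync with the code.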