tiny-web-crawler issues

First Major Release v1.0.0

3

This is a placeholder issue for the first major release [v1.0.0](https://github.com/indrajithi/tiny-web-crawler/milestones). **Please feel free to create issues from this list.** ## Scope and Features: First major version v1.0.0 ###...

indrajithi

good first issue

placeholder

release

Feature: Support for crawling dynamic javascript heavy site

5

Description: Enhance the existing web crawler to support crawling and extracting content from websites that rely heavily on JavaScript for rendering their content. This feature will involve integrating a headless...

indrajithi

enhancement

Crawl depth per domain

1

- Option to control how many links will be crawled from the same domain
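One way to read this option is as a cap on how many links are kept per domain. The following is only a minimal sketch of that idea using the standard library; the function name `limit_per_domain` and the parameter `max_per_domain` are hypothetical and not part of the tiny-web-crawler API.

```python
from collections import Counter
from typing import Iterable, List
from urllib.parse import urlparse


def limit_per_domain(urls: Iterable[str], max_per_domain: int) -> List[str]:
    """Keep at most `max_per_domain` URLs for each domain, preserving input order."""
    seen: Counter = Counter()  # number of URLs already kept per domain
    kept: List[str] = []
    for url in urls:
        domain = urlparse(url).netloc.lower()
        if seen[domain] < max_per_domain:
            seen[domain] += 1
            kept.append(url)
    return kept


# Hypothetical input: three links on one domain, one on another.
links = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
    "https://example.org/x",
]
selected = limit_per_domain(links, max_per_domain=2)
print(selected)
```

A real implementation would probably apply the counter while the crawl is running rather than to a finished list, but the bookkeeping would be the same.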

indrajithi

enhancement

good first issue

Feature: Add a feature to only crawl the given list of urls

5

- Accept an argument from the user, something like `url_list`.
- Crawl only the URLs provided by the user in that argument and nothing else.
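The requested behavior could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation: `crawl_url_list`, the injected `fetch` callable, and `fake_fetch` are hypothetical names, and only `url_list` comes from the issue text.

```python
from typing import Callable, Dict, Iterable


def crawl_url_list(url_list: Iterable[str], fetch: Callable[[str], str]) -> Dict[str, str]:
    """Fetch only the URLs in `url_list`; links found inside the pages are never followed."""
    results: Dict[str, str] = {}
    for url in url_list:
        if url in results:  # skip duplicates in the user's list
            continue
        results[url] = fetch(url)
    return results


def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP request so the sketch runs offline."""
    return f'<a href="https://example.com/other">page for {url}</a>'


pages = crawl_url_list(
    ["https://example.com/a", "https://example.com/b", "https://example.com/a"],
    fake_fetch,
)
print(list(pages))
```

Passing the fetch function in as an argument keeps the sketch testable without network access; a real crawler would use its own request logic there.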

indrajithi

enhancement

good first issue

`poetry install --with dev` doesn't install pre-commit hooks

6

Running `poetry install --with dev` doesn't install the pre-commit hooks. For now they need to be installed manually with `pre-commit install`.

Mews

bug

good first issue

Fix `crawl_result` type hint

1

Because of #19, the type hint for `Spider.crawl_result` broke and was temporarily replaced with `Dict[str, Dict[str, Any]]`. This should be fixed to actually reflect the contents of `crawl_result`,...
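A common way to tighten a hint like `Dict[str, Dict[str, Any]]` is a `TypedDict` for the inner dictionary. The sketch below only illustrates that technique. The actual contents of `crawl_result` are not shown here, so `PageResult`, `CrawlResult`, and the `urls` field are hypothetical and would need to be replaced with the real keys.

```python
from typing import Dict, List, TypedDict


class PageResult(TypedDict):
    """Per-URL entry. The field below is an assumption made for illustration."""
    urls: List[str]  # hypothetical: links discovered on the page


# Maps each crawled URL to its typed entry instead of Dict[str, Any].
CrawlResult = Dict[str, PageResult]

crawl_result: CrawlResult = {
    "https://example.com": {"urls": ["https://example.com/about"]},
}
print(crawl_result["https://example.com"]["urls"])
```

With a definition along these lines, a type checker such as mypy can flag misspelled or missing keys that `Dict[str, Any]` would let through.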

Mews

good first issue

house keeping

Docs: Auto generate documentation

- Auto generate documentation using Sphinx or other suitable tools

indrajithi

good first issue
