tiny-web-crawler
First Major Release v1.0.0
This is a placeholder issue for the first major release v1.0.0
Please feel free to create issues from this list
Scope and Features: First major version v1.0.0
Functional Requirements
- [x] Basic Crawling Functionality #1
- [x] Configurable options for maximum links to crawl #1
- [x] Handle both relative and absolute URLs #1
- [x] Save crawl results to a specified file #1
- [x] Configurable verbosity levels for logging #7
- [x] Concurrency and custom delay #7
- [x] Support regular expressions #16
- [x] Crawl internal / external links only #11
- [x] Return optional html in response #19
- [ ] Crawl depth per website/domain #37
- [x] Logging #38
- [x] Retry mechanism for transient errors #39
- [ ] Support JavaScript-heavy dynamic websites #10
- [x] (Optional) Respect robots.txt #42
- [ ] (Optional) User-Agent Customization
- [ ] (Optional) Proxy support
- [ ] (Optional) Use Asynchronous I/O
- [ ] (Optional) Crawl output to a database (maybe MongoDB)
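Several of the items above (retry mechanism for transient errors, custom delay) amount to wrapping the fetch step. As a hedged sketch of the retry idea only, not tiny-web-crawler's actual API, a backoff helper could look like this (`fetch_with_retry` and the injected `fetch` callable are hypothetical names):

```python
import time


def fetch_with_retry(fetch, url, max_retries=3, base_delay=0.1):
    """Retry transient failures with exponential backoff.

    `fetch` is any callable that raises on a transient error
    (a hypothetical stand-in for the crawler's HTTP layer).
    Delays grow as base_delay * 2**attempt between retries.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_retries:
                # Out of retries: surface the last error to the caller.
                raise
            time.sleep(base_delay * (2 ** attempt))


# Example: a fetcher that fails twice before succeeding.
calls = []

def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise IOError("transient network error")
    return "<html>ok</html>"

print(fetch_with_retry(flaky_fetch, "https://example.com", base_delay=0.01))
```

Separating the retry policy from the fetcher keeps the crawl loop testable without any network access, as the example shows.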
Non-Functional Requirements
- [x] Git workflow for CI/CD #4
- [ ] Documentation (API and Developer) #18
- [x] Test coverage above 80% #28
- [x] Git hooks #22
- [x] Modular and Extensible Architecture #17
- [ ] (Optional) Memory benchmark: monitor memory usage during the crawling process
- [ ] (Optional) Security considerations (e.g., handling of malicious content)
You forgot to check "Return optional html in response https://github.com/indrajithi/tiny-web-crawler/pull/19" ;)
@indrajithi On Git hooks maybe you should link my second pr on that feature (#25 ) so people also see the pre-commit install --hook-type pre-push command :)
@indrajithi you can check "Test coverage above 80%" now ;)