feat: adding crawlee's EnqueueStrategy config
PR Description
Summary: This pull request adds a configurable crawl strategy. A new option in the configuration schema is passed through to the crawling logic, so users can choose which of Crawlee's enqueue strategies decides the links the crawler follows. For more information: https://crawlee.dev/api/core/enum/EnqueueStrategy#All
Changes Made:
- Updated Configuration Schema (`config.ts`):
  - Added a `crawlStrategy` field to the configuration schema.
  - This field selects the Crawlee strategy, i.e. which parts of each discovered URL are checked before the URL is enqueued.
  - Possible values are `"all"`, `"same-origin"`, `"same-hostname"`, and `"same-domain"`.
  - This field is optional.
- Updated Crawling Logic (`core.ts`):
  - Integrated the `crawlStrategy` configuration into the `PlaywrightCrawler` setup.
  - The `strategy` parameter in `enqueueLinks` now uses the `config.crawlStrategy` value if provided.
  - This ensures that the crawling strategy defined in the configuration is applied during the crawling process.
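The configuration change can be sketched in plain TypeScript. This is a minimal sketch under stated assumptions, not the PR's actual code: the `Config` shape and the `parseCrawlStrategy` and `buildEnqueueOptions` helpers are hypothetical, and the real `config.ts` may declare the field with a schema library instead. The string values mirror those of Crawlee's `EnqueueStrategy` enum, which `enqueueLinks` accepts for its `strategy` option. Omitting the field leaves Crawlee to apply its own default.

```typescript
// String values mirroring Crawlee's EnqueueStrategy enum.
const CRAWL_STRATEGIES = [
  "all",
  "same-origin",
  "same-hostname",
  "same-domain",
] as const;

type CrawlStrategy = (typeof CRAWL_STRATEGIES)[number];

// Hypothetical subset of the crawler config; crawlStrategy is optional.
interface Config {
  url: string;
  match: string;
  crawlStrategy?: CrawlStrategy;
}

// Narrows an unknown value (e.g. read from a JSON config file) to a CrawlStrategy.
function isCrawlStrategy(value: unknown): value is CrawlStrategy {
  return (
    typeof value === "string" &&
    (CRAWL_STRATEGIES as readonly string[]).includes(value)
  );
}

// Hypothetical validator: undefined when the field is absent, throws on unknown values.
function parseCrawlStrategy(value: unknown): CrawlStrategy | undefined {
  if (value === undefined) return undefined;
  if (isCrawlStrategy(value)) return value;
  throw new Error(
    `Invalid crawlStrategy "${String(value)}"; expected one of: ${CRAWL_STRATEGIES.join(", ")}`,
  );
}

// Hypothetical helper: builds the options object passed to enqueueLinks.
// `strategy` is only set when configured, so Crawlee's default applies otherwise.
function buildEnqueueOptions(config: Config): {
  globs: string[];
  strategy?: CrawlStrategy;
} {
  return {
    globs: [config.match],
    ...(config.crawlStrategy !== undefined
      ? { strategy: config.crawlStrategy }
      : {}),
  };
}
```

Inside the `PlaywrightCrawler` request handler, the result could then be passed straight to `enqueueLinks(buildEnqueueOptions(config))`. Omitting `strategy` rather than passing `undefined` keeps the "optional" behavior explicit.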
Impact:
- These changes give users greater control over crawling behavior by letting them choose which discovered links are followed, based on their origin, hostname, or domain.
Examples:
- When `crawlStrategy` is set to `"same-origin"`, the crawler only follows links within the same origin.
- When `crawlStrategy` is set to `"all"`, the crawler follows all links regardless of their origin.
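To make the difference between the strategies concrete, here is a simplified approximation of how each value might decide whether a discovered link is followed. It is an illustrative sketch, not Crawlee's implementation: `shouldFollow` and `naiveDomain` are hypothetical names, and Crawlee's actual matching (for example how it treats the URL scheme or multi-part public suffixes) may differ.

```typescript
type CrawlStrategy = "all" | "same-origin" | "same-hostname" | "same-domain";

// Naive registrable-domain guess: the last two hostname labels.
// Real implementations consult the Public Suffix List; this breaks for e.g. "example.co.uk".
function naiveDomain(hostname: string): string {
  return hostname.split(".").slice(-2).join(".");
}

// Simplified approximation: would a link found on `pageUrl` be followed?
function shouldFollow(
  pageUrl: string,
  linkUrl: string,
  strategy: CrawlStrategy,
): boolean {
  const page = new URL(pageUrl);
  const link = new URL(linkUrl, page); // resolves relative links against the page
  switch (strategy) {
    case "all":
      return true;
    case "same-origin":
      // Scheme, hostname and port must all match.
      return page.origin === link.origin;
    case "same-hostname":
      // Only the hostname is compared in this sketch.
      return page.hostname === link.hostname;
    case "same-domain":
      // Different subdomains of the same (naively computed) domain are allowed.
      return naiveDomain(page.hostname) === naiveDomain(link.hostname);
    default: {
      const unreachable: never = strategy;
      throw new Error(`Unknown strategy: ${String(unreachable)}`);
    }
  }
}
```

Under this sketch, with `https://docs.example.com/guide` as the current page, `"same-origin"` rejects `http://docs.example.com/api` because the scheme differs, while `"same-domain"` accepts `https://blog.example.com/`.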
Thanks @muzafferkadir! Looks like the build is failing, so we'll need it passing to merge. Otherwise, great update.
Thanks, I updated it.
Sorry @muzafferkadir, looks like there's a merge conflict. I can hop on this once it's green again.
I updated it.
Looks like the build is not passing, @muzafferkadir.