feat: adding crawlee's EnqueueStrategy config

Open muzafferkadir opened this issue 1 year ago • 5 comments

PR Description

Summary: This pull request introduces changes to the configuration schema and the crawling logic to enhance the flexibility of the crawling strategy. For more information: https://crawlee.dev/api/core/enum/EnqueueStrategy#All

Changes Made:

  1. Updated Configuration Schema (config.ts):

    • Added crawlStrategy field to the configuration schema.
      • This field selects the Crawlee enqueue strategy, which decides whether a discovered URL is followed by comparing part of it (origin, hostname, or domain) with the page it was found on.
      • Possible values are "all", "same-origin", "same-hostname", and "same-domain".
      • This field is optional.
  2. Updated Crawling Logic (core.ts):

    • Integrated the crawlStrategy configuration into the PlaywrightCrawler setup.
      • The strategy parameter in enqueueLinks now uses the config.crawlStrategy value if provided.
      • Ensures that the crawling strategy defined in the configuration is applied during the crawling process.
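
The two changes above can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the actual diff: `CrawlerConfig`, `EnqueueOptions`, `isCrawlStrategy`, and `buildEnqueueOptions` are hypothetical names, the `match` field is assumed to exist on the config, and the real code would pass the resulting options to Crawlee's `enqueueLinks` inside the PlaywrightCrawler request handler.

```typescript
// Local mirror of the four string values listed in this PR (an assumption
// made to keep the sketch free of imports; the real code may use Crawlee's enum).
type CrawlStrategy = "all" | "same-origin" | "same-hostname" | "same-domain";

const CRAWL_STRATEGIES: readonly CrawlStrategy[] = [
  "all",
  "same-origin",
  "same-hostname",
  "same-domain",
];

// Simplified stand-in for the crawler config; only the fields used here.
interface CrawlerConfig {
  url: string;
  match: string | string[];
  crawlStrategy?: CrawlStrategy; // optional, as described above
}

// Runtime guard for values read from untyped input such as JSON or CLI flags.
function isCrawlStrategy(value: unknown): value is CrawlStrategy {
  return (
    typeof value === "string" &&
    CRAWL_STRATEGIES.some((strategy) => strategy === value)
  );
}

// Hypothetical shape of the options that would be handed to enqueueLinks.
interface EnqueueOptions {
  globs: string[];
  strategy?: CrawlStrategy;
}

// Sets `strategy` only when the config provides one, so the crawler's
// default behaviour stays unchanged for existing configs.
function buildEnqueueOptions(config: CrawlerConfig): EnqueueOptions {
  const globs =
    typeof config.match === "string" ? [config.match] : config.match;
  return config.crawlStrategy
    ? { globs, strategy: config.crawlStrategy }
    : { globs };
}
```

Leaving the key out when `crawlStrategy` is unset, instead of passing `undefined`, is one way to keep the field optional without changing behaviour for configs that predate it.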

Impact:

  • These changes provide greater control over the crawling behavior, allowing users to specify how URLs are handled based on their origin and domain.

Examples:

  • When crawlStrategy is set to "same-origin", the crawler will only follow links within the same origin.
  • When crawlStrategy is set to "all", the crawler will follow all links regardless of their origin.
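
To make the difference between the four values concrete, here is a rough approximation of the comparison each one implies. It is not Crawlee's implementation: `wouldFollow` and `naiveDomain` are hypothetical helpers, and the domain check simply keeps the last two hostname labels, which is wrong for suffixes such as `co.uk`.

```typescript
type CrawlStrategy = "all" | "same-origin" | "same-hostname" | "same-domain";

// Naive registrable-domain guess: the last two hostname labels.
// A real implementation would consult the Public Suffix List.
function naiveDomain(hostname: string): string {
  return hostname.split(".").slice(-2).join(".");
}

// Returns true when a link found on `pageUrl` would be followed
// under the given strategy, according to this approximation.
function wouldFollow(
  strategy: CrawlStrategy,
  pageUrl: string,
  linkUrl: string,
): boolean {
  const page = new URL(pageUrl);
  const link = new URL(linkUrl, pageUrl); // also resolves relative links
  switch (strategy) {
    case "all":
      return true;
    case "same-origin":
      // Origin = scheme + hostname + port.
      return link.origin === page.origin;
    case "same-hostname":
      return link.hostname === page.hostname;
    case "same-domain":
      return naiveDomain(link.hostname) === naiveDomain(page.hostname);
  }
}
```

For a page at `https://example.com/docs`, a link to `https://blog.example.com/post` passes only under "same-domain" and "all" in this sketch, while a link to `http://example.com/a` passes under "same-hostname" but not under "same-origin" because the scheme differs.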

muzafferkadir avatar Sep 06 '24 14:09 muzafferkadir

thanks @muzafferkadir ! looks like the build is failing, so that will need to be fixed before we can merge. otherwise, great update

steve8708 avatar Sep 10 '24 23:09 steve8708

thanks, i updated

muzafferkadir avatar Sep 10 '24 23:09 muzafferkadir

sorry @muzafferkadir - looks like there's a merge conflict. i can hop on this once it's green again

steve8708 avatar Mar 07 '25 15:03 steve8708

i updated

muzafferkadir avatar Mar 07 '25 21:03 muzafferkadir

looks like the build is not passing @muzafferkadir

steve8708 avatar Mar 07 '25 23:03 steve8708