crawlee issues

Add a simpler option for `enqueueLinks` to handle subdomain filtering

Package: @crawlee/core

Feature: Currently, there are only very limited methods for a user to...

axmanalad

feature

t-tooling
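One way to picture the requested option is a minimal standalone sketch, assuming the goal is to accept a base domain plus an optional allow-list of subdomains. `matchesSubdomain` is a hypothetical helper, not part of Crawlee's API:

```typescript
// Hypothetical helper (not a Crawlee API): returns true when `url` is on
// `baseDomain` itself or on one of its subdomains. If `allowedSubdomains`
// is given, only those subdomain prefixes are accepted.
function matchesSubdomain(
    url: string,
    baseDomain: string,
    allowedSubdomains?: string[],
): boolean {
    let hostname: string;
    try {
        hostname = new URL(url).hostname.toLowerCase();
    } catch {
        return false; // not an absolute, parseable URL
    }
    const base = baseDomain.toLowerCase();
    if (hostname === base) return true;
    // Require a dot boundary so "evilexample.com" does not match "example.com".
    if (!hostname.endsWith(`.${base}`)) return false;
    // Without an allow-list, any subdomain of the base domain is accepted.
    if (!allowedSubdomains) return true;
    // Strip ".<base>" to get the subdomain part, e.g. "docs" or "a.docs".
    const sub = hostname.slice(0, -(base.length + 1));
    return allowedSubdomains.map((s) => s.toLowerCase()).includes(sub);
}
```

Until something like this exists as a built-in option, a predicate of this kind could presumably be applied through `enqueueLinks`' `transformRequestFunction`, which can skip a request by returning a falsy value (worth confirming against the current docs).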

Improve type safety


- There are multiple opportunities throughout Crawlee to improve type safety and optionally introduce runtime validation:
  - Dataset items (`CrawlingContext.pushData`)
  - Key-value store content
  - `Request.userData`
  - Request routing labels...

janbuchar

t-tooling
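For `Request.userData`, one possible pattern is a user-defined type guard that gives both a static type and a runtime check. This is a sketch under assumptions: `ProductUserData`, `isProductUserData`, and `parseUserData` are hypothetical names, and the guard is hand-written where a schema library such as zod could generate it instead:

```typescript
// Hypothetical shape of the data attached to a request via `userData`.
interface ProductUserData {
    label: "PRODUCT";
    productId: string;
    page: number;
}

// A user-defined type guard doubles as a runtime validator.
function isProductUserData(value: unknown): value is ProductUserData {
    if (typeof value !== "object" || value === null) return false;
    const v = value as Record<string, unknown>;
    return (
        v.label === "PRODUCT" &&
        typeof v.productId === "string" &&
        typeof v.page === "number"
    );
}

// Hypothetical helper: narrows untyped userData to T or throws.
function parseUserData<T>(value: unknown, guard: (v: unknown) => v is T): T {
    if (!guard(value)) {
        throw new TypeError(`Invalid userData: ${JSON.stringify(value)}`);
    }
    return value; // narrowed to T by the guard above
}
```

The same guard-based approach would apply to dataset items and key-value store content: validate at the boundary, then work with a narrowed type.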

Better resource & state management with `UserPool`

Resource management is currently done in multiple places (`BrowserPool`, `SessionPool`, `ProxyConfiguration`...), which leads to complexity and potential resource conflicts. Typical issue:

```typescript
const crawler = new PlaywrightCrawler({
    proxyConfiguration: new ProxyConfiguration({...
```

barjin

t-tooling
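A rough sketch of the idea, assuming a "user" means one identity whose session and proxy are always acquired, released, and retired together. `User` and `UserPool` here are hypothetical and say nothing about how the actual proposal is designed:

```typescript
// Hypothetical "user": one identity whose resources travel together.
interface User {
    id: number;
    sessionId: string;
    proxyUrl: string;
    inUse: boolean;
}

// Hypothetical pool sketch, not Crawlee's API.
class UserPool {
    private readonly users: User[];

    constructor(proxyUrls: string[]) {
        // One user per proxy, so a session is never paired with a different proxy.
        this.users = proxyUrls.map((proxyUrl, id) => ({
            id,
            sessionId: `session-${id}`,
            proxyUrl,
            inUse: false,
        }));
    }

    // Hands out the first free user, or undefined if all are busy.
    acquire(): User | undefined {
        const user = this.users.find((u) => !u.inUse);
        if (user) user.inUse = true;
        return user;
    }

    release(user: User): void {
        user.inUse = false;
    }

    // Retire a blocked identity: its session and proxy leave the pool together.
    retire(user: User): void {
        const index = this.users.indexOf(user);
        if (index !== -1) this.users.splice(index, 1);
    }

    get size(): number {
        return this.users.length;
    }
}
```

Keeping the session-to-proxy pairing in a single object is what rules out the kind of mismatch that can arise when separate pools hand out each resource independently.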

Extract data relevant to a single `BasicCrawler.run` into a separate class

- There is a lot of data related to a single invocation of the `run()` method in the class:
  - `stats`
  - `autoscaledPool`
  - `running`
  - `crawlingContexts` (might make sense...

janbuchar

t-tooling
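A minimal sketch of the extraction, assuming each `run()` call creates its own state object instead of mutating crawler fields. `CrawlerRunState` and `SketchCrawler` are hypothetical, the `stats` shape is invented for illustration, and `autoscaledPool` is left out to keep the sketch self-contained:

```typescript
// Hypothetical per-run state holder; `stats` fields are illustrative only.
class CrawlerRunState {
    readonly stats = { requestsFinished: 0, requestsFailed: 0 };
    running = false;
    readonly crawlingContexts = new Map<string, { url: string }>();
}

// Hypothetical crawler: every run() gets a fresh CrawlerRunState.
class SketchCrawler {
    constructor(private readonly handler: (url: string) => Promise<void>) {}

    async run(urls: string[]): Promise<CrawlerRunState> {
        const state = new CrawlerRunState();
        state.running = true;
        try {
            for (let i = 0; i < urls.length; i++) {
                const id = String(i);
                state.crawlingContexts.set(id, { url: urls[i] });
                try {
                    await this.handler(urls[i]);
                    state.stats.requestsFinished++;
                } catch {
                    state.stats.requestsFailed++;
                } finally {
                    // Contexts only live while their request is in flight.
                    state.crawlingContexts.delete(id);
                }
            }
        } finally {
            state.running = false;
        }
        return state;
    }
}
```

Because the state is local to the call, two overlapping runs on the same crawler instance cannot overwrite each other's counters.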

Remove `BasicCrawler.handledRequestsCount`

- The property essentially duplicates what is already present in `RequestList` and `RequestQueue`.
- This brings no benefit and leads to confusion.

janbuchar

t-tooling
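The point of the change is to keep a single source of truth for the handled count. A sketch of that idea, assuming a simplified in-memory queue: `InMemoryQueue` and `processAll` are hypothetical, and the synchronous `handledCount()` is an assumption of the sketch rather than a statement about the real `RequestList`/`RequestQueue` signatures:

```typescript
// Hypothetical in-memory request source that owns the handled count.
class InMemoryQueue {
    private readonly pending: string[] = [];
    private handled = 0;

    add(url: string): void {
        this.pending.push(url);
    }

    fetchNext(): string | undefined {
        return this.pending.shift();
    }

    markHandled(): void {
        this.handled++;
    }

    // The single source of truth for "how many requests were handled".
    handledCount(): number {
        return this.handled;
    }
}

// The crawler loop keeps no counter of its own; it only reports to the queue.
function processAll(queue: InMemoryQueue, handle: (url: string) => void): void {
    for (let url = queue.fetchNext(); url !== undefined; url = queue.fetchNext()) {
        handle(url);
        queue.markHandled();
    }
}
```

Callers that need the number read it from the queue, so there is no second counter that could drift out of sync.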

Simplify the interface of the `enqueueLinks` helper

- Expose an `extractLinks` helper for additional flexibility
- Get rid of the `requestQueue` argument
- Get rid of `pseudoUrls`
- Depends on #2479

janbuchar

t-tooling
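A sketch of what a standalone `extractLinks` plus a slimmer enqueue step might look like, assuming links are filtered by a plain predicate (in place of `pseudoUrls`) and handed to a callback (in place of a `requestQueue` argument). Both functions and their signatures are hypothetical. The regex is only there to keep the example dependency-free; a real implementation would use an HTML parser:

```typescript
// Hypothetical helper: extracts absolute http(s) links from HTML.
// Regex-based for brevity only; it will miss unusual markup.
function extractLinks(html: string, baseUrl: string): string[] {
    const links = new Set<string>();
    const hrefPattern = /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi;
    let match: RegExpExecArray | null;
    while ((match = hrefPattern.exec(html)) !== null) {
        try {
            const url = new URL(match[1], baseUrl);
            url.hash = ""; // treat "page#a" and "page#b" as the same page
            if (url.protocol === "http:" || url.protocol === "https:") {
                links.add(url.href);
            }
        } catch {
            // Skip hrefs that cannot be resolved to a URL.
        }
    }
    return Array.from(links);
}

// Hypothetical simplified enqueue: no `requestQueue`, no `pseudoUrls`.
// Returns how many links were passed to `enqueue`.
function enqueueExtracted(
    html: string,
    baseUrl: string,
    enqueue: (url: string) => void,
    filter: (url: string) => boolean = () => true,
): number {
    const links = extractLinks(html, baseUrl).filter(filter);
    links.forEach((url) => enqueue(url));
    return links.length;
}
```

Separating extraction from enqueuing lets users inspect or post-process the links before anything is added to a queue.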
