Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...

Results 336 crawlee issues

**Describe the bug** This is somewhere between a bug and a feature request. When you use RequestList and RequestQueue together, all requests are first taken from the list before any...

bug

**Describe the feature** Allow `errorHandler` to work with the full crawling context. For example, make sure that the `page` is not closed before `errorHandler` finishes. We should also prevent errors...

feature

**Describe the feature** There was a time in beta when handled and pending requests in the queue were stored in JSON format. If we wanted to retry some failed requests, we could simply...

feature
t-tooling

This _could_ be a breaking change: pseudourl `https://example.com/]` was valid until now. If that is unacceptable, I can revert that part of the changes (and we could maybe only keep...

* feat rust high performance efficient parallel crawling start This pr starts the integration with a native high performance gRPC crawler that is the fastest and most efficient OSS indexer...

Endpoints for the request queue v2 project are ready, along with the apify-client code. This issue should cover what needs to be done on the SDK side to get the request queue...

feature

**Describe the bug** When a CheerioCrawler request results in a redirect, the set-cookie header from the 302 response is not put into the cookie header of the subsequent request to...

bug
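To make the expected behavior concrete, here is a minimal sketch of carrying cookies across a redirect: the `set-cookie` values from the 302 response are folded into the `cookie` header of the follow-up request. The helper name `cookieHeaderFromSetCookie` is hypothetical and is not part of Crawlee. It also ignores cookie attributes such as `Domain`, `Path`, and `Expires`, which a real cookie jar would have to honor.

```typescript
// Hypothetical helper, not a Crawlee API: merges Set-Cookie values from a
// redirect response into a Cookie header string for the next request.
// Assumption: attributes (Domain, Path, Expires, ...) are ignored.
function cookieHeaderFromSetCookie(setCookie: string[], existing = ''): string {
    const jar = new Map<string, string>();

    // Start from the cookies the original request already carried.
    for (const pair of existing.split(';')) {
        const eq = pair.indexOf('=');
        if (eq > 0) jar.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
    }

    // Each Set-Cookie value is "name=value; attr; attr=..."; keep only name=value.
    for (const header of setCookie) {
        const first = header.split(';')[0];
        const eq = first.indexOf('=');
        if (eq > 0) jar.set(first.slice(0, eq).trim(), first.slice(eq + 1).trim());
    }

    return Array.from(jar.entries())
        .map(([name, value]) => `${name}=${value}`)
        .join('; ');
}
```

A cookie set by the redirect response overrides one of the same name from the original request, while unrelated cookies are kept.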

**Describe the feature** In the Playwright/Puppeteer crawler, when the response status is, for example, 403, the crawler automatically throws `Error: Request blocked - received 403 status code.`. Please add an option to disable this functionality...

feature

Currently you have to write your own function to parse and respect the target website's robots.txt file. A common function in the SDK (probably in utils.js) for that would be great.

feature
t-tooling
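As an illustration of the kind of helper this request describes, the sketch below parses robots.txt content and checks a path against it. The names `parseRobotsTxt` and `isAllowed` are hypothetical and do not exist in the SDK. The sketch makes simplifying assumptions: it merges the rules of the `*` group with any group matching the given user agent, and it treats rule values as plain path prefixes without wildcard support.

```typescript
// Hypothetical helpers, not part of the SDK.
interface RobotsRules {
    disallow: string[];
    allow: string[];
}

// Collects Allow/Disallow path prefixes from every group that applies
// to `userAgent` or to "*". Assumption: matching groups are merged.
function parseRobotsTxt(content: string, userAgent = '*'): RobotsRules {
    const rules: RobotsRules = { disallow: [], allow: [] };
    const agent = userAgent.toLowerCase();
    let groupApplies = false;
    let readingAgents = false; // true while consecutive User-agent lines are read

    for (const rawLine of content.split(/\r?\n/)) {
        const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
        const sep = line.indexOf(':');
        if (sep === -1) continue;
        const field = line.slice(0, sep).trim().toLowerCase();
        const value = line.slice(sep + 1).trim();

        if (field === 'user-agent') {
            const matches = value === '*' || (value !== '' && agent.includes(value.toLowerCase()));
            // Several User-agent lines in a row share one group of rules.
            groupApplies = readingAgents ? groupApplies || matches : matches;
            readingAgents = true;
            continue;
        }
        readingAgents = false;
        if (!groupApplies || value === '') continue;
        if (field === 'disallow') rules.disallow.push(value);
        if (field === 'allow') rules.allow.push(value);
    }
    return rules;
}

// The longest matching prefix wins; a tie goes to Allow.
// Assumption: wildcards ("*", "$") are not handled.
function isAllowed(rules: RobotsRules, path: string): boolean {
    const longest = (prefixes: string[]): number =>
        prefixes
            .filter((prefix) => path.startsWith(prefix))
            .reduce((max, prefix) => Math.max(max, prefix.length), -1);
    return longest(rules.allow) >= longest(rules.disallow);
}
```

A crawler could call `isAllowed` with the path of each candidate URL before enqueuing it and skip the ones that return `false`.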

**Describe the feature** Use `handlePageFunction` as the reference point for when the hooks happen. `postNavigationHooks` sounds like it will be executed after `handlePageFunction` and `preNavigationHooks` before `handlePageFunction` but not just after opening...

feature
t-tooling