
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...

Results 138 crawlee-python issues

- The `ProactorEventLoop` used by asyncio on Windows does not implement `add_signal_handler`. - On UNIX, we use it to catch SIGINT early, print a message and cancel the task that...

t-tooling
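
The limitation described in the issue above can be illustrated with a small sketch. It is not the project's implementation: the helper name `install_sigint_handler` and its return values are hypothetical. It tries `loop.add_signal_handler` first and, if the loop raises `NotImplementedError` (as the Windows `ProactorEventLoop` does), falls back to `signal.signal`, handing the callback to the loop with `call_soon_threadsafe`.

```python
import asyncio
import signal


def install_sigint_handler(loop, on_sigint):
    """Register `on_sigint` for SIGINT and report which mechanism was used.

    Hypothetical helper for illustration only; must be called from the
    main thread, because both mechanisms require it.
    """
    try:
        # Works on UNIX event loops.
        loop.add_signal_handler(signal.SIGINT, on_sigint)
        return "loop"
    except NotImplementedError:
        # Loops without signal support (e.g. ProactorEventLoop on Windows):
        # use a plain signal handler that schedules the callback on the loop.
        def _handler(signum, frame):
            loop.call_soon_threadsafe(on_sigint)

        signal.signal(signal.SIGINT, _handler)
        return "signal"


async def main():
    loop = asyncio.get_running_loop()
    task = asyncio.current_task()
    # Assumption: on SIGINT we only cancel the current task.
    install_sigint_handler(loop, task.cancel)
    await asyncio.sleep(0)  # placeholder for real crawling work


if __name__ == "__main__":
    asyncio.run(main())
```

One caveat of the fallback: a handler installed with `signal.signal` only runs when the main thread next executes Python bytecode, so the loop may not react until it wakes up for some other event.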

Coordinate with @barjin before implementing anything. There is a possibility of developing a dedicated fingerprinting library (in Rust?). In that case, we will do just some wrapping in Python tooling...

enhancement
t-tooling

### Context A while ago, Honza Javorek raised some good points regarding the deduplication process in the request queue ([#190](https://github.com/apify/apify-sdk-python/issues/190)). The first one: > Is it possible that Apify's request...

enhancement
t-tooling

### Description - Enhance the testing of PlaywrightCrawler by adding a mocked Playwright API. - It will provide a more isolated and stable testing environment, similar to how we use HTTPX...

t-tooling
debt
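
As a rough idea of what testing against a mocked page object could look like, here is a minimal sketch using only the standard library's `unittest.mock.AsyncMock`. It does not use Playwright or Crawlee. The handler `extract_title` and the URL are hypothetical, and the mock only mimics the `goto` and `title` coroutine methods of a Playwright page.

```python
import asyncio
from unittest.mock import AsyncMock


async def extract_title(page):
    """Hypothetical handler: navigate and return the page title."""
    await page.goto("https://example.com")
    return await page.title()


def run_with_mocked_page():
    # AsyncMock creates awaitable child mocks, so `page.goto(...)` and
    # `page.title()` can be awaited without launching a real browser.
    page = AsyncMock()
    page.title.return_value = "Mocked title"

    title = asyncio.run(extract_title(page))

    # Verify the handler navigated exactly once to the expected URL.
    page.goto.assert_awaited_once_with("https://example.com")
    return title
```

Because no browser is launched, a test like this is fast and deterministic, but it only checks that the handler calls the page API as expected, not that a real browser behaves the same way.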

Generate CHANGELOG from the commit messages as we do in JS/TS projects. Once this is solved for this repository, please create the same issue in the SDK, Client, and Shared...

enhancement
t-tooling

- https://crawlee.dev/api/core/function/useState

enhancement
t-tooling

- Enhance testing for the `wait_for_all_requests_to_be_added=False` scenario in `RequestQueue.add_requests_batched`. - Based on https://github.com/apify/crawlee-python/pull/186#discussion_r1642398284.

t-tooling
debt

(it's only used in tests)

t-tooling
debt

- Naming `browsers/browser_plugin.py` vs `browsers/browser_factory.py` (or `BrowserControllerFactory`). - "Plugin" is the old name and doesn't quite fit the current use case. "Factory," on the other hand, seems to be a...

t-tooling
debt

The purpose of the fields is somewhat unclear, but it's certain that they don't belong to the `Request` class. We should definitely explore the notion of an internal request in...

t-tooling
debt