crawlee-python issues

Improve `KeyboardInterrupt` handling on Windows

- The `ProactorEventLoop` used by asyncio on Windows does not implement `add_signal_handler`.
- On UNIX, we use it to catch SIGINT early, print a message, and cancel the task that...

janbuchar

t-tooling
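Here is a minimal sketch of a portable fallback for the signal-handling issue above. It assumes the interrupt callback only needs to run on the event loop thread. `loop.add_signal_handler` raises `NotImplementedError` on loops that do not support it, such as `ProactorEventLoop`, so the sketch falls back to `signal.signal` and hands the callback back to the loop with `call_soon_threadsafe`. The helper name `install_sigint_handler` and its return values are hypothetical and not part of crawlee-python.

```python
import asyncio
import signal
from collections.abc import Callable


def install_sigint_handler(
    loop: asyncio.AbstractEventLoop, on_interrupt: Callable[[], object]
) -> str:
    """Register `on_interrupt` for SIGINT (hypothetical helper).

    Returns 'loop' if the event loop handles the signal, 'signal' if the
    plain `signal` module fallback was used.
    """
    try:
        # Supported by selector-based loops on UNIX; the callback runs inside the loop.
        loop.add_signal_handler(signal.SIGINT, on_interrupt)
        return 'loop'
    except NotImplementedError:
        # ProactorEventLoop on Windows ends up here. Handlers installed with
        # signal.signal run in the main thread between bytecodes, so schedule
        # the callback on the loop in a thread-safe way instead of calling it directly.
        signal.signal(
            signal.SIGINT,
            lambda signum, frame: loop.call_soon_threadsafe(on_interrupt),
        )
        return 'signal'
```

Usage: pass `task.cancel` as `on_interrupt` to cancel a running task on Ctrl+C. Both code paths must be called from the main thread.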

Implement fingerprinting

3

Coordinate with @barjin before implementing anything. We may develop a dedicated fingerprinting library (in Rust?). In that case, we would only do some wrapping in the Python tooling...

vdusek

enhancement

t-tooling

Improve the deduplication of requests

1

### Context

A while ago, Honza Javorek raised some good points regarding the deduplication process in the request queue ([#190](https://github.com/apify/apify-sdk-python/issues/190)). The first one:

> Is it possible that Apify's request...

vdusek

enhancement

t-tooling
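For the deduplication issue above, request queues typically deduplicate by a per-request key. One common approach is to derive that key from a normalized URL. The sketch below shows that approach under assumptions. Its normalization rules (lower-cased scheme and host, sorted query parameters, dropped fragment) and the names `normalize_url` and `DedupQueue` are illustrative only. They are not crawlee-python's or Apify's actual deduplication logic.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical normalization used as a deduplication key."""
    parts = urlsplit(url.strip())
    # Sort query parameters so ?a=1&b=2 and ?b=2&a=1 collapse to one key.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Lower-case scheme and host, keep the path, drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, query, ''))


class DedupQueue:
    """Toy in-memory queue that rejects URLs whose key was already seen."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.pending: list[str] = []

    def add(self, url: str) -> bool:
        key = normalize_url(url)
        if key in self._seen:
            return False  # duplicate: already enqueued under the same key
        self._seen.add(key)
        self.pending.append(url)
        return True
```

How aggressive the normalization should be is a trade-off. For example, treating trailing slashes or parameter order as insignificant can merge URLs that a site actually serves differently.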

Enhance `PlaywrightCrawler` testing with mocked Playwright API

### Description

- Enhance the testing of `PlaywrightCrawler` by adding a mocked Playwright API.
- It will provide a more isolated and stable testing environment, similar to how we use HTTPX...

vdusek

t-tooling

debt
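The mocked-Playwright testing idea above can be sketched with `unittest.mock.AsyncMock`, which can stand in for a Playwright page so that a handler is exercised without launching a browser. This sketch assumes the code under test only awaits a few page methods. `goto`, `title`, and `content` mirror methods of Playwright's async `Page`, but nothing here imports Playwright. `handle_page` and `make_mock_page` are hypothetical helpers written for this example.

```python
import asyncio
from unittest.mock import AsyncMock


async def handle_page(page, url: str) -> dict:
    """Hypothetical request handler: navigate, then extract the title and HTML size."""
    await page.goto(url)
    title = await page.title()
    html = await page.content()
    return {'url': url, 'title': title, 'length': len(html)}


def make_mock_page(title: str, html: str) -> AsyncMock:
    """Build a fake page; child attributes of AsyncMock are awaitable AsyncMocks too."""
    page = AsyncMock()
    page.title.return_value = title
    page.content.return_value = html
    return page


async def main() -> dict:
    page = make_mock_page('Example Domain', '<html><title>Example Domain</title></html>')
    result = await handle_page(page, 'https://example.com')
    # The mock records awaited calls, so navigation can be verified without a browser.
    page.goto.assert_awaited_once_with('https://example.com')
    return result
```

Running `asyncio.run(main())` returns the extracted dictionary. A fuller version would likely mock the browser and context objects in the same way.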

Generate changelog from the commit messages

Generate the CHANGELOG from the commit messages, as we do in the JS/TS projects. Once this is solved for this repository, please create the same issue in the SDK, Client, and Shared...

vdusek

enhancement

t-tooling
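For the changelog issue above, here is a minimal sketch of the grouping step. It assumes commit subjects follow the Conventional Commits style (`feat: ...`, `fix(scope): ...`, `feat!: ...`). The section titles, the type-to-section mapping, and `render_changelog` are hypothetical. In practice, the subjects could be collected with something like `git log --format=%s` over the range since the last tag.

```python
import re
from collections import defaultdict

# Conventional Commits subject: type, optional (scope), optional "!", then ": description".
COMMIT_RE = re.compile(r'^(?P<type>\w+)(?:\((?P<scope>[^)]*)\))?(?P<breaking>!)?: (?P<desc>.+)$')

# Hypothetical mapping from commit type to changelog section (dict order = output order).
SECTIONS = {'feat': 'Features', 'fix': 'Bug Fixes', 'perf': 'Performance'}


def render_changelog(version: str, subjects: list[str]) -> str:
    grouped: dict[str, list[str]] = defaultdict(list)
    for subject in subjects:
        match = COMMIT_RE.match(subject)
        if not match or match['type'] not in SECTIONS:
            continue  # chore:, docs:, and free-form messages are skipped
        entry = match['desc']
        if match['scope']:
            entry = f"**{match['scope']}**: {entry}"
        if match['breaking']:
            entry += ' (BREAKING)'
        grouped[SECTIONS[match['type']]].append(entry)

    lines = [f'## {version}']
    for section in SECTIONS.values():  # stable section order
        if grouped[section]:
            lines.append(f'\n### {section}\n')
            lines.extend(f'- {entry}' for entry in grouped[section])
    return '\n'.join(lines)
```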

Implement `use_state` context helper method

- https://crawlee.dev/api/core/function/useState

vdusek

enhancement

t-tooling
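In the JS version linked above, `useState` hands every caller the same state object, which is persisted in a key-value store. Here is a minimal Python sketch of that idea, under assumptions. `InMemoryKeyValueStore` stands in for a real storage client. `StateManager`, `use_state`, `persist`, and the key `CRAWLEE_STATE` are hypothetical names, not crawlee-python API.

```python
from __future__ import annotations

import json
from typing import Any

STATE_KEY = 'CRAWLEE_STATE'  # hypothetical key under which the state is persisted


class InMemoryKeyValueStore:
    """Stand-in for a real key-value store; values are stored as JSON strings."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get_value(self, key: str) -> Any:
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)

    def set_value(self, key: str, value: Any) -> None:
        self._data[key] = json.dumps(value)


class StateManager:
    """Hypothetical helper: every call to use_state returns the same dict."""

    def __init__(self, store: InMemoryKeyValueStore) -> None:
        self._store = store
        self._state: dict[str, Any] | None = None

    def use_state(self, default: dict[str, Any] | None = None) -> dict[str, Any]:
        if self._state is None:
            # Load previously persisted state, else start from a copy of the default.
            persisted = self._store.get_value(STATE_KEY)
            self._state = persisted if persisted is not None else dict(default or {})
        return self._state

    def persist(self) -> None:
        # A crawler would call this periodically and on shutdown.
        if self._state is not None:
            self._store.set_value(STATE_KEY, self._state)
```

The default is only used the first time. Later calls return the shared, possibly mutated object, which matches how the JS helper is typically used to accumulate values across request handlers.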

Enhance testing for `wait_for_all_requests_to_be_added=False` scenario in `add_requests_batched`

- Enhance testing for the `wait_for_all_requests_to_be_added=False` scenario in `RequestQueue.add_requests_batched`.
- Based on https://github.com/apify/crawlee-python/pull/186#discussion_r1642398284.

vdusek

t-tooling

debt
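To make the `add_requests_batched` scenario above concrete, here is a toy model of the behaviour a test would pin down. It assumes that with `wait_for_all_requests_to_be_added=False` the first batch is added before the call returns and the remaining batches are added by a background task. `ToyQueue`, its signature, and `drain` are hypothetical and only mimic that contract. A real test would target the request queue itself.

```python
import asyncio


class ToyQueue:
    """Hypothetical stand-in mimicking the batched-add contract described above."""

    def __init__(self) -> None:
        self.added: list[str] = []
        self._tasks: list[asyncio.Task] = []

    async def _add_batch(self, batch: list[str]) -> None:
        await asyncio.sleep(0)  # simulate a storage round trip
        self.added.extend(batch)

    async def add_requests_batched(
        self, urls: list[str], *, batch_size: int, wait_for_all_requests_to_be_added: bool
    ) -> None:
        # The first batch is always added before returning.
        await self._add_batch(urls[:batch_size])

        async def add_remaining() -> None:
            for start in range(batch_size, len(urls), batch_size):
                await self._add_batch(urls[start:start + batch_size])

        task = asyncio.create_task(add_remaining())
        self._tasks.append(task)
        if wait_for_all_requests_to_be_added:
            await task

    async def drain(self) -> None:
        """Wait for background batches before asserting on the final state."""
        await asyncio.gather(*self._tasks)


async def scenario() -> tuple[int, int]:
    queue = ToyQueue()
    urls = [f'https://example.com/{i}' for i in range(25)]
    await queue.add_requests_batched(urls, batch_size=10, wait_for_all_requests_to_be_added=False)
    right_after_return = len(queue.added)  # only the first batch at this point
    await queue.drain()
    return right_after_return, len(queue.added)
```

The two numbers returned by `scenario()` are what such a test would assert on: a partial count immediately after the call, and the full count once the background work finishes.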

Run httpbin locally instead of using httpbin.org

1

(it's only used in tests)

janbuchar

t-tooling

debt
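For the local-httpbin issue above, one possible setup is sketched here. It assumes httpbin runs in a container (for example, the public `kennethreitz/httpbin` image, which listens on port 80 inside the container, mapped with `docker run -p 8080:80 kennethreitz/httpbin`) and that tests build URLs from an environment variable, so CI and local runs can point at different instances. The variable name `HTTPBIN_URL`, the default port, and both helper names are hypothetical.

```python
import os
from urllib.parse import urljoin

# Hypothetical default, matching `docker run -p 8080:80 kennethreitz/httpbin`.
DEFAULT_HTTPBIN = 'http://localhost:8080'


def httpbin_base() -> str:
    """Return the httpbin base URL for tests, preferring HTTPBIN_URL if set."""
    return os.environ.get('HTTPBIN_URL', DEFAULT_HTTPBIN).rstrip('/') + '/'


def httpbin_url(path: str) -> str:
    """Build an absolute URL for an endpoint such as 'get' or 'status/404'."""
    return urljoin(httpbin_base(), path.lstrip('/'))
```

In a pytest suite, these helpers would naturally be exposed through a fixture in `conftest.py`.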

Naming `BrowserPlugin` vs `BrowserFactory`

- Naming `browsers/browser_plugin.py` vs `browsers/browser_factory.py` (or `BrowserControllerFactory`).
- "Plugin" is the old name and doesn't quite fit the current use case. "Factory," on the other hand, seems to be a...

vdusek

t-tooling

debt

Remove `json_` and `order_no` from `Request`

The purpose of these fields is somewhat unclear, but they certainly don't belong in the `Request` class. We should definitely explore the notion of an internal request in...

janbuchar

t-tooling

debt
