crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
- Add an `always_enqueue` option (or use a better name for it, but avoid negative terms) as an input parameter to the `Request.from_url` constructor.
- This will allow users to...
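The request turns on deduplication: a request's unique key is normally derived from its URL, so repeated URLs collapse into one queued request, and an `always_enqueue`-style flag would instead assign a fresh key every time. A minimal stdlib sketch of that idea (the class and queue here are illustrative stand-ins, not crawlee's implementation):

```python
import hashlib
import uuid
from dataclasses import dataclass


@dataclass
class Request:
    """Illustrative stand-in for a crawler request (not crawlee's class)."""
    url: str
    unique_key: str

    @classmethod
    def from_url(cls, url: str, *, always_enqueue: bool = False) -> "Request":
        # By default the unique key is derived from the URL, so the same URL
        # deduplicates to one request. With always_enqueue=True a random key
        # is used instead, so every call yields a distinct request.
        key = uuid.uuid4().hex if always_enqueue else hashlib.sha256(url.encode()).hexdigest()
        return cls(url=url, unique_key=key)


class RequestQueue:
    """Toy queue that drops requests whose unique key was already seen."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self._pending: list[Request] = []

    def add(self, request: Request) -> bool:
        if request.unique_key in self._seen:
            return False  # deduplicated away
        self._seen.add(request.unique_key)
        self._pending.append(request)
        return True


queue = RequestQueue()
a = queue.add(Request.from_url("https://example.com"))  # first time: accepted
b = queue.add(Request.from_url("https://example.com"))  # same key: dropped
c = queue.add(Request.from_url("https://example.com", always_enqueue=True))  # fresh key: accepted
```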
- We should create a new documentation guide on how to work with sessions (`SessionPool`).
- Inspiration: https://crawlee.dev/docs/guides/session-management
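The pattern such a guide would cover is rotating session state (cookies, proxy identity) and retiring sessions that accumulate errors. A minimal stdlib sketch of that pattern, with hypothetical names rather than `SessionPool`'s real API:

```python
import random


class Session:
    """Toy session: retired once it records too many errors."""

    def __init__(self, session_id: int, max_errors: int = 3) -> None:
        self.id = session_id
        self.error_count = 0
        self.max_errors = max_errors

    @property
    def usable(self) -> bool:
        return self.error_count < self.max_errors

    def mark_bad(self) -> None:
        self.error_count += 1


class SimpleSessionPool:
    """Hands out random usable sessions; failed ones rotate out naturally."""

    def __init__(self, size: int) -> None:
        self._sessions = [Session(i) for i in range(size)]

    def get(self) -> Session:
        usable = [s for s in self._sessions if s.usable]
        if not usable:
            raise RuntimeError("all sessions retired")
        return random.choice(usable)


pool = SimpleSessionPool(size=2)
s = pool.get()
for _ in range(3):
    s.mark_bad()  # e.g. the site answered with 403 three times
remaining = {sess.id for sess in pool._sessions if sess.usable}
```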
- We could create a new documentation guide for the `PlaywrightCrawler` and `BrowserPool`.
- The guide should include the following:
  - How to use `PlaywrightCrawler` and what it provides.
  - ...
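The core job of a browser pool is sharing a bounded set of launched browsers among many page loads. A toy stdlib sketch of that scheduling idea, with a fake browser class standing in for real Playwright instances (none of these names are crawlee's API):

```python
import itertools


class FakeBrowser:
    """Stand-in for a launched browser (real code would drive Playwright)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.open_pages = 0

    def new_page(self) -> str:
        self.open_pages += 1
        return f"page@{self.name}"


class RoundRobinBrowserPool:
    """Toy pool that spreads new pages across browsers in round-robin order."""

    def __init__(self, browsers: list[FakeBrowser]) -> None:
        self._cycle = itertools.cycle(browsers)

    def new_page(self) -> str:
        return next(self._cycle).new_page()


browsers = [FakeBrowser("chromium-0"), FakeBrowser("chromium-1")]
pool = RoundRobinBrowserPool(browsers)
pages = [pool.new_page() for _ in range(4)]
```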
- We could create a new documentation guide for scaling the crawlers (mainly the features from the `_autoscaling` subpackage).
- The guide should include the following:
  - `ConcurrencySettings` - how users...
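The essence of autoscaling is adjusting the number of concurrent tasks within user-set bounds based on system load. A simplified stdlib sketch of that feedback loop (the field names mirror the concept of min/max concurrency bounds; the scaling rule and thresholds are invented for illustration):

```python
from dataclasses import dataclass


@dataclass
class ConcurrencySettings:
    """Illustrative min/max concurrency bounds (not crawlee's class)."""
    min_concurrency: int = 1
    max_concurrency: int = 10


def desired_concurrency(settings: ConcurrencySettings, current: int, cpu_load: float) -> int:
    """Scale up when the system has headroom, down when it is overloaded."""
    if cpu_load < 0.6:
        current += 1  # headroom available: add one concurrent task
    elif cpu_load > 0.9:
        current -= 1  # overloaded: shed one task
    # Always clamp to the user-configured bounds.
    return max(settings.min_concurrency, min(settings.max_concurrency, current))


s = ConcurrencySettings(min_concurrency=2, max_concurrency=5)
up = desired_concurrency(s, current=3, cpu_load=0.3)      # scales up
down = desired_concurrency(s, current=2, cpu_load=0.95)   # clamped at the minimum
capped = desired_concurrency(s, current=5, cpu_load=0.1)  # clamped at the maximum
```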
### Description

- Split the `export_data` function into `export_data_csv` and `export_data_json`, and added additional configuration options using kwargs

### Issues

- Closes: #526

### Testing

- Added test to check...
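The shape of the split can be sketched with the stdlib alone: one exporter per format, each forwarding `**kwargs` to the underlying writer so callers get format-specific knobs. This is a sketch of the idea, not the PR's actual code:

```python
import csv
import io
import json


def export_data_csv(items: list[dict], fp, **kwargs) -> None:
    """Write dict rows as CSV; extra kwargs go to csv.DictWriter (e.g. delimiter)."""
    writer = csv.DictWriter(fp, fieldnames=list(items[0]), **kwargs)
    writer.writeheader()
    writer.writerows(items)


def export_data_json(items: list[dict], fp, **kwargs) -> None:
    """Write items as JSON; extra kwargs go to json.dump (e.g. indent)."""
    json.dump(items, fp, **kwargs)


items = [{"url": "https://example.com", "status": 200}]

csv_buf = io.StringIO()
export_data_csv(items, csv_buf, delimiter=";")

json_buf = io.StringIO()
export_data_json(items, json_buf, indent=2)
```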
### Description

This PR introduces a maximum crawl depth feature to the Crawlee library. It allows users to restrict the crawler's depth to a specified level, enabling better control over the...
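The mechanism behind a depth limit is simple: track each request's depth and stop enqueueing links once the limit is reached. A self-contained sketch over a toy link graph (the graph and function names are invented for illustration):

```python
from collections import deque

# Toy link graph standing in for pages discovered during a crawl.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a1"],
    "/a1": ["/a1x"],
    "/b": [],
    "/a1x": [],
}


def crawl(start: str, max_crawl_depth: int) -> list[str]:
    """Breadth-first crawl that stops enqueueing links beyond max_crawl_depth."""
    visited: list[str] = []
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_crawl_depth:
            continue  # depth limit reached: do not follow this page's links
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


shallow = crawl("/", max_crawl_depth=1)  # root plus its direct links only
deeper = crawl("/", max_crawl_depth=2)
```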
### Description

This pull request introduces the `get_public_url` method to the `KeyValueStore` class. This method generates a file URL for a given key, allowing for easy access to stored files...
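For a store backed by local files, such a method amounts to mapping a key to its on-disk path and returning it as a URL. A stdlib sketch of that idea (a toy store, not crawlee's `KeyValueStore` implementation):

```python
import tempfile
from pathlib import Path


class LocalKeyValueStore:
    """Toy key-value store backed by files on disk (not crawlee's class)."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def set_value(self, key: str, value: str) -> None:
        (self._root / key).write_text(value)

    def get_public_url(self, key: str) -> str:
        # For local storage the "public" URL is just a file:// URL; a cloud
        # backend would instead return an HTTP URL to the stored object.
        return (self._root / key).as_uri()


root = Path(tempfile.mkdtemp())
store = LocalKeyValueStore(root)
store.set_value("report.txt", "hello")
url = store.get_public_url("report.txt")
```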
The motivation is to simplify working with event data in custom listeners in `apify-sdk-python`: currently, listener parameters cannot be typed without reaching into the private `crawlee._events` submodule. See also...
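What exporting the event data types buys is illustrated below: with a public dataclass for the payload, a listener's parameter can be annotated without importing anything private. The emitter and payload names here are hypothetical, chosen only to show the pattern:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PersistStateEventData:
    """Illustrative typed event payload (hypothetical name, not crawlee's)."""
    is_migrating: bool


class EventEmitter:
    """Toy emitter that calls registered listeners with a typed payload."""

    def __init__(self) -> None:
        self._listeners: list[Callable[[PersistStateEventData], None]] = []

    def on(self, listener: Callable[[PersistStateEventData], None]) -> None:
        self._listeners.append(listener)

    def emit(self, data: PersistStateEventData) -> None:
        for listener in self._listeners:
            listener(data)


received: list[PersistStateEventData] = []


def my_listener(event: PersistStateEventData) -> None:
    # The parameter is fully typed: no need to reach into private modules.
    received.append(event)


emitter = EventEmitter()
emitter.on(my_listener)
emitter.emit(PersistStateEventData(is_migrating=False))
```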