crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...

Results 138 crawlee-python issues

- Add an `always_enqueue` option (or use a better name for it, but avoid negative terms) as an input parameter to the `Request.from_url` constructor.
- This will allow users to...

enhancement
t-tooling
hacktoberfest

- We should create a new documentation guide on how to work with sessions (`SessionPool`).
- Inspiration: https://crawlee.dev/docs/guides/session-management

documentation
t-tooling

- We could create a new documentation guide for the `PlaywrightCrawler` and `BrowserPool`.
- The guide should include the following:
  - How to use `PlaywrightCrawler` and what it provides.
  - ...

documentation
t-tooling

- We could create a new documentation guide for scaling the crawlers (mainly the features from the `_autoscaling` subpackage).
- The guide should include the following:
  - `ConcurrencySettings` - how users...

documentation
t-tooling
hacktoberfest

### Description
- Split the _export_data_ function into _export_data_csv_ and _export_data_json_, and added additional configuration options using kwargs
### Issues
- Closes: #526
### Testing
- Added test to check...
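To illustrate the idea of splitting one export function into a CSV variant and a JSON variant that forward keyword arguments, here is a minimal standard-library sketch. The function names mirror the PR description, but the signatures (a list of dicts and a text stream) are assumptions made for illustration and are not taken from the library.

```python
import csv
import io
import json


def export_data_csv(items, stream, **kwargs):
    """Write a list of dicts as CSV; kwargs are forwarded to csv.writer.

    Hypothetical signature, not the library's actual API.
    """
    writer = csv.writer(stream, **kwargs)
    if items:
        # Use the keys of the first item as the header row.
        headers = list(items[0].keys())
        writer.writerow(headers)
        for item in items:
            writer.writerow([item.get(h) for h in headers])


def export_data_json(items, stream, **kwargs):
    """Write a list of dicts as JSON; kwargs are forwarded to json.dump.

    Hypothetical signature, not the library's actual API.
    """
    json.dump(items, stream, **kwargs)


if __name__ == "__main__":
    data = [{"url": "https://example.com", "status": 200}]
    csv_buf, json_buf = io.StringIO(), io.StringIO()
    export_data_csv(data, csv_buf, delimiter=";", lineterminator="\n")
    export_data_json(data, json_buf, indent=2)
    print(csv_buf.getvalue())
    print(json_buf.getvalue())
```

Keeping the two formats in separate functions lets each one accept only the options that make sense for it (for example `delimiter` for CSV, `indent` for JSON).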

A few changes to the homepage for SEO, as requested by marketing.

t-tooling

Description This PR introduces a maximum crawl depth feature to the Crawlee library. It allows users to restrict the crawler's depth to a specified level, enabling better control over the...

t-tooling
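The idea of a maximum crawl depth can be sketched independently of the library as a breadth-first traversal that stops following links once a page sits at the depth limit. The link graph, the `crawl` function, and the `max_depth` parameter name below are all hypothetical and only stand in for real pages and the PR's actual implementation.

```python
from collections import deque

# Hypothetical in-memory link graph standing in for real web pages.
LINKS = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}


def crawl(start, max_depth, links=LINKS):
    """Breadth-first traversal that does not follow links from pages at max_depth."""
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            # Pages at the depth limit are visited, but their links are not enqueued.
            continue
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                visited.append(nxt)
                queue.append((nxt, depth + 1))
    return visited


if __name__ == "__main__":
    print(crawl("a", max_depth=2))
```

Tracking the depth alongside each queued URL keeps the limit check local to the point where new links would be enqueued.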

### Description
This pull request introduces the `get_public_url` method to the `KeyValueStore` class. This method generates a file URL for a given key, allowing for easy access to stored files....

t-tooling

The motivation is to simplify working with event data in custom listeners in `apify-sdk-python` - currently listener parameters cannot be typed without reaching into the private `crawlee._events` submodule. See also...

t-tooling
debt