crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...

Results 138 crawlee-python issues

### Description This pull request enhances the `compute_unique_key` function in the `src/crawlee/_utils/requests.py` file to include HTTP headers in the unique key computation and adds corresponding unit tests. The most important...
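As a rough illustration of what including headers in a unique key can look like, here is a minimal sketch. It is not the code from this pull request: `sketch_unique_key`, its parameters, and the `METHOD|hash|URL` key layout are all hypothetical, and the actual `compute_unique_key` in `src/crawlee/_utils/requests.py` may normalize and combine values differently.

```python
import hashlib
from typing import Mapping, Optional


def sketch_unique_key(
    url: str,
    method: str = "GET",
    headers: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper for illustration only, not crawlee's compute_unique_key."""
    # Header names are case-insensitive, so lowercase them and sort the pairs
    # to get the same key regardless of the order or casing the caller used.
    normalized = sorted(
        (name.strip().lower(), value.strip())
        for name, value in (headers or {}).items()
    )
    header_blob = "\n".join(f"{name}:{value}" for name, value in normalized)
    # A short digest keeps the key compact (assumed layout, see lead-in).
    header_hash = hashlib.sha256(header_blob.encode("utf-8")).hexdigest()[:8]
    return f"{method.upper()}|{header_hash}|{url}"
```

With this sketch, two requests to the same URL that differ only in header casing or ordering map to the same key, while requests with different header values map to different keys.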

### Description - Implement get_public_url method in KeyValueStore ### Issues - Closes: #514 ### Testing - Unit tests added ### Checklist - [ ] CI passed

t-tooling

- This adds a `get_key_value_store(id, name)` context helper to `BasicCrawlingContext` - Also, `push_data` calls are held until the request handler terminates successfully (same as in the JS version) - This is...

t-tooling
adhoc
tested
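The "held until the request handler terminates successfully" behavior described in the entry above can be pictured with a small buffering sketch. Everything here is an assumption made for illustration: `BufferedContext`, `run_handler`, and the plain list used as storage are hypothetical names and do not reflect how crawlee implements its crawling context.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Item = Dict[str, Any]


class BufferedContext:
    """Hypothetical context that queues items instead of writing them at once."""

    def __init__(self) -> None:
        self.pending: List[Item] = []

    async def push_data(self, item: Item) -> None:
        # Only remember the item; nothing reaches storage yet.
        self.pending.append(item)


async def run_handler(
    handler: Callable[[BufferedContext], Awaitable[None]],
    storage: List[Item],
) -> bool:
    """Run the handler and commit its buffered items only if it did not raise."""
    context = BufferedContext()
    try:
        await handler(context)
    except Exception:
        # The handler failed, so its buffered items are discarded.
        return False
    storage.extend(context.pending)
    return True
```

The point of buffering is that a handler which fails halfway leaves no partial results behind, so a retried request does not produce duplicate records.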

### Description - Guide for scaling the crawlers ### Issues - Closes: #476 ### Testing - TODO ### Checklist - [ ] CI passed

### Description - Previously, requests generated by `Request.from_url` for the same URL received the same `unique_key` and were considered identical requests, but this parameter, if set to...

### **Description** This PR improves the API documentation for the `BasicCrawler` class by providing clear and concise explanations for all arguments and methods. The documentation now adheres to Google style...

t-tooling

### Description - TODO ### Issues - Closes: #TODO ### Testing - TODO ### Checklist - [ ] CI passed

### Description This PR adds documentation on Crawlee's result storage types, specifically the Key-Value Store and Dataset, providing usage examples and file structures for efficient data management. - Closes: #479...

- We should create a new documentation guide on how to work with result storages (`Dataset`, `KeyValueStore`). - Inspiration: https://crawlee.dev/docs/guides/result-storage - Check the structure of other guides - [docs/guides](https://github.com/apify/crawlee-python/tree/master/docs/guides), and...

documentation
t-tooling
hacktoberfest

### Description - Added 5 files, 2 of which aren't currently used; they can be used once crawlee-python completes the puppeteer crawler. - Added additional information apart...

t-tooling