crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
### Description This pull request enhances the `compute_unique_key` function in the `src/crawlee/_utils/requests.py` file to include HTTP headers in the unique key computation and adds corresponding unit tests. The most important...
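The snippet does not show how the PR folds headers into the key. The sketch below only illustrates the general idea: a request key that can optionally include a hash of the HTTP headers. `compute_unique_key_sketch` and its `include_headers` flag are hypothetical and are not Crawlee's actual `compute_unique_key` signature. The URL normalization (lowercasing scheme and host, dropping the fragment) is also an assumption.

```python
from __future__ import annotations

import hashlib
import json
from urllib.parse import urlsplit, urlunsplit


def compute_unique_key_sketch(
    url: str,
    method: str = "GET",
    payload: bytes | None = None,
    headers: dict[str, str] | None = None,
    include_headers: bool = False,  # hypothetical flag, not Crawlee's API
) -> str:
    """Hypothetical request key that can optionally include HTTP headers."""
    # Normalize the URL: lowercase scheme and host, drop the fragment.
    parts = urlsplit(url.strip())
    normalized_url = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, "")
    )
    if not include_headers:
        return normalized_url

    # HTTP header names are case-insensitive, so lowercase them before hashing;
    # sort_keys makes the hash independent of header insertion order.
    normalized_headers = {k.lower(): v for k, v in (headers or {}).items()}
    headers_hash = hashlib.sha256(
        json.dumps(normalized_headers, sort_keys=True).encode()
    ).hexdigest()[:8]
    payload_hash = hashlib.sha256(payload or b"").hexdigest()[:8]
    return f"{method.upper()}|{headers_hash}|{payload_hash}|{normalized_url}"
```

With headers included, two requests to the same URL that differ only in, say, their `Accept` header get different keys, so a deduplicating queue no longer collapses them into one.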
### Description - Implement the `get_public_url` method in `KeyValueStore` ### Issues - Closes: #514 ### Testing - Unit tests added ### Checklist - [ ] CI passed
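The PR body does not describe what `get_public_url` returns. For a store backed by the local filesystem, one plausible answer is a `file://` URI pointing at the stored record, and that is the assumption this sketch makes. `LocalKeyValueStoreSketch` is a hypothetical class. It is not Crawlee's `KeyValueStore`, and its synchronous methods do not mirror Crawlee's API.

```python
import tempfile
from pathlib import Path


class LocalKeyValueStoreSketch:
    """Hypothetical filesystem-backed key-value store (not Crawlee's class)."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def set_value(self, key: str, value: bytes) -> None:
        # Each key is stored as a single file under the store's directory.
        (self.root / key).write_bytes(value)

    def get_public_url(self, key: str) -> str:
        # For local storage, a file:// URI is the closest thing to a public URL.
        # Path.as_uri() requires an absolute path, hence resolve().
        return (self.root / key).resolve().as_uri()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = LocalKeyValueStoreSketch(Path(tmp) / "default")
        store.set_value("screenshot.png", b"\x89PNG")
        print(store.get_public_url("screenshot.png"))
```

A cloud-backed store would presumably return an HTTP URL instead. The point of such a method is that callers get a shareable location without knowing which storage backend is in use.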
- This adds a `get_key_value_store(id, name)` context helper to `BasicCrawlingContext` - Also, `push_data` calls are held until the request handler terminates successfully (as in the JS version) - This is...
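Holding `push_data` calls until the handler succeeds amounts to buffering writes and committing them only on success. The sketch below shows that pattern with plain `asyncio`. `BufferedContextSketch` and `run_handler` are hypothetical names, and a Python list stands in for the dataset. This is not how Crawlee implements it internally.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable


class BufferedContextSketch:
    """Hypothetical crawling context that buffers push_data calls."""

    def __init__(self) -> None:
        self._pending: list[dict[str, Any]] = []

    async def push_data(self, item: dict[str, Any]) -> None:
        # Nothing is written yet; the item waits until the handler finishes.
        self._pending.append(item)


async def run_handler(
    handler: Callable[[BufferedContextSketch], Awaitable[None]],
    dataset: list[dict[str, Any]],
) -> bool:
    """Run a handler and commit its buffered items only if it succeeds."""
    context = BufferedContextSketch()
    try:
        await handler(context)
    except Exception:
        # The handler failed: discard buffered items so a retry cannot duplicate them.
        return False
    dataset.extend(context._pending)
    return True
```

This matters for retries. Without buffering, a handler that pushes an item and then raises would store that item, and the retried request would store it a second time.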
### Description - Guide for scaling the crawlers ### Issues - Closes: #476 ### Testing - TODO ### Checklist - [ ] CI passed
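The PR does not show the guide's content. One basic ingredient of scaling any crawler is capping how many requests are processed at once. The sketch below shows a fixed cap using `asyncio.Semaphore`. `crawl_all` is a hypothetical function, and `asyncio.sleep` stands in for real fetching. It does not reproduce any autoscaling behaviour the guide may describe.

```python
from __future__ import annotations

import asyncio


async def crawl_all(urls: list[str], max_concurrency: int) -> tuple[list[str], int]:
    """Process URLs with at most `max_concurrency` tasks inside the semaphore."""
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0
    done: list[str] = []

    async def handle(url: str) -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)  # record the highest observed concurrency
            await asyncio.sleep(0.01)  # stand-in for fetching and parsing a page
            done.append(url)
            in_flight -= 1

    await asyncio.gather(*(handle(u) for u in urls))
    return done, peak
```

Raising `max_concurrency` increases throughput until the target site or the local machine becomes the bottleneck. That trade-off is what a scaling guide would need to help users tune.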
### Description - Previously, requests created with `Request.from_url` for the same URL received the same generated `unique_key` and were therefore considered identical requests, but this parameter, if set to...
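The deduplication behaviour described above, where requests sharing a `unique_key` count as the same request, can be sketched as a queue that skips keys it has already seen. `RequestSketch` and `DedupQueueSketch` are hypothetical stand-ins, not Crawlee's `Request` or request queue. Because the description is truncated, the sketch makes no assumption about what the PR changes.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestSketch:
    url: str
    unique_key: str


class DedupQueueSketch:
    """Hypothetical queue that drops requests whose unique_key was already seen."""

    def __init__(self) -> None:
        self._queue: deque[RequestSketch] = deque()
        self._seen: set[str] = set()

    def add(self, request: RequestSketch) -> bool:
        # Same unique_key means "identical request", regardless of other fields.
        if request.unique_key in self._seen:
            return False
        self._seen.add(request.unique_key)
        self._queue.append(request)
        return True

    def __len__(self) -> int:
        return len(self._queue)
```

Since only `unique_key` is compared, two requests to the same URL are kept apart only if something (such as the method, payload, or an explicitly supplied key) makes their keys differ.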
### Description This PR improves the API documentation for the `BasicCrawler` class by providing clear and concise explanations for all arguments and methods. The documentation now adheres to Google style...
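For reference, a Google-style docstring uses labelled `Args:`, `Returns:`, and `Raises:` sections, with each parameter described on its own indented line. The function below is a hypothetical helper that exists only to show the layout. It is not part of `BasicCrawler` or any other Crawlee class.

```python
from __future__ import annotations


def enqueue_links_sketch(urls: list[str], limit: int | None = None) -> list[str]:
    """Select URLs to enqueue, optionally capped at a limit.

    This is a hypothetical helper used only to illustrate Google-style
    docstring sections; it is not part of Crawlee.

    Args:
        urls: Candidate URLs, in discovery order.
        limit: Maximum number of URLs to return. `None` means no limit.

    Returns:
        The selected URLs, preserving the input order.

    Raises:
        ValueError: If `limit` is negative.
    """
    if limit is not None and limit < 0:
        raise ValueError("limit must be non-negative")
    return urls if limit is None else urls[:limit]
```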
### Description - TODO ### Issues - Closes: #TODO ### Testing - TODO ### Checklist - [ ] CI passed
### Description This PR adds documentation on Crawlee's result storage types, specifically the Key-Value Store and Dataset, providing usage examples and file structures for efficient data management. - Closes: #479...
- We should create a new documentation guide on how to work with result storages (`Dataset`, `KeyValueStore`). - Inspiration: https://crawlee.dev/docs/guides/result-storage - Check the structure of other guides - [docs/guides](https://github.com/apify/crawlee-python/tree/master/docs/guides), and...
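The storage guide described in these two entries covers `Dataset` and `KeyValueStore` file structures. The sketch below mimics such a layout with plain files, to show how an append-only dataset and a key-value store can map onto directories. The names `datasets/<name>/` and `key_value_stores/<name>/`, and the nine-digit zero-padded item files, are assumptions modelled loosely on Crawlee's local storage; check the linked guide for the actual layout. `LocalDatasetSketch` and `set_record` are hypothetical and are not Crawlee's API.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any


class LocalDatasetSketch:
    """Append-only dataset: one numbered JSON file per item (hypothetical layout)."""

    def __init__(self, storage_dir: Path, name: str = "default") -> None:
        self.dir = storage_dir / "datasets" / name
        self.dir.mkdir(parents=True, exist_ok=True)

    def push_data(self, item: dict[str, Any]) -> Path:
        # Items are numbered sequentially; zero-padding keeps names sortable.
        index = len(list(self.dir.glob("*.json"))) + 1
        path = self.dir / f"{index:09d}.json"
        path.write_text(json.dumps(item))
        return path

    def get_data(self) -> list[dict[str, Any]]:
        # Sorting the zero-padded file names restores insertion order.
        return [json.loads(p.read_text()) for p in sorted(self.dir.glob("*.json"))]


def set_record(storage_dir: Path, key: str, value: Any, store: str = "default") -> Path:
    """Store a JSON-serializable record under key_value_stores/<store>/<key>.json."""
    path = storage_dir / "key_value_stores" / store / f"{key}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(value))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "storage"
        LocalDatasetSketch(root).push_data({"title": "Home"})
        set_record(root, "INPUT", {"start": "https://example.com"})
        for p in sorted(root.rglob("*.json")):
            print(p.relative_to(root).as_posix())
```

The split mirrors how the two storage types are used. A dataset is a growing list of similarly shaped results, while a key-value store holds arbitrary named records such as configuration, state, or screenshots.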
### Description - Added 5 files, 2 of which aren't currently used; they can be used once crawlee-python has a completed Puppeteer crawler. - Added additional information apart...