crawlee-python

Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...

Results 138 crawlee-python issues

### Description Currently, we have a `MemoryStorageClient` that can also persist data in the file system. Let's separate the two; `FilesystemStorageClient` could probably extend `MemoryStorageClient`. ### Other related things - There...

enhancement
t-tooling
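
The split proposed in the `MemoryStorageClient` issue above could look roughly like the sketch below. Everything here is an assumption made for illustration: the two classes are simplified stand-ins that only share names with the ones in the issue, and `set_record`, `get_record`, `storage_dir`, and `demo` are hypothetical names, not Crawlee's API.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


class MemoryStorageClient:
    """Hypothetical in-memory client: keeps records in a dict only."""

    def __init__(self) -> None:
        self._records: dict[str, dict] = {}

    def set_record(self, key: str, value: dict) -> None:
        self._records[key] = value

    def get_record(self, key: str) -> dict | None:
        return self._records.get(key)


class FilesystemStorageClient(MemoryStorageClient):
    """Hypothetical subclass: same interface, but also writes each record to disk."""

    def __init__(self, storage_dir: Path) -> None:
        super().__init__()
        self._storage_dir = storage_dir
        self._storage_dir.mkdir(parents=True, exist_ok=True)

    def set_record(self, key: str, value: dict) -> None:
        # Keep the in-memory behaviour, then persist a JSON copy.
        super().set_record(key, value)
        path = self._storage_dir / f"{key}.json"
        path.write_text(json.dumps(value), encoding="utf-8")


def demo() -> dict:
    # Store one record and read it back from both memory and disk.
    with tempfile.TemporaryDirectory() as tmp:
        client = FilesystemStorageClient(Path(tmp))
        client.set_record("item", {"url": "https://example.com"})
        on_disk = json.loads((Path(tmp) / "item.json").read_text(encoding="utf-8"))
        return {"memory": client.get_record("item"), "disk": on_disk}
```

With this shape, the in-memory client carries no file system code at all, and persistence lives only in the overriding method of the subclass.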

The current implementation is very basic and mostly serves for testing. We should make it more like https://github.com/apify/crawlee/blob/master/packages/core/src/storages/request_list.ts

t-tooling

Because of the Pydantic issue https://github.com/pydantic/pydantic-settings/issues/180, we cannot use the keyword argument `local_storage_dir` and have to use `crawlee_local_storage_dir` instead. We also need to use type ignores there. Let's rename all the keyword arguments from...

t-tooling

The current [Crawlee / StorageClientManager](https://github.com/apify/crawlee-py/blob/master/src/crawlee/storage_client_manager.py) is more or less just copied from the [Python SDK / StorageClientManager](https://github.com/apify/apify-sdk-python/blob/master/src/apify/storages/storage_client_manager.py) and is extremely simple. Its primary role is to maintain and provide access...

enhancement
t-tooling

Simplify code in `RequestQueue._ensure_head_is_non_empty` https://github.com/apify/apify-sdk-python/blob/v1.3.0/src/apify/storages/request_queue.py#L428

t-tooling
debt

In the current state, we make a new logger in every module that needs to log something. There is `CrawleeLogFormatter`, which handles logging in the console. - our loggers should...

t-tooling
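
As a rough illustration of the per-module logger pattern described in the logging issue above, the sketch below uses only the standard `logging` module. It assumes a made-up parent logger name (`sketch`) and a hypothetical `SketchFormatter`; it is not `CrawleeLogFormatter` and does not reflect how Crawlee configures logging.

```python
import io
import logging


class SketchFormatter(logging.Formatter):
    """Hypothetical stand-in for a console formatter; not Crawlee's own."""

    def format(self, record: logging.LogRecord) -> str:
        return f"[{record.name}] {record.levelname} {record.getMessage()}"


def configure_parent(stream: io.StringIO) -> logging.Logger:
    # Attach the handler once, to a shared parent logger ("sketch" is a made-up name).
    parent = logging.getLogger("sketch")
    parent.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(SketchFormatter())
    parent.addHandler(handler)
    parent.setLevel(logging.INFO)
    parent.propagate = False
    return parent


def demo() -> str:
    stream = io.StringIO()
    configure_parent(stream)
    # Each module would normally call logging.getLogger(__name__);
    # the child names are hard-coded here to keep the sketch in one file.
    logging.getLogger("sketch.storages").info("queue opened")
    logging.getLogger("sketch.crawler").warning("retrying request")
    return stream.getvalue()
```

Because child loggers propagate records to their parent, the handler and formatter are configured in one place even though every module creates its own logger.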

See https://github.com/apify/crawlee/blob/2d5d443da5fa701b21aec003d4d84797882bc175/packages/basic-crawler/src/internals/basic-crawler.ts#L836-L845 for inspiration

t-tooling

A part of the functionality has been added in #142. - grouping and summarizing errors is mostly missing - there doesn't seem to be a good reason for this to...

t-tooling