crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...

Results 138 crawlee-python issues

These items are currently blocked but should be resolved before the public launch (8. 7.). ### TODO - [x] Replace all occurrences of `apify.github.io/crawlee-python` with `crawlee.dev/python` in `README.md` once the...

documentation
t-tooling

- Only markdown content.
- Inspiration: https://crawlee.dev/docs/guides.
- Some content from the old README could be copied in: https://github.com/apify/crawlee-python/blob/v0.0.7/README.md.

documentation
t-tooling

Similar to what we're implementing in the JavaScript version of Crawlee.

enhancement
t-tooling

- Configurable interval.
- Configurable status message callback (constructor parameter, property, or decorator?).
- We periodically set the crawler status via the storage client.
- In JavaScript Crawlee, this does nothing...

t-tooling
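
The periodic status message idea in the issue above could be sketched roughly as follows. This is a minimal illustration using only `asyncio`, not Crawlee's actual API: `StatusReporter`, `get_message`, `sink`, and `interval_seconds` are hypothetical names, and the `sink` callable merely stands in for whatever the storage client would do with the message.

```python
import asyncio
from typing import Callable, Optional


class StatusReporter:
    """Hypothetical helper (not part of crawlee): periodically pushes a status message."""

    def __init__(
        self,
        get_message: Callable[[], str],
        sink: Callable[[str], None],
        interval_seconds: float = 1.0,
    ) -> None:
        self._get_message = get_message  # configurable status message callback
        self._sink = sink  # stand-in for "set status via the storage client"
        self._interval = interval_seconds  # configurable interval
        self._task: Optional[asyncio.Task] = None

    async def _run(self) -> None:
        # Emit a message, then sleep, until the task is cancelled.
        while True:
            self._sink(self._get_message())
            await asyncio.sleep(self._interval)

    def start(self) -> None:
        # Must be called from within a running event loop.
        self._task = asyncio.create_task(self._run())

    async def stop(self) -> None:
        if self._task is None:
            return
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass
        self._task = None


async def demo() -> list[str]:
    collected: list[str] = []
    state = {"handled": 0}
    reporter = StatusReporter(
        get_message=lambda: f"Handled {state['handled']} requests",
        sink=collected.append,
        interval_seconds=0.01,
    )
    reporter.start()
    for _ in range(3):
        state["handled"] += 1  # simulated crawling work
        await asyncio.sleep(0.02)
    await reporter.stop()
    return collected
```

Passing the callback as a constructor parameter is only one of the three options the issue lists; a property or a decorator would wrap the same loop.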

This change makes https://github.com/apify/apify-sdk-python/blob/162ce1080d024fe2cf399534e8f960a584524232/tests/unit/actor/test_actor_memory_storage_e2e.py#L54 pass again. The PR is a draft; it exists mostly so that I don't lose or forget it.

t-tooling
adhoc

- Currently, there is only a dummy version of `Snapshotter._snapshot_client()` without a real measurement.
- Once `StorageClient` is implemented, use it there to measure the real values.
- Check TypeScript...

enhancement
t-tooling

- Fetching requests from `RequestQueue` is sometimes very slow and can get stuck for a while. - I turned on logging and reproduced the issue with the following code: ```python...

bug
t-tooling

https://github.com/apify/crawlee-python/blob/896501edb44f801409fec95cb3e5f2bcfcb4188d/src/crawlee/beautifulsoup_crawler/beautifulsoup_crawler.py#L86 can be used as a reference.

enhancement
t-tooling