Mikhail Korobov

479 comments by Mikhail Korobov

One more thing: it would be cool to have await-based shortcuts for the headers_received and bytes_received signals. Use cases: a different way to cancel a request; response streaming.
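A minimal sketch of what an await-based shortcut could look like, in plain asyncio. `MiniSignals` and `wait_for` are hypothetical names, not Scrapy's SignalManager API, and the signal payload is simplified to `headers` and `request` (Scrapy's real headers_received handlers also receive `body_length` and `spider`; cancelling from a handler is currently done by raising `StopDownload`). The idea: connect a one-shot receiver that resolves a future, await it, then disconnect.

```python
import asyncio
from collections import defaultdict
from typing import Any, Callable, Dict, List


class MiniSignals:
    """Hypothetical minimal synchronous dispatcher (not Scrapy's SignalManager)."""

    def __init__(self):
        self._receivers: Dict[str, List[Callable[..., Any]]] = defaultdict(list)

    def connect(self, receiver, signal):
        self._receivers[signal].append(receiver)

    def disconnect(self, receiver, signal):
        self._receivers[signal].remove(receiver)

    def send(self, signal, **kwargs):
        # Iterate over a copy so receivers may disconnect while we dispatch.
        for receiver in list(self._receivers[signal]):
            receiver(**kwargs)

    async def wait_for(self, signal, predicate=lambda **kw: True):
        """Await the next send() of `signal` whose kwargs satisfy `predicate`."""
        future = asyncio.get_running_loop().create_future()

        def _receiver(**kwargs):
            if not future.done() and predicate(**kwargs):
                future.set_result(kwargs)

        self.connect(_receiver, signal)
        try:
            return await future
        finally:
            # One-shot: the temporary receiver is always removed.
            self.disconnect(_receiver, signal)


async def main():
    signals = MiniSignals()
    waiter = asyncio.create_task(
        signals.wait_for("headers_received", lambda request, **kw: request == "req-1")
    )
    await asyncio.sleep(0)  # let the waiter connect its receiver
    signals.send("headers_received", headers={"Content-Type": "text/html"}, request="req-2")
    signals.send("headers_received", headers={"Content-Type": "application/pdf"}, request="req-1")
    event = await waiter
    return event["headers"]["Content-Type"]


print(asyncio.run(main()))  # application/pdf
```

With such a helper, code that decides whether to keep downloading could simply `await` the headers for its own request instead of registering a long-lived handler.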

`CONCURRENT_ITEMS = 1` doesn't limit the number of concurrent uploads because (again, based on reading the source code) `S3FilesStore.persist_file` defers uploading to a thread and doesn't wait for the result. `CONCURRENT_ITEMS...
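The effect can be reproduced with a small threading simulation. This is a sketch under assumptions: `persist_file` here is a hypothetical stand-in that only mimics the "start the upload in a thread and return immediately" behaviour described above; it is not Scrapy's `S3FilesStore` code. Items are processed strictly one at a time, yet when the caller does not wait for each upload, all uploads end up in flight together.

```python
import threading


class UploadTracker:
    """Records the peak number of simulated uploads running at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def upload(self, started: threading.Semaphore, release: threading.Event):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        started.release()        # tell the caller this upload is in flight
        release.wait(timeout=5)  # stand-in for slow network I/O
        with self._lock:
            self.active -= 1


def persist_file(tracker, started, release) -> threading.Thread:
    """Hypothetical persist_file: starts the upload in a thread and returns at once."""
    thread = threading.Thread(target=tracker.upload, args=(started, release))
    thread.start()
    return thread


def run_pipeline(n_items: int, wait_for_upload: bool) -> int:
    tracker = UploadTracker()
    started = threading.Semaphore(0)
    release = threading.Event()
    threads = []
    # Items are handled strictly one after another, as with CONCURRENT_ITEMS = 1.
    for _ in range(n_items):
        thread = persist_file(tracker, started, release)
        started.acquire()  # wait until this upload has actually begun
        if wait_for_upload:
            release.set()  # let the upload finish...
            thread.join()  # ...and only then move on to the next item
        threads.append(thread)
    release.set()
    for thread in threads:
        thread.join()
    return tracker.peak


print(run_pipeline(5, wait_for_upload=False))  # 5: every upload overlaps
print(run_pipeline(5, wait_for_upload=True))   # 1: uploads are serialized
```

Only when the item-processing step waits for the upload's result does an item-level concurrency limit also bound the uploads.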

IMHO a correct solution would be to implement https://mimesniff.spec.whatwg.org/, though it looks like a significant amount of work (something like a GSoC project?). Ideally, as an external library which Scrapy...
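To give a feel for the spec, here is a heavily reduced sketch of its core building block, the pattern matching algorithm (byte pattern, pattern mask, set of ignored leading bytes), applied to a handful of patterns: GIF, PNG, JPEG, and `<HTML`/`<HEAD`/`<BODY` followed by a tag-terminating byte. The pattern table is a small illustrative subset and `sniff` is a hypothetical helper, not an existing library API; the full algorithm also considers the supplied Content-Type and many more patterns.

```python
from typing import Optional

WHITESPACE = frozenset(b"\t\n\x0c\r ")  # bytes ignored before HTML patterns
TAG_TERMINATING = frozenset(b" >")      # a space or ">" must follow the tag name


def pattern_match(data: bytes, pattern: bytes, mask: bytes, ignored=frozenset()) -> int:
    """Return the index just past the match, or -1 if the pattern doesn't match."""
    assert len(pattern) == len(mask)
    if len(data) < len(pattern):
        return -1
    s = 0
    while s < len(data) and data[s] in ignored:  # skip ignored leading bytes
        s += 1
    if len(data) - s < len(pattern):
        return -1
    for p in range(len(pattern)):
        if data[s] & mask[p] != pattern[p]:  # mask 0xDF folds ASCII letters to upper case
            return -1
        s += 1
    return s


IMAGE_PATTERNS = [
    (b"GIF87a", b"\xff" * 6, "image/gif"),
    (b"GIF89a", b"\xff" * 6, "image/gif"),
    (b"\x89PNG\r\n\x1a\n", b"\xff" * 8, "image/png"),
    (b"\xff\xd8\xff", b"\xff" * 3, "image/jpeg"),
]
HTML_TAGS = [b"<HTML", b"<HEAD", b"<BODY"]


def sniff(data: bytes) -> Optional[str]:
    """Hypothetical helper: return a sniffed MIME type, or None if nothing matched."""
    for tag in HTML_TAGS:
        mask = bytes(0xDF if chr(b).isalpha() else 0xFF for b in tag)
        end = pattern_match(data, tag, mask, WHITESPACE)
        if end != -1 and end < len(data) and data[end] in TAG_TERMINATING:
            return "text/html"
    for pattern, mask, mime in IMAGE_PATTERNS:
        if pattern_match(data, pattern, mask) != -1:
            return mime
    return None


print(sniff(b"  <html><body>hi</body></html>"))  # text/html
print(sniff(b"\x89PNG\r\n\x1a\n..."))           # image/png
```

Because the spec is mostly tables of such patterns plus the rules for combining them with the declared Content-Type, it is a good fit for a standalone library.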

I think that'd be a great GSoC project. The mimesniff library indeed doesn't look sufficient.

I'm not sure we should be using libmagic here: the goal is to be compatible with MIME sniffing as browsers implement it, not to implement a general...

@pablohoffman packages like

* https://github.com/scrapy-plugins/scrapy-playwright/
* https://github.com/scrapinghub/scrapy-autoextract
* https://github.com/scrapy-plugins/scrapy-zyte-api

require the asyncio reactor; any Scrapy project which uses one of these extensions needs to switch to the asyncio reactor.
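In practice the switch is a single Scrapy setting. A minimal `settings.py` fragment, assuming the asyncio selector reactor that these plugins document:

```python
# settings.py (fragment): run Scrapy on Twisted's asyncio-based reactor,
# which asyncio-dependent plugins such as scrapy-playwright need.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```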

For reference, Django's implementation: https://github.com/django/django/tree/master/django/dispatch. One advantage of switching to the Django implementation is that it supports Python 3.x, but I haven't checked whether switching is possible.

Another option: https://pypi.python.org/pypi/blinker

I've profiled a simple spider which downloads a page and follows all links from it using LinkExtractor. There is a non-standard CrawlerProcess which listens to all signals from its Crawlers...
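A generic way to get a similar profile for one step of such a crawl is the standard library's `cProfile`. The sketch below profiles a hypothetical `extract_links` helper built on `html.parser` as a stand-in for LinkExtractor; it is neither Scrapy's implementation nor the profiling setup used above.

```python
import cProfile
import io
import pstats
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical stand-in for a link extractor: collects href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def profile_call(func, *args):
    """Run func under cProfile; return its result and the top-10 report by cumulative time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


page = "<html><body>" + "".join(f'<a href="/p/{i}">p{i}</a>' for i in range(500)) + "</body></html>"
links, report = profile_call(extract_links, page)
print(len(links))  # 500
print(report)
```

Sorting by cumulative time highlights which layers (parsing, dispatch, signal handlers) dominate a single call.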

JSON output is easy to implement, but there are a couple of caveats: 1. If we implement a JSON dump, it should be implemented consistently, both for periodic stat dumps...
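One way to keep the format consistent is to route every dump through a single serializer. A minimal sketch, assuming stats may contain non-JSON values such as the `datetime` stored under Scrapy's `start_time` key; `dump_stats_json` is a hypothetical helper, and encoding datetimes as ISO 8601 strings is a design choice, not an existing convention.

```python
import json
from datetime import datetime, timezone


def _json_default(value):
    """Fallback for values json can't encode natively (e.g. datetimes in crawl stats)."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


def dump_stats_json(stats: dict) -> str:
    """Hypothetical serializer meant to be shared by periodic and final stat dumps."""
    return json.dumps(stats, default=_json_default, sort_keys=True)


stats = {
    "downloader/request_count": 3,
    "start_time": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
}
print(dump_stats_json(stats))
# {"downloader/request_count": 3, "start_time": "2024-01-01T12:00:00+00:00"}
```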