crawlee-python icon indicating copy to clipboard operation
crawlee-python copied to clipboard

BasicCrawler status logging

Open janbuchar opened this issue 1 year ago • 4 comments

  • configurable interval
  • configurable status message callback (constructor parameter, property or decorator?)
  • we periodically set the crawler status via storage client
  • in javascript crawlee, this does nothing when MemoryStorage is being used

janbuchar avatar Apr 09 '24 10:04 janbuchar

in javascript crawlee, this does nothing when MemoryStorage is being used

What's the reason behind this? I believe this could be useful for local execution as well.

vdusek avatar Mar 06 '25 09:03 vdusek

We also log the message on the basic crawler level (to the debug level, since it's executed quite often, every 10 seconds iirc). What is ignored is the API call which this method represents, since there is no alternative to that when running locally.

https://github.com/apify/crawlee/blob/ad6fcd4a6c7ddd59afb48e763c0523992d263b54/packages/basic-crawler/src/internals/basic-crawler.ts#L797-L799

B4nan avatar Mar 06 '25 09:03 B4nan

I remember me and @tobice as well being confused by the "set status message" method being a part of the StorageClient - it has nothing to do with storage, except that it invokes an Apify API call - I suppose that is why it got slapped on top of that.

Could the crawler emit a "crawler status" event via EventManager instead? Then the SDK could hook into that and set the status message using its apify client instance.

janbuchar avatar Mar 06 '25 10:03 janbuchar

Yeah that sounds good to me.

B4nan avatar Mar 06 '25 10:03 B4nan