scrapyrt
Cannot override log-related spider settings
AFAICT it's not possible to override LOG_LEVEL, LOG_FILE, LOG_DIR, etc. for spiders, because the dict from get_scrapyrt_settings is applied with priority 'cmdline'.
I assume this is due to conflicting goals:
- Have scrapyrt be a "drop in" runner with no config changes required
- Have sane logging in the presence of multiple crawls
My take is:
- The dict should have priority 'default' (since they really are defaults - the spider developer might want to customize them)
- scrapyrt should use a scrapyrt.cfg file rather than scrapy.cfg
scrapy.cfg is typically small enough that requiring the user to either copy it or use a template from the documentation wouldn't be a significant burden.
Any solution would be fine, but here are the changes for the above: https://github.com/andrewbaxter/scrapyrt/tree/make_builtin_settings_default https://github.com/andrewbaxter/scrapy/tree/specify-closest-cfg-name
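For context, Scrapy resolves settings by numeric priority ('default' < 'command' < 'project' < 'spider' < 'cmdline'), so a value applied at 'cmdline' beats anything the project's settings.py sets, while a value applied at 'default' yields to it. A small illustration using Scrapy's public Settings API (not scrapyrt code):

```python
from scrapy.settings import Settings

# What a project's settings.py contributes vs. what scrapyrt forces today:
current = Settings()
current.set("LOG_LEVEL", "DEBUG", priority="project")   # project setting
current.set("LOG_LEVEL", "INFO", priority="cmdline")    # scrapyrt's dict
print(current.get("LOG_LEVEL"))  # INFO - the project value is overridden

# With the proposed priority='default', the project value would win:
proposed = Settings()
proposed.set("LOG_LEVEL", "INFO", priority="default")   # scrapyrt's dict
proposed.set("LOG_LEVEL", "DEBUG", priority="project")  # project setting
print(proposed.get("LOG_LEVEL"))  # DEBUG
```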
@andrewbaxter the idea behind using scrapy.cfg instead of a custom config file was to be able to run scrapyrt in any Scrapy project directory without making any changes - just run scrapyrt in the project directory and you're done.
Priority 'cmdline' is used here because the default ScrapyRT settings should have the highest possible priority and override any project settings - ScrapyRT relies on that. By contrast, 'default' is the lowest possible priority and will be overridden by project settings. I don't think Scrapy's priority='default' applies here - overriding one of Scrapy's default settings can't cause harm, but here it can.
I think it would be easier to allow overriding the default ScrapyRT spider settings from CrawlManager. That way you could remove or change any setting ScrapyRT forces.
@andrewbaxter oh, I think I missed one more option that doesn't require any changes to ScrapyRT - just override CrawlManager and set any settings you want there. The method in question returns Scrapy Settings, which you can easily update.
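A minimal sketch of that override, assuming CrawlManager is importable from scrapyrt.core, that the hook returning Scrapy Settings is the get_project_settings mentioned in the next comment, and that a custom manager can be wired in via scrapyrt's CRAWL_MANAGER setting - all three should be verified against your scrapyrt version:

```python
from scrapyrt.core import CrawlManager


class QuietCrawlManager(CrawlManager):
    """CrawlManager that re-applies the log settings scrapyrt forces."""

    def get_project_settings(self):
        settings = super().get_project_settings()
        # Set at 'cmdline' priority so these values win over scrapyrt's own.
        settings.set("LOG_LEVEL", "WARNING", priority="cmdline")
        settings.set("LOG_FILE", None, priority="cmdline")
        return settings
```

In a scrapyrt settings module passed with -S, this would then be hooked up with something like CRAWL_MANAGER = 'myproject.manager.QuietCrawlManager' (the setting name and module path here are assumptions).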
Overriding get_scrapyrt_settings and get_project_settings is just as dangerous as changing the priority to 'default' (or 'command' - I misspoke, that's what we're actually using), right?
Also, CrawlManager overrides seem dependent on implementation details - if there's a chance that the implementation could change and silently break our code (ex: renaming one of those methods) it would be more reliable to create a local fork.
Anyway, we're getting by right now, but I would appreciate some sort of supported channel for making log settings changes.
@andrewbaxter I'm thinking about allowing Scrapy settings with the prefix SCRAPY_ in the ScrapyRT settings module. So, for instance, to change the default log level one could add the following lines to scrapyrt_conf.py:
# ...
SCRAPY_LOG_LEVEL = log.DEBUG
and pass this config file to the scrapyrt command:
scrapyrt -S scrapyrt_conf
WDYT?
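If it helps the discussion, here is a hypothetical sketch of what that prefix handling could look like - none of this exists in ScrapyRT today, and the function name is made up:

```python
def apply_prefixed_settings(scrapyrt_settings, scrapy_settings):
    """Copy SCRAPY_-prefixed names from a ScrapyRT settings module into a
    scrapy.settings.Settings object, stripping the prefix."""
    prefix = "SCRAPY_"
    for name in dir(scrapyrt_settings):
        if name.startswith(prefix):
            value = getattr(scrapyrt_settings, name)
            scrapy_settings.set(name[len(prefix):], value, priority="cmdline")
```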
may be related to #62
@pawelmhm, what do you think about @andrewbaxter's idea of changing the priority in get_scrapyrt_settings to 'default'?
This could solve many related issues, and at first glance I don't see how allowing users to override those particular settings could be harmful. I might be wrong; if not, I can write a PR for it.
Any plans for this? By default I find many log files in the logs directory; it seems scrapyrt creates one file per request, which is somewhat unexpected. What's the best practice for configuring logging?