spidermon
spidermon copied to clipboard
Enhance slack notifications
Currently the notifications we receive in Slack are quite simple, without much information. For example:
*somesite spider finished with errors!* / view job in Scrapy Cloud _(errors=7)_
• _Job validation/validation errors_
We could add more information there, something like:
*somesite spider finished with errors!* / view job in Scrapy Cloud _(errors=7)_
• _Job validation/validation errors_
* invalid_string: 3
* missing_required_field/field_name: 4
I believe we need to decide if we are keeping features like JSON Schema Validation and Slack Notification on the project's core (#88, for example). Depending on what's decided, we may not invest time improving this code, at least not here in this repository.
@victor-torres I don't see problems creating integration with different systems (like Slack or Amazon SES E-mail, or Amazon Storage), but this couldn't be considered as the core of spidermon .
But, if some user decides to use one of these optional integration, it doesn't hurt if the defaults are more complete, so I believe it would be a good improvement if our default Slack message contains a more complete info about the monitors execution. :-)
I earlier mentioned the same on the mail. @rennerocha is right. We should focus on improving our default platforms such as Slack and Email actions. While the work on other platforms can also be done as I suggested. Improving defaults will help in the long run as users start to take up spidermon in their projects.
I can take up this feature request @ejulio, seems like a great place to start. Do let me know what do you have in mind of what a person sees when he/she receives a slack notification.
One I have in mind right now is when it fails the item validation. It just says "Validation Error/No field error", for example. It would be nice to mention what errors did occur in the message.
Thanks, @ejulio Gotcha, let me set up the test environment. I will let you know of any follow-up questions that I have regarding the issue.
@ejulio I have created my own spider for Slack notifications as specified in the documentation and digging further to refer to the tutorial given here examples/tutorial
. I have also read the extensive Slack documentation as well for making bots to see how the workflow is and also its limitations.
Questions
- I didn't find details in the docs when it comes to configuring Slack notifications and
settings.py
. Should I improve on the same? - I checked out the code for Slack Actions, here
spidermon/contrib/actions/slack
. Am I on the right path? If I do make changes to thenotifiers.py
script, how should I go about testing the changes?
Thanks :hatching_chick:
@rennerocha @ejulio Any comments? I have also mailed you of the approach I am using I have figured out the workflow for testing the changes, and also went through the code that runs to make this work. Any help to find where I could get the stats for the errors that happen on the terminal to be presented the right way in Slack?
@vipulgupta2048 The only documentation to configure Slack notifications is available here: https://spidermon.readthedocs.io/en/latest/actions.html#slack-action Improvements on the documentation are always welcome!
You are right on the place everything is located (spidermon/contrib/actions/slack
). Probably for this issue, we need to update the jinja templates and maybe change the context (get_template_context
method) to include more information.
About testing, we still don't have test for this action, so it would be good to have it.
@vipulgupta2048
Sorry for the late response :disappointed:
Regarding the data to be shown, you'll notice that everything is available in crawler.stats
.
About validation errors, they are added to items https://github.com/scrapinghub/spidermon/blob/master/spidermon/contrib/scrapy/pipelines.py#L144
But, you'll find the same info in crawler.stats
as something like spidermon/validation/fields/errors/missing_required_field/src
I would especially love to see SPIDERMON_ADD_FIELD_COVERAGE
able to be represented in the notifications. I just want to see at a glance if some field isn't getting captured properly or is only grabbing empty strings etc.
I currently actually can't find field coverage logged in scrapinghub anywhere either. The only time I can see field coverage information is when I run the spider locally
Unless I have it configured wrong? It looks like https://github.com/scrapinghub/spidermon/blob/5979f5cd963c3ff8c08fbd9925816f7ba5284568/spidermon/contrib/scrapy/extensions.py#L127 shows it should be added but no field coverage info is actually showing up in my scrapinghub job stats. Maybe since in my spider custom settings I'm doing
'SPIDERMON_SPIDER_CLOSE_MONITORS':
'spidermon.contrib.scrapy.monitors.SpiderCloseMonitorSuite',
'my_crawler1.monitors.SpiderCloseMonitorSuite',
),
could that be overwriting something that's supposed to trigger the actual spider_closed
?
Interesting. Can you try?
class CustomSpidermon(Spidermon):
def spider_closed(self, spider):
super().spider_closed(spider)
spider.logger.info(f"JSON Stats {json.dumps(self.crawler.stats)}") # or any other logger
EXTENSIONS = {
"CustomSpidermon": 0 # instead of spidermon
}
This should log the stats and make sure they've been updated. Maybe something is not running or scrapinghub is not updating them properly