texting_robots
Adds a function that exposes whether our bot is explicitly referenced
This code doesn't appear to implement the actual function of changing to True / False depending on whether a bot is referenced?
Starting at the Issue level may be helpful to understand the needs and purposes of such a function as well.
I'm not against this as a concept if there's a reason - but if there's a reason there may be other useful metadata to expose as well! :)
The information of whether the User-agent is referenced or not is already computed in the function Robot::new, in the variable references_our_bot.
Hi there! @T-Sujeeban and I work together at @qwant, so I thought I'd drop by and shed some light on our use case :)
We are developing a crawler for a search engine and we need to know, for a given website, whether we are allowed to crawl its pages. For this use case, this library has worked very well for us.
On top of that, when our crawler is rejected by a website, we would like to know whether the website rejects all bots or rejects Qwant specifically. If it's the former, there's not much we can do about it. If it's the latter, e.g. a website that rejects our search engine but accepts Google, we could identify such websites and try to work it out with the owners.
Now, back to the PR: it would appear that this information is already computed by your library, on line 397 of lib.rs, but not exposed by the crate's public API. This PR stores this information in the Robot struct and exposes it to the outside. In this context, our_bot means the agent that was provided when building the Robot using Robot::new.
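To make the use case concrete, here is a minimal std-only sketch of the distinction we are after. It is not the texting_robots implementation: `references_agent`, `classify`, and `Verdict` are hypothetical names made up for illustration, and the matching is a simplified case-insensitive exact comparison on `User-agent` lines, which may differ from what the crate actually does.

```rust
/// Hypothetical helper, not part of texting_robots: returns true when some
/// `User-agent` line in `robots_txt` names `agent` explicitly.
/// Assumption: a case-insensitive exact match is good enough for the sketch.
fn references_agent(robots_txt: &str, agent: &str) -> bool {
    let agent = agent.to_ascii_lowercase();
    robots_txt.lines().any(|line| {
        // Drop any trailing comment, then split the line into "key: value".
        let line = line.split('#').next().unwrap_or("");
        match line.split_once(':') {
            Some((key, value)) => {
                key.trim().eq_ignore_ascii_case("user-agent")
                    && value.trim().to_ascii_lowercase() == agent
            }
            None => false,
        }
    })
}

/// How a crawler might interpret the two flags together (illustrative only).
#[derive(Debug, PartialEq)]
enum Verdict {
    /// The URL may be crawled.
    Allowed,
    /// Blocked, and our agent is never named: the rule comes from a `*` group.
    BlockedByWildcard,
    /// Blocked, and the file names our agent: worth contacting the owners.
    BlockedByName,
}

fn classify(allowed: bool, references_our_bot: bool) -> Verdict {
    match (allowed, references_our_bot) {
        (true, _) => Verdict::Allowed,
        (false, true) => Verdict::BlockedByName,
        (false, false) => Verdict::BlockedByWildcard,
    }
}
```

In practice the `allowed` flag would come from the crate's existing check, and `references_our_bot` from the value this PR exposes. Note that `BlockedByName` is only a heuristic: a site that names our agent may still block every other bot as well.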