texting_robots

Adds a function that exposes whether our bot is explicitly referenced

Open T-Sujeeban opened this issue 5 months ago • 3 comments

T-Sujeeban avatar Jan 16 '24 13:01 T-Sujeeban

This code doesn't appear to implement the actual logic of setting the value to True / False depending on whether a bot is referenced?

Starting at the Issue level may be helpful to understand the needs and purposes of such a function as well.

I'm not against this as a concept if there's a reason - but if there's a reason there may be other useful metadata to expose as well! :)

Smerity avatar Jan 25 '24 07:01 Smerity

Whether the User-agent is referenced is already computed in the function Robot::new, in the variable references_our_bot.

T-Sujeeban avatar Jan 26 '24 09:01 T-Sujeeban

Hi there! @T-Sujeeban and I work together at @qwant, so I thought I'd drop by and shed some light on our use case :)

We are developing a crawler for a search engine and we need to know, for a given website, whether we are allowed to crawl its pages. For this use case, this library has worked very well for us.

On top of that, when a website rejects our crawler, we would like to know whether it rejects all bots or rejects Qwant specifically. If it's the former, there's not much we can do about it, but if it's the latter, e.g. a website that rejects our search engine but accepts Google, we could identify such websites and try to work it out with the owners.
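To make the distinction concrete, here is a rough sketch of how a crawler might combine the two signals. It is only an illustration: `CrawlVerdict` and `classify` are hypothetical names that exist in neither the crate nor our crawler, and the two booleans stand in for "the URL is allowed for our agent" and "the robots.txt names our agent explicitly".

```rust
/// Hypothetical outcome of checking a site's robots.txt for our crawler.
#[derive(Debug, PartialEq, Eq)]
pub enum CrawlVerdict {
    /// We are allowed to crawl.
    Allowed,
    /// We are rejected by a rule addressed to every bot (the `*` group).
    BlockedWithAllBots,
    /// We are rejected by a rule group that names our agent explicitly.
    BlockedSpecifically,
}

/// Combines the two signals (both parameters are assumptions for this sketch):
/// `allowed` is the usual "may we fetch this URL" answer, and
/// `references_our_bot` says whether robots.txt mentions our agent by name.
pub fn classify(allowed: bool, references_our_bot: bool) -> CrawlVerdict {
    match (allowed, references_our_bot) {
        (true, _) => CrawlVerdict::Allowed,
        (false, true) => CrawlVerdict::BlockedSpecifically,
        (false, false) => CrawlVerdict::BlockedWithAllBots,
    }
}
```

Only the `BlockedSpecifically` case would be worth following up with the site owners.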

Now, back to the PR: it would appear that this information is already computed by your library, on line 397 of lib.rs, but not exposed by the crate's public API. This PR stores this information in the Robot struct and exposes it publicly. Here, our_bot means the agent that was provided when building the Robot using Robot::new.
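The shape of the change can be sketched roughly as follows. This is not the crate's code: `RobotSketch` is a hypothetical stand-in for Robot, the accessor name is assumed, and the matching is deliberately simplified to an exact, case-insensitive comparison of each User-agent value against the agent, ignoring the `*` wildcard.

```rust
/// Hypothetical stand-in for the crate's `Robot`; only the flag is modelled.
pub struct RobotSketch {
    /// True when robots.txt contains a `User-agent` line naming our agent.
    references_our_bot: bool,
}

impl RobotSketch {
    /// Scans `robots_txt` once and remembers whether `agent` is named in it.
    /// Simplified matching (an assumption of this sketch): exact,
    /// case-insensitive equality, with `*` never counting as a reference.
    pub fn new(agent: &str, robots_txt: &str) -> Self {
        let agent = agent.to_ascii_lowercase();
        let references_our_bot = robots_txt
            .lines()
            .filter_map(|line| {
                // Drop trailing comments, then split "key: value".
                let line = line.split('#').next().unwrap_or("");
                let (key, value) = line.split_once(':')?;
                if key.trim().eq_ignore_ascii_case("user-agent") {
                    Some(value.trim().to_ascii_lowercase())
                } else {
                    None
                }
            })
            .any(|ua| ua != "*" && ua == agent);
        Self { references_our_bot }
    }

    /// Exposes the stored flag to callers (accessor name is an assumption).
    pub fn references_our_bot(&self) -> bool {
        self.references_our_bot
    }
}
```

The point is only that the flag is computed once at construction time and then read through a public accessor, so callers do not need to re-parse the file.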

gbogard avatar Feb 14 '24 09:02 gbogard