Parse rules for a given user agent

Open · yields opened this issue 6 years ago · 1 comment

I'm wondering if it's worth optimizing memory usage a bit by parsing just a single ruleset for a given user agent. The signature might be one of:

rules, err := robotstxt.ForAgent(buf, "mybot")
rules, err := robotstxt.ParseAgent(buf, "mybot")

The parser would skip all non-matching user agents (except for *). If a ruleset for mybot is found, it would return that ruleset; otherwise it would return the default ruleset for *.

The method could accept multiple user agents, so for example a search engine crawler might do:

rules, err := robotstxt.Parse(buf, "Searchbot", "Googlebot")
// rules is the Searchbot ruleset if one exists,
// otherwise it falls back to Googlebot,
// otherwise it falls back to *
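
To make the skip-and-fallback behavior above concrete, here is a minimal, self-contained sketch. It does not use the robotstxt package, and it is not how the library parses. `parseForAgents` and `Rule` are hypothetical names invented for illustration. It also assumes exact, case-insensitive agent name matching, which is simpler than what real crawlers do.

```go
package main

import (
	"bufio"
	"bytes"
	"strings"
)

// Rule is one Allow or Disallow line (hypothetical type for this sketch).
type Rule struct {
	Allow bool
	Path  string
}

// parseForAgents is a hypothetical helper, not part of the robotstxt package.
// It scans buf once, stores rules only for the listed agents and for "*",
// and returns the rules of the first listed agent that has a group.
// If none of them has a group it returns the "*" rules (nil if absent).
func parseForAgents(buf []byte, agents ...string) ([]Rule, error) {
	wanted := map[string]bool{"*": true}
	for _, a := range agents {
		wanted[strings.ToLower(a)] = true
	}
	kept := make(map[string][]Rule) // lowercased agent name -> its rules

	var current []string // wanted agents named by the group being read
	inRules := false     // true once the current group has seen a rule line

	sc := bufio.NewScanner(bytes.NewReader(buf))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // drop comments
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue // blank or malformed line
		}
		key = strings.ToLower(strings.TrimSpace(key))
		value = strings.TrimSpace(value)

		switch key {
		case "user-agent":
			if inRules { // a User-agent line after rules starts a new group
				current, inRules = nil, false
			}
			if name := strings.ToLower(value); wanted[name] {
				current = append(current, name)
				if _, seen := kept[name]; !seen {
					kept[name] = []Rule{} // group exists even if it has no rules
				}
			}
		case "allow", "disallow":
			inRules = true
			for _, name := range current { // empty for skipped groups
				kept[name] = append(kept[name], Rule{Allow: key == "allow", Path: value})
			}
		}
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}

	for _, a := range agents {
		if rules, found := kept[strings.ToLower(a)]; found {
			return rules, nil
		}
	}
	return kept["*"], nil
}
```

Groups for agents outside the list are never stored, which is where any memory saving would come from. Whether that saving is measurable on typical robots.txt files is an open question that a benchmark would have to answer.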

LMK if you would consider a PR that implements this feature.

yields, Sep 20 '19 18:09

Yes, this idea makes sense. Please don't break the existing API.

I don't think it would save considerable time or memory, though. Do you have benchmarks?

temoto, Sep 21 '19 07:09