[Request] Add robots.txt parsing

Open joshua-bn opened this issue 2 years ago • 3 comments

It would be nice to be able to parse robots.txt the same way RSS feeds are parsed, e.g. via $web->robots.

https://github.com/bopoda/robots-txt-parser is a library for this. I'm not sure if it is the one to use here, but it seems to do the job.

joshua-bn, Jan 10 '23 15:01

Yeah, that's something to consider. I would opt for https://github.com/spatie/robots-txt instead as it's better maintained. What exactly do you want to achieve with the information?

spekulatius, Jan 12 '23 07:01

Personally, I am looking for sitemaps declared in robots.txt, but I think there's also value in checking the crawling rules.

joshua-bn, Jan 12 '23 14:01

Fair enough, that's definitely another use-case. I'll see how we can get both working.

spekulatius, Jan 15 '23 15:01
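
For illustration, the sitemap use case raised in this thread could look roughly like the sketch below. It is not PHPScraper's API and does not use either of the libraries linked above: extractSitemaps() is a hypothetical helper, and the robots.txt contents are made up. It assumes the file has already been fetched as a string, and it treats everything after a "#" as a comment, so a sitemap URL containing a fragment would be cut short.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of PHPScraper): collect the URLs from
 * "Sitemap:" lines in the raw contents of a robots.txt file.
 *
 * @return string[]
 */
function extractSitemaps(string $robotsTxt): array
{
    $sitemaps = [];

    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        // Drop a trailing comment, then surrounding whitespace.
        $line = trim(explode('#', $line, 2)[0]);

        // The field name is matched case-insensitively; the value is the URL.
        if (preg_match('/^sitemap\s*:\s*(\S+)/i', $line, $match) === 1) {
            $sitemaps[] = $match[1];
        }
    }

    // The same sitemap may be declared more than once.
    return array_values(array_unique($sitemaps));
}

// Made-up robots.txt contents for demonstration.
$example = <<<TXT
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/news-sitemap.xml # news only
TXT;

print_r(extractSitemaps($example));
```

Checking the crawl rules (Allow/Disallow per user agent) involves more matching logic, which is where a dedicated parser such as the ones linked in this thread would come in.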