robots-txt-parser
Parsing rules is platform dependent
Parsing rules that use \n as the line separator fails on Windows. The parser splits the content on the PHP_EOL constant, which is \r\n on Windows, so \n-separated content is never split. The result is a single user-agent directive containing the full robots.txt content.
This is caused by the explode() call in RobotsTxtParser->prepareRules() (line 148):
```php
/**
 * Parse rules
 *
 * @return void
 */
private function prepareRules()
{
    $rows = explode(PHP_EOL, $this->content); // issue: PHP_EOL is "\r\n" on Windows
    foreach ($rows as $row) {
        // ...
```
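The failure can be reproduced on any platform by passing the Windows value of PHP_EOL ("\r\n") to explode() explicitly. This is a minimal sketch; the sample robots.txt content is made up for illustration:

```php
<?php
// Sample robots.txt using Unix "\n" line endings (illustrative content).
$content = "User-agent: *\nDisallow: /private/\nSitemap: https://example.com/sitemap.xml";

// On Windows, PHP_EOL is "\r\n", so prepareRules() effectively does this:
$windowsRows = explode("\r\n", $content);

// On Linux/macOS, PHP_EOL is "\n":
$unixRows = explode("\n", $content);

echo count($windowsRows), PHP_EOL; // 1: the whole file is a single "row"
echo count($unixRows), PHP_EOL;    // 3: one row per directive
```

Because explode() returns the whole input as a single element when the delimiter does not occur, the first row starts with "User-agent:" and swallows every following line.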
The parser should detect or normalise line separators instead of relying on PHP_EOL.
As a workaround, normalising the line breaks with the following regex before passing the robots.txt content to RobotsTxtParser works:
```php
$content = preg_replace("/\R/u", PHP_EOL, $content);
$parser = new RobotsTxtParser($content);
```
In PCRE, \R matches any Unicode newline sequence (including \r\n, \n and \r), so this regex replaces every line break with PHP_EOL, the separator the parser currently splits on. The same normalisation could be applied inside the parser to fix this issue.
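Inside the parser, the split itself could use \R, so no intermediate replacement to PHP_EOL is needed. A minimal sketch, assuming a standalone helper; the name splitRobotsTxtLines() is hypothetical and not part of RobotsTxtParser:

```php
<?php
/**
 * Split robots.txt content into lines regardless of the line separator used.
 * Hypothetical helper, not part of RobotsTxtParser.
 *
 * @return string[]
 */
function splitRobotsTxtLines(string $content): array
{
    // \R matches \r\n (as one separator), \n, \r and other Unicode newlines.
    $rows = preg_split('/\R/u', $content);

    // With the /u modifier, preg_split() returns false on invalid UTF-8,
    // so fall back to splitting on the three common ASCII line endings.
    if ($rows === false) {
        $rows = preg_split('/\r\n|\n|\r/', $content);
    }

    return $rows !== false ? $rows : [$content];
}

// Mixed separators: "\r\n", "\n" and "\r" all produce separate rows.
print_r(splitRobotsTxtLines("User-agent: *\r\nDisallow: /a\nDisallow: /b\rAllow: /c"));
```

prepareRules() could then iterate over splitRobotsTxtLines($this->content) instead of explode(PHP_EOL, $this->content), which makes the result independent of the platform PHP runs on.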