php-readability

Undefined array key 0 after array_filter

Open mihaistana opened this issue 1 year ago • 4 comments

After the array_filter call on line 1481 in the function hasSingleTagInsideElement, the resulting array sometimes does not start at key 0:

ErrorException

Undefined array key 0

at vendor/j0k3r/php-readability/src/Readability.php:1485
  1481▕ $children = array_filter($childNodes, fn ($childNode) => $childNode instanceof \DOMElement);
  1482▕ //$children = array_values($children);
  1483▕ // There should be exactly 1 element child with given tag
  1484▕
➜ 1485▕ if (1 !== \count($children) || $children[0]->nodeName !== $tag) {
  1486▕     return false;
  1487▕ }
  1488▕
  1489▕ $a = array_filter(

To fix it, add array_values after the filter to reset the array keys, since array_filter preserves the original keys:

private function hasSingleTagInsideElement(\DOMElement $node, string $tag): bool
{
    $childNodes = iterator_to_array($node->childNodes);
    $children = array_filter($childNodes, fn ($childNode) => $childNode instanceof \DOMElement);
    $children = array_values($children);

    // There should be exactly 1 element child with given tag
    if (1 !== \count($children) || $children[0]->nodeName !== $tag) {
        return false;
    }
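The key-preserving behaviour behind this report can be seen without any DOM code. The following is a minimal sketch, not the library's code: the strings in `$childNodes` are hypothetical stand-ins for a non-element node followed by an element.

```php
<?php
// Hypothetical stand-ins for DOM child nodes: a non-element, then an element.
$childNodes = ['text', 'element'];

// array_filter keeps the original keys, so the surviving item stays at key 1.
$filtered = array_filter($childNodes, fn ($node) => 'element' === $node);
var_dump(array_keys($filtered));   // keys: [1], so $filtered[0] is undefined

// array_values reindexes from 0, which makes index 0 safe to read.
$reindexed = array_values($filtered);
var_dump(array_keys($reindexed));  // keys: [0]
```

Reading `$filtered[0]` here would raise the same "Undefined array key 0" warning, while `$reindexed[0]` returns the element.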

mihaistana • Nov 27 '24 08:11

You might be right. If you can reproduce the bug with a given website & create a test, I'm happy to review the fix :)

j0k3r • Nov 27 '24 08:11

Hi @j0k3r, yes, you can try it with my website "https://agenciaweb.net"; that is where I tried it and it failed.

mihaistana • Nov 27 '24 09:11

It happens, for example, with all pages from golem.de:

$url = 'https://www.golem.de/news/anzeige-varta-aa-batterien-grosses-set-zum-kleinen-preis-bei-amazon-2501-192329.html';

$graby = new Graby();
$result = $graby->fetchContent($url);

ofeige • Jan 10 '25 22:01

Looking at the code, the undefined key can only occur if there is a single p element preceded by non-element nodes. Out of the node types:

The only way I could come up with to make this crash was to disable tidy and use a comment:

// This would fail on “Undefined array key 0” without tidy.
public function testDivSingleP(): void {
    $readability = $this->getReadability('<div><!-- foo --><p>' . str_repeat('This is the awesome content. ', 7) . '</p></div>', 'http://0.0.0.0');
    $res = $readability->init();

    $this->assertTrue($res);
    $this->assertInstanceOf(JSLikeHTMLElement::class, $readability->getContent());
    $this->assertInstanceOf(JSLikeHTMLElement::class, $readability->getTitle());
    $this->assertStringContainsString('<div readability=', $readability->getContent()->getInnerHtml());
    $this->assertEmpty($readability->getTitle()->getInnerHtml());
    $this->assertStringContainsString('This is the awesome content.', $readability->getContent()->getInnerHtml());
}

But then it just crashes elsewhere, so I am not sure #97 could fix it:

TypeError: Readability\Readability::getAncestors(): Argument #1 ($node) must be of type DOMElement, DOMComment given, called in /home/jtojnar/Projects/php-readability/src/Readability.php on line 1022

/home/jtojnar/Projects/php-readability/src/Readability.php:1444
/home/jtojnar/Projects/php-readability/src/Readability.php:1022
/home/jtojnar/Projects/php-readability/src/Readability.php:244
/home/jtojnar/Projects/php-readability/tests/ReadabilityTest.php:119
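The comment-before-paragraph case that triggers the undefined key can also be checked in isolation with PHP's DOM extension. This is only a sketch under assumptions: it parses a shortened, hypothetical version of the test markup with `loadXML` rather than going through Readability, and it says nothing about the later `getAncestors()` failure.

```php
<?php
// Hypothetical minimal markup: a comment node followed by a single <p>.
$doc = new DOMDocument();
$doc->loadXML('<div><!-- foo --><p>This is the awesome content.</p></div>');

// Same filtering steps as in hasSingleTagInsideElement.
$childNodes = iterator_to_array($doc->documentElement->childNodes);
$elements = array_filter($childNodes, fn ($childNode) => $childNode instanceof DOMElement);

// The comment occupies key 0, so the only element is left at key 1.
$keys = array_keys($elements);      // [1]
$hasKeyZero = isset($elements[0]);  // false

// Reindexing makes the element reachable at key 0.
$first = array_values($elements)[0];
echo $first->nodeName, PHP_EOL;     // p
```

With the count check passing (exactly one element) and key 0 missing, `$elements[0]` is the access that fails in the reported line.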

jtojnar • Jun 04 '25 07:06