php-readability
Wrong Document Error in Readability::init()
I'm trying to run it against https://www.privatdozent.co/p/the-battle-line-at-louvain-1914 and I'm getting an error:
15:13:45 INFO [graby] Opengraph "article:" data: [] ["ogData" => []]
15:13:45 INFO [graby] JSON-LD data: ["@context" => "https://schema.org","@type" => "NewsArticle","url" => "https://www.privatdozent.co/p/the-battle-line-at-louvain-1914","mainEntityOfPage" => "https://www.privatdozent.co/p/the-battle-line-at-louvain-1914","headline" => "The Battle Line at Louvain (1914)","description" => "“Where they burn books, they will also burn people” — Heinrich Heine","image" => [["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca982839-4161-4d7b-90a3-9ff1bdeca5f0_1280x939.jpeg"]],"datePublished" => "2024-11-15T09:42:48+00:00","dateModified" => "2024-11-15T09:42:48+00:00","isAccessibleForFree" => true,"author" => [["@type" => "Person","name" => "Jørgen Veisdal","url" => "https://substack.com/@privatdozent","description" => "Author of Privatdozent. Associate Professor.","identifier" => "user:3088938","sameAs" => ["https://twitter.com/JorgenVeisdal"],"image" => ["@type" => "ImageObject","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F86ca7756-940a-4d3c-affc-fd5e6a968f2a_2653x2653.jpeg","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F86ca7756-940a-4d3c-affc-fd5e6a968f2a_2653x2653.jpeg"]]],"publisher" => ["@type" => "Organization","name" => "Privatdozent","url" => "https://www.privatdozent.co","description" => "Essays on the history of mathematics. 10k+ subscribers. 
Substack Bestseller (2024) 🥇, Grow Feature (2022) 📈, Featured Substack Newsletter (2021) 🌟","interactionStatistic" => ["@type" => "InteractionCounter","name" => "Subscribers","interactionType" => "https://schema.org/SubscribeAction","userInteractionCount" => 10000],"identifier" => "pub:14134","logo" => ["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png"],"image" => ["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png"],"sameAs" => ["https://twitter.com/dozentprivat"]]]
15:13:45 INFO [graby] date matched from JsonLd: 2024-11-15T09:42:48+00:00 ["date" => "2024-11-15T09:42:48+00:00"]
15:13:45 INFO [graby] date matched from JsonLd: 2024-11-15T09:42:48+00:00 ["date" => "2024-11-15T09:42:48+00:00"]
15:13:45 INFO [graby] author matched from JsonLd: Jørgen Veisdal ["author" => "Jørgen Veisdal"]
15:13:45 INFO [graby] title matched from JsonLd: {The Battle Line at Louvain (1914)} ["title" => "The Battle Line at Louvain (1914)"]
15:13:45 INFO [graby] Trying //meta[@property="og:title"]/@content for title ["pattern" => "//meta[@property="og:title"]/@content"]
15:13:45 INFO [graby] title matched: The Battle Line at Louvain (1914) ["title" => "The Battle Line at Louvain (1914)"]
15:13:45 INFO [graby] ...XPath match: {pattern} ["pattern","//meta[@property="og:title"]/@content"]
15:13:45 INFO [graby] Trying //meta[@property="article:published_time"]/@content for date ["pattern" => "//meta[@property="article:published_time"]/@content"]
15:13:45 INFO [graby] Trying //html[@lang]/@lang for language ["pattern" => "//html[@lang]/@lang"]
15:13:45 INFO [graby] Trying //meta[@name="DC.language"]/@content for language ["pattern" => "//meta[@name="DC.language"]/@content"]
15:13:45 INFO [graby] Trying //*[contains(@class, 'google-dfp-ad-wrapper')] to strip element ["pattern" => "//*[contains(@class, 'google-dfp-ad-wrapper')]"]
15:13:45 INFO [graby] Trying //iframe/@srcdoc to strip element ["pattern" => "//iframe/@srcdoc"]
15:13:45 INFO [graby] Trying sharedaddy to strip element ["string" => "sharedaddy"]
15:13:45 INFO [graby] Trying i-amphtml-replaced-content to strip element ["string" => "i-amphtml-replaced-content"]
15:13:45 INFO [graby] Using Readability
In Readability.php line 268:
[DOMException (4)]
Wrong Document Error
Exception trace:
at /home/tac/g/sites/feeds/vendor/j0k3r/php-readability/src/Readability.php:268
DOMNode->appendChild() at /home/tac/g/sites/feeds/vendor/j0k3r/php-readability/src/Readability.php:268
Readability\Readability->init() at /home/tac/g/tacman/graby/src/Extractor/ContentExtractor.php:484
Graby\Extractor\ContentExtractor->process() at /home/tac/g/tacman/graby/src/Graby.php:352
Graby\Graby->doFetchContent() at /home/tac/g/tacman/graby/src/Graby.php:177
Graby\Graby->fetchContent() at /home/tac/g/sites/feeds/src/Parser/Internal.php:25
App\Parser\Internal->parse() at /home/tac/g/sites/feeds/src/Content/Extractor.php:117
App\Content\Extractor->parseContent() at /home/tac/g/sites/feeds/src/Content/Import.php:97
App\Content\Import->process() at /home/tac/g/sites/feeds/src/Command/FetchItemsCommand.php:155
App\Command\FetchItemsCommand->execute() at /home/tac/g/sites/feeds/vendor/symfony/console/Command/Command.php:279
Symfony\Component\Console\Command\Command->run() at /home/tac/g/sites/feeds/vendor/symfony/console/Application.php:1094
Symfony\Component\Console\Application->doRunCommand() at /home/tac/g/sites/feeds/vendor/symfony/framework-bundle/Console/Application.php:123
Symfony\Bundle\FrameworkBundle\Console\Application->doRunCommand() at /home/tac/g/sites/feeds/vendor/symfony/console/Application.php:342
Symfony\Component\Console\Application->doRun() at /home/tac/g/sites/feeds/vendor/symfony/framework-bundle/Console/Application.php:77
Symfony\Bundle\FrameworkBundle\Console\Application->doRun() at /home/tac/g/sites/feeds/vendor/symfony/console/Application.php:193
Symfony\Component\Console\Application->run() at /home/tac/g/sites/feeds/vendor/symfony/runtime/Runner/Symfony/ConsoleApplicationRunner.php:49
Symfony\Component\Runtime\Runner\Symfony\ConsoleApplicationRunner->run() at /home/tac/g/sites/feeds/vendor/autoload_runtime.php:29
require_once() at /home/tac/g/sites/feeds/c:11
feed:fetch-items [--slug [SLUG]] [--use_queue] [--] [<age>]
This is Graby calling this library, but I'm stuck and don't really understand DOM manipulation in PHP.
I'm running PHP 8.3, and I'm wondering if it's stricter about appending DOM elements.
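The exception itself is not specific to Readability or Graby. The minimal sketch below uses only PHP's built-in ext/dom: it creates a node in one `DOMDocument` and appends it to the `<body>` of another, which raises the same `DOMException` with code 4 (`DOM_WRONG_DOCUMENT_ERR`, message "Wrong Document Error"), then shows that importing the node first avoids it. The names `$source`, `$target` and `$overlay` are illustrative only. The Stack Overflow question linked below predates PHP 8 by many years, which suggests this check is long-standing ext/dom behaviour rather than new strictness in 8.3.

```php
<?php
// Minimal reproduction with plain ext/dom (no Readability or Graby code involved).

$source = new DOMDocument();
$target = new DOMDocument();
$target->loadHTML('<html><body></body></html>');

$body = $target->getElementsByTagName('body')->item(0);
$overlay = $source->createElement('div', 'content'); // owned by $source, not $target

$errorCode = null;
try {
    // Appending a node owned by a different document is rejected.
    $body->appendChild($overlay);
} catch (DOMException $e) {
    // Same exception class, code and message as in the trace above.
    $errorCode = $e->getCode();
    echo get_class($e), ' (', $e->getCode(), '): ', $e->getMessage(), PHP_EOL;
}

// importNode() returns a deep copy owned by $target, which can be appended.
$imported = $target->importNode($overlay, true);
$body->appendChild($imported);
echo $target->saveHTML($body), PHP_EOL;
```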
I made some progress by following https://stackoverflow.com/questions/1759137/domelement-cloning-and-appending-wrong-document-error, though I'm not sure what I'm doing. This is what I have now:
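// Copy $overlay (deep, including its children) into the document that owns $this->body...
$node = $this->body->ownerDocument->importNode($overlay, true);
// ...then append that copy, which now belongs to the same document as $this->body.
$this->body->appendChild($node);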
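The `importNode()` call in the snippet above is the usual ext/dom way to move content between documents. As a sketch, it can be wrapped in a helper that only imports when the owner documents actually differ; `appendNodeAcrossDocuments()` is a hypothetical name and is not part of php-readability or Graby.

```php
<?php

/**
 * Hypothetical helper (not part of php-readability or Graby):
 * appends $child to $parent, importing it first if it belongs to another document.
 */
function appendNodeAcrossDocuments(DOMNode $parent, DOMNode $child): DOMNode|false
{
    // A DOMDocument has no ownerDocument; in that case the parent is the document.
    $targetDoc = $parent instanceof DOMDocument ? $parent : $parent->ownerDocument;

    if ($child->ownerDocument !== null && !$child->ownerDocument->isSameNode($targetDoc)) {
        // Deep copy owned by $targetDoc; the original node stays in its own document.
        $child = $targetDoc->importNode($child, true);
    }

    return $parent->appendChild($child);
}

// Usage: build an element in one document and append it to the <body> of another.
$page = new DOMDocument();
$page->loadHTML('<html><body></body></html>');
$body = $page->getElementsByTagName('body')->item(0);

$scratch = new DOMDocument();
$overlay = $scratch->createElement('div');
$overlay->setAttribute('id', 'overlay');

$appended = appendNodeAcrossDocuments($body, $overlay);

// $appended is the imported copy, so changing $overlay afterwards does not affect $page.
$overlay->setAttribute('class', 'changed-after-append');
echo $page->saveHTML($body), PHP_EOL;
```

Because `importNode()` returns a copy, any code that keeps using `$overlay` (or one of its children) after the append is working on the old node, not on the one now inside `$this->body`. Importing also works around the mismatch rather than explaining it, so it may be worth checking why `$this->body` and `$overlay` end up owned by different documents in the first place.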