
wrong document type error

Open: tacman opened this issue 1 year ago • 1 comment

I'm trying to run graby against https://www.privatdozent.co/p/the-battle-line-at-louvain-1914, and I'm getting an error.

15:13:45 INFO      [graby] Opengraph "article:" data: [] ["ogData" => []]
15:13:45 INFO      [graby] JSON-LD data: ["@context" => "https://schema.org","@type" => "NewsArticle","url" => "https://www.privatdozent.co/p/the-battle-line-at-louvain-1914","mainEntityOfPage" => "https://www.privatdozent.co/p/the-battle-line-at-louvain-1914","headline" => "The Battle Line at Louvain (1914)","description" => "“Where they burn books, they will also burn people” — Heinrich Heine","image" => [["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca982839-4161-4d7b-90a3-9ff1bdeca5f0_1280x939.jpeg"]],"datePublished" => "2024-11-15T09:42:48+00:00","dateModified" => "2024-11-15T09:42:48+00:00","isAccessibleForFree" => true,"author" => [["@type" => "Person","name" => "Jørgen Veisdal","url" => "https://substack.com/@privatdozent","description" => "Author of Privatdozent. Associate Professor.","identifier" => "user:3088938","sameAs" => ["https://twitter.com/JorgenVeisdal"],"image" => ["@type" => "ImageObject","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F86ca7756-940a-4d3c-affc-fd5e6a968f2a_2653x2653.jpeg","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F86ca7756-940a-4d3c-affc-fd5e6a968f2a_2653x2653.jpeg"]]],"publisher" => ["@type" => "Organization","name" => "Privatdozent","url" => "https://www.privatdozent.co","description" => "Essays on the history of mathematics. 10k+ subscribers. 
Substack Bestseller (2024) 🥇, Grow Feature (2022) 📈, Featured Substack Newsletter (2021) 🌟","interactionStatistic" => ["@type" => "InteractionCounter","name" => "Subscribers","interactionType" => "https://schema.org/SubscribeAction","userInteractionCount" => 10000],"identifier" => "pub:14134","logo" => ["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png"],"image" => ["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png"],"sameAs" => ["https://twitter.com/dozentprivat"]]] ["JsonLdData" => ["@context" => "https://schema.org","@type" => "NewsArticle","url" => 
"https://www.privatdozent.co/p/the-battle-line-at-louvain-1914","mainEntityOfPage" => "https://www.privatdozent.co/p/the-battle-line-at-louvain-1914","headline" => "The Battle Line at Louvain (1914)","description" => "“Where they burn books, they will also burn people” — Heinrich Heine","image" => [["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca982839-4161-4d7b-90a3-9ff1bdeca5f0_1280x939.jpeg"]],"datePublished" => "2024-11-15T09:42:48+00:00","dateModified" => "2024-11-15T09:42:48+00:00","isAccessibleForFree" => true,"author" => [["@type" => "Person","name" => "Jørgen Veisdal","url" => "https://substack.com/@privatdozent","description" => "Author of Privatdozent. Associate Professor.","identifier" => "user:3088938","sameAs" => ["https://twitter.com/JorgenVeisdal"],"image" => ["@type" => "ImageObject","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F86ca7756-940a-4d3c-affc-fd5e6a968f2a_2653x2653.jpeg","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F86ca7756-940a-4d3c-affc-fd5e6a968f2a_2653x2653.jpeg"]]],"publisher" => ["@type" => "Organization","name" => "Privatdozent","url" => "https://www.privatdozent.co","description" => "Essays on the history of mathematics. 10k+ subscribers. 
Substack Bestseller (2024) 🥇, Grow Feature (2022) 📈, Featured Substack Newsletter (2021) 🌟","interactionStatistic" => ["@type" => "InteractionCounter","name" => "Subscribers","interactionType" => "https://schema.org/SubscribeAction","userInteractionCount" => 10000],"identifier" => "pub:14134","logo" => ["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png"],"image" => ["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png"],"sameAs" => ["https://twitter.com/dozentprivat"]]]]
15:13:45 INFO      [graby] date matched from JsonLd: 2024-11-15T09:42:48+00:00 ["date" => "2024-11-15T09:42:48+00:00"]
15:13:45 INFO      [graby] date matched from JsonLd: 2024-11-15T09:42:48+00:00 ["date" => "2024-11-15T09:42:48+00:00"]
15:13:45 INFO      [graby] author matched from JsonLd: Jørgen Veisdal ["author" => "Jørgen Veisdal"]
15:13:45 INFO      [graby] title matched from JsonLd: {The Battle Line at Louvain (1914)} ["title" => "The Battle Line at Louvain (1914)"]
15:13:45 INFO      [graby] Trying //meta[@property="og:title"]/@content for title ["pattern" => "//meta[@property="og:title"]/@content"]
15:13:45 INFO      [graby] title matched: The Battle Line at Louvain (1914) ["title" => "The Battle Line at Louvain (1914)"]
15:13:45 INFO      [graby] ...XPath match: {pattern} ["pattern","//meta[@property="og:title"]/@content"]
15:13:45 INFO      [graby] Trying //meta[@property="article:published_time"]/@content for date ["pattern" => "//meta[@property="article:published_time"]/@content"]
15:13:45 INFO      [graby] Trying //html[@lang]/@lang for language ["pattern" => "//html[@lang]/@lang"]
15:13:45 INFO      [graby] Trying //meta[@name="DC.language"]/@content for language ["pattern" => "//meta[@name="DC.language"]/@content"]
15:13:45 INFO      [graby] Trying //*[contains(@class, 'google-dfp-ad-wrapper')] to strip element ["pattern" => "//*[contains(@class, 'google-dfp-ad-wrapper')]"]
15:13:45 INFO      [graby] Trying //iframe/@srcdoc to strip element ["pattern" => "//iframe/@srcdoc"]
15:13:45 INFO      [graby] Trying sharedaddy to strip element ["string" => "sharedaddy"]
15:13:45 INFO      [graby] Trying i-amphtml-replaced-content to strip element ["string" => "i-amphtml-replaced-content"]
15:13:45 INFO      [graby] Using Readability

In Readability.php line 268:
                        
  [DOMException (4)]    
  Wrong Document Error  
                        

Exception trace:
  at /home/tac/g/sites/feeds/vendor/j0k3r/php-readability/src/Readability.php:268
 DOMNode->appendChild() at /home/tac/g/sites/feeds/vendor/j0k3r/php-readability/src/Readability.php:268
 Readability\Readability->init() at /home/tac/g/tacman/graby/src/Extractor/ContentExtractor.php:484
 Graby\Extractor\ContentExtractor->process() at /home/tac/g/tacman/graby/src/Graby.php:352
 Graby\Graby->doFetchContent() at /home/tac/g/tacman/graby/src/Graby.php:177
 Graby\Graby->fetchContent() at /home/tac/g/sites/feeds/src/Parser/Internal.php:25
 App\Parser\Internal->parse() at /home/tac/g/sites/feeds/src/Content/Extractor.php:117
 App\Content\Extractor->parseContent() at /home/tac/g/sites/feeds/src/Content/Import.php:97
 App\Content\Import->process() at /home/tac/g/sites/feeds/src/Command/FetchItemsCommand.php:155
 App\Command\FetchItemsCommand->execute() at /home/tac/g/sites/feeds/vendor/symfony/console/Command/Command.php:279
 Symfony\Component\Console\Command\Command->run() at /home/tac/g/sites/feeds/vendor/symfony/console/Application.php:1094
 Symfony\Component\Console\Application->doRunCommand() at /home/tac/g/sites/feeds/vendor/symfony/framework-bundle/Console/Application.php:123
 Symfony\Bundle\FrameworkBundle\Console\Application->doRunCommand() at /home/tac/g/sites/feeds/vendor/symfony/console/Application.php:342
 Symfony\Component\Console\Application->doRun() at /home/tac/g/sites/feeds/vendor/symfony/framework-bundle/Console/Application.php:77
 Symfony\Bundle\FrameworkBundle\Console\Application->doRun() at /home/tac/g/sites/feeds/vendor/symfony/console/Application.php:193
 Symfony\Component\Console\Application->run() at /home/tac/g/sites/feeds/vendor/symfony/runtime/Runner/Symfony/ConsoleApplicationRunner.php:49
 Symfony\Component\Runtime\Runner\Symfony\ConsoleApplicationRunner->run() at /home/tac/g/sites/feeds/vendor/autoload_runtime.php:29
 require_once() at /home/tac/g/sites/feeds/c:11

feed:fetch-items [--slug [SLUG]] [--use_queue] [--] [<age>]

The call comes from graby, which uses this library, but I'm stuck and don't really understand DOM manipulation in PHP.

I'm running PHP 8.3, and I'm wondering if it's stricter about appending DOM elements.
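
For reference, here is a minimal sketch of what I believe triggers this kind of exception: appending a node that was created by a different DOMDocument. The variable names and the sample HTML are made up for illustration and are not taken from php-readability; the only thing matched against the trace above is the exception code 4, which is the value of PHP's DOM_WRONG_DOCUMENT_ERR constant.

```php
<?php
// Document that will receive the node (stands in for the parsed page).
$target = new DOMDocument();
$target->loadHTML('<html><body><p>article</p></body></html>');
$body = $target->getElementsByTagName('body')->item(0);

// A second, unrelated document; $overlay belongs to it, not to $target.
$other = new DOMDocument();
$overlay = $other->createElement('div', 'overlay');

$code = null;
try {
    // Appending a node owned by another document is rejected by the DOM extension.
    $body->appendChild($overlay);
} catch (DOMException $e) {
    // Keep the numeric DOM error code for comparison with the trace.
    $code = $e->getCode();
}

var_dump($code === DOM_WRONG_DOCUMENT_ERR);
```

If this reproduces the same message, the problem is probably about which document owns the node rather than anything specific to this page.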

tacman, Nov 15 '24 15:11

I made some progress by following https://stackoverflow.com/questions/1759137/domelement-cloning-and-appending-wrong-document-error

I'm not sure what I'm doing, though.

            // Import $overlay into the document that owns $this->body before appending it.
            $node = $this->body->ownerDocument->importNode($overlay, true);
            $this->body->appendChild($node);
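
To check that I understand the change, here is the same idea as a standalone sketch outside the library. The two documents and the element names are assumptions for illustration only; the calls used are the standard DOMDocument::importNode() (with true for a deep copy) and DOMNode::appendChild().

```php
<?php
// Document that owns the body we want to append to.
$target = new DOMDocument();
$target->loadHTML('<html><body><p>article</p></body></html>');
$body = $target->getElementsByTagName('body')->item(0);

// Node built in a different document, with one nested child.
$other = new DOMDocument();
$overlay = $other->createElement('div');
$overlay->appendChild($other->createElement('span', 'nested'));

// importNode() returns a copy that belongs to $target; true copies the subtree too.
$node = $body->ownerDocument->importNode($overlay, true);
$body->appendChild($node);

// The copy now lives in $target, and the nested span came along with it.
$ownedByTarget = $node->ownerDocument->isSameNode($target);
$spanCount = $target->getElementsByTagName('span')->length;
$lastChildName = $body->lastChild->nodeName;
```

Note that importNode() copies the node, so the original $overlay stays in its own document. Whether that is acceptable inside Readability.php depends on how $overlay is used afterwards, which I have not checked.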

tacman, Nov 15 '24 15:11