
"The current node list is empty" issue

Open · gnikolopoulos opened this issue 9 years ago · 8 comments

Hi all,

I am using Goutte to crawl a set of URLs that contain prices, to extract data for our in-house ERP system. It works well, but some URLs throw this error:

Fatal error: Uncaught exception 'InvalidArgumentException' with message 'The current node list is empty.'

Even though the following returns the expected count:

$crawler->filter('.classname')->count()

I still get the error. This only happens with 2-3 URLs, but I would like to fix it. Here is the relevant part of my code:

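$indexer = 0;
$c_data = $crawler->filter('.js-product-card')->each(function ($node) use (&$indexer) {
    // text() throws InvalidArgumentException("The current node list is empty.")
    // when this card has no element matching the nested selector.
    $price = str_replace(',', '.', $node->filter('.price > .product-link')->text());
    $indexer++; // imported by reference, otherwise it is undefined inside the closure
    return array($node->filter('.shop-details > a')->text(), floatval($price));
});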
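One likely cause, not confirmed anywhere in this thread: the `count()` check only covers the outer selector, while `each()` runs the nested `filter()` calls once per card, and `text()` throws as soon as a single card has no `.price > .product-link` (or `.shop-details > a`) inside it. A way to locate the offending card is to count the nested match per card rather than once for the whole page. The sketch below does that with PHP's built-in DOM extension standing in for Goutte's Crawler, so it runs without Composer; the sample markup and the `hasClass()` helper are hypothetical. With Goutte itself, the equivalent is calling `$node->filter('.price > .product-link')->count()` inside the `each()` callback.

```php
<?php
// Diagnostic sketch: PHP's DOM extension stands in for Goutte's Crawler.
// The markup below is hypothetical sample HTML, not the real product page.

// XPath test for "element has this CSS class" (what `.name` means in CSS).
function hasClass(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

$html = <<<HTML
<div class="js-product-card">
  <div class="price"><a class="product-link">12,50</a></div>
  <div class="shop-details"><a>Shop A</a></div>
</div>
<div class="js-product-card">
  <script>/* this card is rendered differently */</script>
  <div class="shop-details"><a>Shop B</a></div>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // silence libxml warnings about imperfect markup
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Outer selector: both cards match, like filter('.js-product-card')->count().
$cards = $xpath->query('//*[' . hasClass('js-product-card') . ']');

// Nested selector `.price > .product-link`, counted separately for each card.
$perCard = [];
foreach ($cards as $card) {
    $matches = $xpath->query('.//*[' . hasClass('price') . ']/*[' . hasClass('product-link') . ']', $card);
    $perCard[] = $matches->length;
}

echo 'cards: ' . $cards->length . PHP_EOL;                          // cards: 2
echo 'price links per card: ' . implode(', ', $perCard) . PHP_EOL;  // price links per card: 1, 0
```

A card reporting 0 is the one that makes `text()` throw.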

And here are 2 sample URLs:

  1. http://www.skroutz.gr/s/8326592/Salomon-Sonic-Pro-379168.html
  2. http://www.skroutz.gr/s/7134104/Salomon-Speedcross-3-378337.html

Number 2 works just fine, but number 1 presents this issue.

The only difference I noticed is a <script> tag inside some of the nodes, just before the point where the error is thrown. Apart from that, the HTML structure is the same. Can anyone help me figure this out?
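If some cards really are structured differently, a common workaround is to check that a nested selection is non-empty before reading its text, and skip cards that lack it. In Goutte that means testing `$node->filter(...)->count()` before `->text()` inside `each()`, returning `null` for incomplete cards and dropping those with `array_filter()`; newer symfony/dom-crawler releases also let `text()` take a default value, so check the version you have installed. The following is a minimal sketch of the same guard using PHP's built-in DOM extension so it runs on its own; the markup and the `hasClass()` / `firstText()` helpers are hypothetical.

```php
<?php
// Guard sketch: read text only when the nested selection is non-empty.
// Uses PHP's DOM extension instead of Goutte; helpers and markup are hypothetical.

function hasClass(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

// Trimmed text of the first node matching $expr under $context,
// or null when nothing matches (instead of throwing like Crawler::text()).
function firstText(DOMXPath $xpath, string $expr, DOMNode $context): ?string
{
    $nodes = $xpath->query($expr, $context);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

$html = <<<HTML
<div class="js-product-card">
  <div class="price"><a class="product-link">12,50</a></div>
  <div class="shop-details"><a>Shop A</a></div>
</div>
<div class="js-product-card">
  <script>/* no price block in this card */</script>
  <div class="shop-details"><a>Shop B</a></div>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// XPath equivalents of `.price > .product-link` and `.shop-details > a`.
$priceExpr = './/*[' . hasClass('price') . ']/*[' . hasClass('product-link') . ']';
$shopExpr  = './/*[' . hasClass('shop-details') . ']/a';

$c_data = [];
foreach ($xpath->query('//*[' . hasClass('js-product-card') . ']') as $card) {
    $price = firstText($xpath, $priceExpr, $card);
    $shop  = firstText($xpath, $shopExpr, $card);
    if ($price === null || $shop === null) {
        continue; // skip cards missing the expected child nodes
    }
    // Same decimal-comma handling as the original: "12,50" -> 12.5
    $c_data[] = [$shop, floatval(str_replace(',', '.', $price))];
}

print_r($c_data); // a single entry: Shop A, 12.5
```

Skipping silently hides data; logging the skipped cards may be preferable when the goal is a complete price list.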

gnikolopoulos, Jul 18 '16

I have the same issue. How did you resolve it?

3zzy, Sep 04 '16

Same here!

magicwarms, Aug 05 '17

Same problem!

GregDniPro, Jan 19 '18

Same problem here. This URL works: https://bwh1.net/cart.php?a=add&pid=56

But this does not: https://my.rfchost.com/cart.php?a=add&pid=97


<?php
include './vendor/autoload.php';

use Goutte\Client;
use GuzzleHttp\Client as GuzzleClient;
use GuzzleHttp\Exception\RequestException;
use GuzzleHttp\Psr7;

$client = new Client();
$guzzle = new GuzzleClient(array(
    'curl' => array(
        CURLOPT_TIMEOUT => 60,
        CURLOPT_SSL_VERIFYPEER => false // disables TLS certificate checks; testing only
    )
));
$client->setClient($guzzle);

$result = ''; // stays empty if the request fails

try
{
  #$crawler = $client->request('GET', 'https://bwh1.net/cart.php?a=add&pid=56');
  $crawler = $client->request('GET', 'https://my.rfchost.com/cart.php?a=add&pid=97');
  // text() throws InvalidArgumentException (not caught below) if filter('body') matches nothing.
  $result  = $crawler->filter('body')->text();

  echo "Begin fetch<br>";
}
catch (RequestException $e)
{
  echo Psr7\str($e->getRequest());
  if ($e->hasResponse())
  {
    echo "has exception<br>";
  }
}

echo $result;
?>
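In the snippet above, the `catch` only handles Guzzle's `RequestException`. "The current node list is empty" is an `\InvalidArgumentException` thrown by `text()` after the request has already succeeded, so it is not caught there. Two options are to check `$crawler->filter('body')->count()` before calling `text()`, or to add a `catch (\InvalidArgumentException $e)` branch. It can also help to look at what the server actually returned for the failing URL (for example via `$client->getResponse()->getContent()`), since a response without the expected markup yields an empty node list. Below is a minimal sketch of the guard applied to an already-fetched HTML string, using PHP's DOM extension rather than Goutte so it runs on its own; `bodyTextOrDefault()` is a hypothetical helper.

```php
<?php
// Sketch: extract <body> text from fetched HTML, returning a fallback
// instead of failing when there is nothing to parse.
// bodyTextOrDefault() is hypothetical and uses PHP's DOM extension, not Goutte.

function bodyTextOrDefault(string $html, string $default): string
{
    if (trim($html) === '') {
        return $default; // an empty response produces no nodes at all
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world pages often trigger libxml warnings
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    if ($loaded === false) {
        return $default;
    }
    $body = $doc->getElementsByTagName('body');
    return $body->length > 0 ? trim($body->item(0)->textContent) : $default;
}

echo bodyTextOrDefault('<html><body><p>Order form</p></body></html>', 'n/a'), PHP_EOL; // Order form
echo bodyTextOrDefault('', 'n/a'), PHP_EOL;                                             // n/a
```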

yylzcom, Feb 15 '18

Has anyone resolved this issue? I have the same problem.

sac325, Jan 07 '19

Same here, can this somehow be resolved?

asopsec, Jan 11 '19

I have the same problem here. Any resolution?

aligorithm, Feb 01 '19

Same here.

BeauxConsunji, Jan 30 '20