
Bizarre issue with Diffbot using guzzlehttp

Open jonathantullett opened this issue 8 years ago • 7 comments

I've created a Crawl API job which has a few hundred results. I'm trying to fetch the results with type:article (i.e. $bot->search("type:article") with setNum set to "all"), and it throws an exception:
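For reference, the call described above presumably looks something like this (a sketch assuming the Swader\Diffbot\Diffbot client from this repo; the token, collection name, and the setCol setter are my placeholders/assumptions, not confirmed from the report):

```php
<?php
// Hypothetical reconstruction of the failing call. "my-token" and
// "my-crawl-job" are placeholders; setCol is assumed to be the setter
// for the crawl collection being searched.
require 'vendor/autoload.php';

use Swader\Diffbot\Diffbot;

$bot = new Diffbot('my-token');

$search = $bot->search('type:article')
    ->setCol('my-crawl-job')   // restrict the search to the crawl job
    ->setNum('all');           // ask for every matching result at once

foreach ($search->call() as $article) {
    echo $article->getTitle(), PHP_EOL;
}
```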

PHP Warning:  curl_multi_exec(): Unable to create temporary file, Check permissions in temporary files directory. in /home/tullettj/websites/core-code/lib/vendor/guzzlehttp/guzzle/src/Handler/CurlMultiHandler.php on line 106

PHP Fatal error:  Uncaught GuzzleHttp\Exception\RequestException: cURL error 23: Failed writing body (2749 != 16384) (see http://curl.haxx.se/libcurl/c/libcurl-errors.html) in /home/tullettj/websites/core-code/lib/vendor/guzzlehttp/guzzle/src/Handler/CurlFactory.php:187
Stack trace:
#0 /home/tullettj/websites/core-code/lib/vendor/guzzlehttp/guzzle/src/Handler/CurlFactory.php(150): GuzzleHttp\Handler\CurlFactory::createRejection(Object(GuzzleHttp\Handler\EasyHandle), Array)
#1 /home/tullettj/websites/core-code/lib/vendor/guzzlehttp/guzzle/src/Handler/CurlFactory.php(103): GuzzleHttp\Handler\CurlFactory::finishError(Object(GuzzleHttp\Handler\CurlMultiHandler), Object(GuzzleHttp\Handler\EasyHandle), Object(GuzzleHttp\Handler\CurlFactory))
#2 /home/tullettj/websites/core-code/lib/vendor/guzzlehttp/guzzle/src/Handler/CurlMultiHandler.php(179): GuzzleHttp\Handler\CurlFactory::finish(Object(GuzzleHttp\Handler\CurlMultiHandler), Object(GuzzleHttp\Handler\EasyHandle), Object(GuzzleHttp\Handler\CurlFactory))
#3 /home/tullettj/websites/c in /home/tullettj/websites/core-code/lib/vendor/php-http/guzzle6-adapter/src/Promise.php on line 127

So I've played with the setNum values and 60 seems to be the magic number. If I query for 60 or fewer, it's fine; 61 or above throws this exception.

Have you seen this before, @Swader? It's a bit of a head-scratcher (I have ~2 GB free in the temporary files directory).

Thanks!

jonathantullett avatar Oct 08 '16 19:10 jonathantullett

I've run it with a few other searches and the threshold varies from search to search. I thought it might be memory_limit related, but the script is configured with a memory_limit of -1 (so, unlimited).
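Since the warning blames the temporary files directory rather than memory, it may be worth checking what this particular PHP process actually resolves as its temp dir; a quick diagnostic sketch (note that curl also honors the TMPDIR environment variable, which can point somewhere other than the directory you inspected):

```php
<?php
// Quick diagnostic: print the temp directory this PHP process resolves,
// whether it is writable, and how much space is free there. curl's
// buffering honors TMPDIR, which may differ from what you checked.
$tmp = sys_get_temp_dir();
printf("temp dir: %s\n", $tmp);
printf("writable: %s\n", is_writable($tmp) ? 'yes' : 'no');
printf("free:     %.1f MB\n", disk_free_space($tmp) / 1048576);
printf("TMPDIR:   %s\n", getenv('TMPDIR') ?: '(unset)');
```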

jonathantullett avatar Oct 08 '16 19:10 jonathantullett

That'll happen with large bodies :( See this and this.

Let me know if you manage to hack past it.
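One hack that sometimes gets past cURL error 23 with plain Guzzle 6 is the 'sink' request option, which streams the response body to a file you choose instead of the default php://temp buffer. Whether the Diffbot client lets you pass request options through to Guzzle is a separate question; this sketch talks to the Diffbot Search endpoint directly, and the paths and token are placeholders:

```php
<?php
// Hedged sketch using plain Guzzle 6, bypassing the Diffbot client:
// stream the (potentially large) body to a known-writable file via the
// 'sink' option rather than buffering it in php://temp.
require 'vendor/autoload.php';

use GuzzleHttp\Client;

$client = new Client();
$client->request('GET', 'https://api.diffbot.com/v3/search', [
    'query' => ['token' => 'my-token', 'query' => 'type:article', 'num' => 'all'],
    'sink'  => '/var/data/diffbot-results.json',   // placeholder path
]);
```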

Swader avatar Oct 09 '16 13:10 Swader

@Swader I've been working around this so far by decreasing the number of results downloaded if there's an exception thrown.

However, I'm now starting to see it thrown even when only a single result (setNum(1)) is requested. This is rather problematic. Can you think of any way around this, or do we just have to consider them bad searches?
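The shrink-on-failure workaround described above might be sketched like this (fetchWithFallback is my name, and the caught exception type is a stand-in for Guzzle's RequestException); as noted, it cannot help once setNum(1) already throws:

```php
<?php
// Sketch of the shrink-on-failure workaround: attempt the full batch
// and, on failure, halve the requested count and retry until it
// succeeds or the count reaches 1.
function fetchWithFallback(callable $fetch, int $num): array
{
    while (true) {
        try {
            return $fetch($num);      // e.g. wraps $search->setNum($num)->call()
        } catch (RuntimeException $e) {
            if ($num <= 1) {
                throw $e;             // nothing left to shrink: a "bad search"
            }
            $num = intdiv($num, 2);   // halve the batch and try again
        }
    }
}
```

For example, with a backend that fails above 60 results, a request for 200 would retry at 100, then succeed at 50.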

jonathantullett avatar Apr 18 '17 06:04 jonathantullett

@jonathantullett I'm sorry about the delay, I didn't see this until now; I'll play around with it when I find time. From what I can tell it's still related to the links above, so I'll just have to modify the underlying stack to use curl directly, without implicitly relying on Guzzle to handle everything, and it should work. This would, however, increase the dependency on curl. I'll think about the best solution for everyone.
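What relying on curl directly could look like (my naming, not the library's code): hand curl an explicit file handle via CURLOPT_FILE so the response body is written straight to a path you control instead of being buffered through curl's temporary-file machinery.

```php
<?php
// Sketch: download a URL's body directly into $path using ext-curl,
// bypassing in-memory/temp-file buffering of the response body.
function downloadToFile(string $url, string $path): bool
{
    $fh = fopen($path, 'w');
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fh);   // write body straight to $fh
    $ok = curl_exec($ch) !== false;
    if (!$ok) {
        fwrite(STDERR, curl_error($ch) . PHP_EOL);
    }
    curl_close($ch);
    fclose($fh);
    return $ok;
}
```

Usage against the Diffbot Search endpoint would then be something like downloadToFile('https://api.diffbot.com/v3/search?token=my-token&query=type%3Aarticle&num=all', '/var/data/results.json'), with the token and path as placeholders.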

Swader avatar May 05 '17 08:05 Swader

@jonathantullett to continue our discussion from Support: how are you making the hundreds of search calls? I think I may be misunderstanding what's going on, as I've been unable to reproduce the hung calls. Can you share your code?

Swader avatar Nov 30 '17 11:11 Swader

@Swader this is a different issue. This one is reproduced by trying to download setNum($XX) articles for a search (I use the min time on the search), and I see the problem on a number of searches, often related to the size of the pages being returned.

I’ll find a search which shows the issue and post it later (not at home at the moment), but this is completely unrelated to the dangling HTTPS connection issue.

jonathantullett avatar Nov 30 '17 11:11 jonathantullett

No, I know; I just had no other way to ping you here directly 😬 A new issue for the hung calls would be appreciated.

Swader avatar Nov 30 '17 12:11 Swader