Requests icon indicating copy to clipboard operation
Requests copied to clipboard

Fixes #283 : Fixes CPU busy loop(s) seen in request_multiple.

Open TysonAndre opened this issue 8 years ago • 10 comments

This will require more testing across various libcurl versions. I doubt that the timeout is necessary for curl_multi_select (calls select() if it can), but leaving in one just in case of bugs, so that it will end.

  • Haven't thoroughly checked for relevant libcurl bugs.

Asynchronously wait for events with a short timeout if CURLM_CALL_MULTI_PERFORM

Fixes #283

TysonAndre avatar Jun 13 '17 15:06 TysonAndre

Codecov Report

Merging #284 into master will decrease coverage by 0.37%. The diff coverage is 33.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #284      +/-   ##
==========================================
- Coverage   92.11%   91.73%   -0.38%     
==========================================
  Files          21       21              
  Lines        1762     1767       +5     
==========================================
- Hits         1623     1621       -2     
- Misses        139      146       +7
Impacted Files Coverage Δ
library/Requests/Transport/cURL.php 91.94% <33.33%> (-1.75%) :arrow_down:
library/Requests/Proxy/HTTP.php 86.66% <0%> (-13.34%) :arrow_down:
library/Requests/Transport/fsockopen.php 94.76% <0%> (+0.58%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 87932f5...c7d8ee4. Read the comment docs.

codecov-io avatar Jun 14 '17 03:06 codecov-io

I'm encountering brand new issues due to https://github.com/php/php-src/pull/3509#issuecomment-430764448

On September 2018 for PHP 7.1.23, after I created this PR, curl_multi_select will now return -1 without setting any file descriptors in certain error modes. That's because it uses curl_multi_wait(), which has slightly different issues.

  • I'm using php >= 7.1.23 and curl >= 7.28.0. This issue would not be seen if either of those dependencies was less than that.

I think that the issue might be triggered by timeouts (e.g. connect timeouts on one request, while the other request doesn't timeout).

https://bugs.php.net/bug.php?id=76480 mentions the following:

Please note the patch changes the behaviour of curl_multi_select(). Previously curl_multi_select() would return -1 if cURL has no open file descriptors. In the new situation (using curl_multi_wait() with libcurl 7.28 or higher) curl_multi_select() will return 0 if cURL has no open file descriptors.

~One fix to consider would be to treat all remaining curl handles in the multi-handle as if they had timed out if curl_multi_select returned -1.~

EDIT: Sorry, that would break older versions or code using older libcurl because it returned -1 on timeout previously. https://bugs.php.net/bug.php?id=77030 mentions a better approach

	$Descriptors = curl_multi_select( $Master, 0.1 );

	if( $Descriptors === -1 && curl_multi_errno( $Master ) !== CURLM_OK ) {
  • ~It shouldn't be possible to return -1 in older php versions, because it looks like Requests properly checks if there are 1 or more remaining handles~
  • So if it does return -1 and curl_multi_errno isn't CURLM_OK, it's probably caused by a timeout, and should be handled by rmccue/Requests like that, throwing Timeout exceptions for the remaining connections

Note that I'm not a curl_multi expert (I had a bit of experience using it from university, but that's it), and there may be edge cases I haven't considered

TysonAndre avatar Oct 23 '19 15:10 TysonAndre

https://github.com/rmccue/Requests/pull/284#issuecomment-545495019 applied to an alternative version I was testing out locally without the usleep.

  • The usleep is unnecessary if curl_multi_select properly slept for 50 milliseconds while waiting for an event.
  • The usleep would mitigate CPU busy loop issues in newer php versions + curl versions. However, I think that request_multiple may still fail to return before/after my patch in edge cases, and adding $Descriptors === -1 && curl_multi_errno( $Master ) !== CURLM_OK may be needed

TysonAndre avatar Oct 23 '19 16:10 TysonAndre

The failure mode I'm seeing in the latest php versions may be related to my use of the NOSIGNAL patch, if nobody else sees it

https://github.com/rmccue/Requests/issues/264#issuecomment-308214611

TysonAndre avatar Oct 23 '19 16:10 TysonAndre

In order to reproduce the issue I saw, I did the following:

  • Configured two requests: one to localhost which would connect immediately, and wait 3000 milliseconds to respond, and another to a service that would refuse to connect.
  • Set a connect timeout of 3 milliseconds and a request timeout of 3000 milliseconds.
  • Performed both requests

https://gist.github.com/TysonAndre/46a6829d9d8de00e50226890a6a9a63f has the full logs and details about the options used, but the part of interest is that curl_multi_select isn't sleeping for nearly as long as I'd expect it to, and it is neither returning -1 nor returning an error in curl_multi_error.

curl_multi_exec   returned 0 after 0.000492 seconds: curl_multi_errno=0
curl_multi_select returned 0 after 0.000010 seconds: curl_multi_errno=0
curl_multi_info_read returned a value:
/path/to/vendor/rmccue/requests/library/Requests/Transport/cURL.php:247:
array(3) {
  'msg' =>
  int(1)
  'result' =>
  int(7)
  'handle' =>
  resource(316) of type (curl)
}
curl error name: Couldn't connect to server
curl_multi_exec   returned 0 after 0.000013 seconds: curl_multi_errno=0
curl_multi_select returned 0 after 0.000005 seconds: curl_multi_errno=0
curl_multi_exec   returned 0 after 0.000081 seconds: curl_multi_errno=0
curl_multi_select returned 0 after 0.000025 seconds: curl_multi_errno=0
curl_multi_exec   returned 0 after 0.000031 seconds: curl_multi_errno=0
curl_multi_select returned 0 after 0.000002 seconds: curl_multi_errno=0
curl_multi_exec   returned 0 after 0.000031 seconds: curl_multi_errno=0
curl_multi_select returned 0 after 0.000004 seconds: curl_multi_errno=0
curl_multi_exec   returned 0 after 0.000008 seconds: curl_multi_errno=0
... 5000 more lines of busy looping

The below link has the code changes I added to diagnose the issue and the (ineffective) mitigations I'm currently trying. A hardcoded usleep() of 1 millisecond would harm PHP response times if the service would normally complete in less than a millisecond

	/**
	 * Send multiple requests simultaneously
	 *
	 * @param array $requests Request data
	 * @param array $options Global options
	 * @return array Array of Requests_Response objects (may contain Requests_Exception or string responses as well)
	 */
	public function request_multiple($requests, $options) {
		// If you're not requesting, we can't get any responses ¯\_(ツ)_/¯
		if (empty($requests)) {
			return array();
		}

		$multihandle = curl_multi_init();
		$subrequests = array();
		$subhandles = array();

		$class = get_class($this);
		foreach ($requests as $id => $request) {
			$subrequests[$id] = new $class();
			$subhandles[$id] = $subrequests[$id]->get_subrequest_handle($request['url'], $request['headers'], $request['data'], $request['options']);
			$request['options']['hooks']->dispatch('curl.before_multi_add', array(&$subhandles[$id]));
			curl_multi_add_handle($multihandle, $subhandles[$id]);
		}

		$completed = 0;
		$responses = array();

		$request['options']['hooks']->dispatch('curl.before_multi_exec', array(&$multihandle));

		do {
			$active = false;

			$is_first_multi_exec = true;
			do {
				/*
				if (!$is_first_multi_exec) {
					// Sleep for 1 millisecond to avoid a busy loop that would chew CPU.
					// XXX uncomment this to mitigate this a bit - there are better mitigations such as making this sleep after curl_multi_select if it returns 0 or -1.
					usleep(1000);
				}
				 */
				$start = microtime(true);
				$status = curl_multi_exec($multihandle, $active);
				fprintf(STDERR, "curl_multi_exec   returned %s after %.6f seconds: curl_multi_errno=%s\n", json_encode($status), microtime(true) - $start, curl_multi_errno($multihandle));
				$is_first_multi_exec = false;
			}
			while ($status === CURLM_CALL_MULTI_PERFORM);

			// Insert a 50ms sleep here so that we don't busy loop and chew cpu
			$start = microtime(true);
			$selectRes  = curl_multi_select($multihandle, 2.0);
			fprintf(STDERR, "curl_multi_select returned %s after %.6f seconds: curl_multi_errno=%s\n", json_encode($selectRes), microtime(true) - $start, curl_multi_errno($multihandle));
			fflush(STDERR);

			$to_process = array();

			// Read the information as needed
			while ($done = curl_multi_info_read($multihandle)) {
				echo "curl_multi_info_read returned a value:\n";
				var_dump($done);
				echo "curl error name: " . curl_strerror($done['result']) . "\n";
				$key = array_search($done['handle'], $subhandles, true);
				if (!isset($to_process[$key])) {
					$to_process[$key] = $done;
				}
			}

			// Parse the finished requests before we start getting the new ones
			foreach ($to_process as $key => $done) {
				$options = $requests[$key]['options'];
				if (CURLE_OK !== $done['result']) {
					//get error string for handle.
					$reason = curl_error($done['handle']);
					$exception = new Requests_Exception_Transport_cURL(
									$reason,
									Requests_Exception_Transport_cURL::EASY,
									$done['handle'],
									$done['result']
								);
					$responses[$key] = $exception;
					$options['hooks']->dispatch('transport.internal.parse_error', array(&$responses[$key], $requests[$key]));
				}
				else {
					$responses[$key] = $subrequests[$key]->process_response($subrequests[$key]->response_data, $options);

					$options['hooks']->dispatch('transport.internal.parse_response', array(&$responses[$key], $requests[$key]));
				}

				curl_multi_remove_handle($multihandle, $done['handle']);
				curl_close($done['handle']);

				if (!is_string($responses[$key])) {
					$options['hooks']->dispatch('multiple.request.complete', array(&$responses[$key], $key));
				}
				$completed++;
			}
		}
		while ($active || $completed < count($subrequests));

		$request['options']['hooks']->dispatch('curl.after_multi_exec', array(&$multihandle));

		curl_multi_close($multihandle);

		return $responses;
	}

The endpoint I used to test that connected immediately but failed to respond, running on localhost:

<?php

sleep(3);
echo "Pretending to be slow to respond\n";

EDIT: This occurs whether or not I set CURLOPT_NOSIGNAL. I'm using a recent version of PHP 7.1 (>= 7.1.23 and curl >= 7.28.0.) EDIT: the busy loop only happens for a small fraction of requests under those conditions

TysonAndre avatar Oct 23 '19 18:10 TysonAndre

Codecov Report

Merging #284 into master will increase coverage by 0.01%. The diff coverage is 94.11%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #284      +/-   ##
============================================
+ Coverage     92.12%   92.13%   +0.01%     
- Complexity      761      764       +3     
============================================
  Files            21       21              
  Lines          1764     1781      +17     
============================================
+ Hits           1625     1641      +16     
- Misses          139      140       +1
Impacted Files Coverage Δ Complexity Δ
library/Requests/Transport/cURL.php 93.72% <94.11%> (+0.03%) 69 <0> (+3) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 58a0bbc...34684b1. Read the comment docs.

codecov-io avatar Oct 23 '19 19:10 codecov-io

It seems like the addition of usleep doesn't hurt short requests, but makes long requests take noticeably longer. I should only be performing the usleep if the number returned by curl_multi_select wasn't positive.

https://curl.haxx.se/libcurl/c/curl_multi_wait.html

On completion, if numfds is non-NULL, it will be populated with the total number of file descriptors on which interesting events occurred. This number can include both libcurl internal descriptors as well as descriptors provided in extra_fds.

The implementation in php-src for curl_multi_select:

#if LIBCURL_VERSION_NUM >= 0x071c00 /* Available since 7.28.0 */
	error = curl_multi_wait(mh->multi, NULL, 0, (unsigned long) timeout * 1000.0, &numfds);
	if (CURLM_OK != error) {
		SAVE_CURLM_ERROR(mh, error);
		RETURN_LONG(-1);
	}

	RETURN_LONG(numfds);
#else
	_make_timeval_struct(&to, timeout);

	FD_ZERO(&readfds);
	FD_ZERO(&writefds);
	FD_ZERO(&exceptfds);

	error = curl_multi_fdset(mh->multi, &readfds, &writefds, &exceptfds, &maxfd);
	SAVE_CURLM_ERROR(mh, error);

	if (maxfd == -1) {
		RETURN_LONG(-1);
	}
	RETURN_LONG(select(maxfd + 1, &readfds, &writefds, &exceptfds, &to));
#endif

TysonAndre avatar Oct 23 '19 20:10 TysonAndre

No further changes are planned. This seems to be working pretty well to mitigate edge cases causing cpu usage spikes in a fraction of requests.

  • After the latest commit, requestMultiple duration is back to what it was without the sleep for normal requests (durations ranging from 5-1000ms)
  • Overall CPU usage is slightly down on hosts that are using requestMultiple a lot (with php >= 7.1.23 and curl >= 7.28.0). CPU usage stops spiking when the request duration of the external service spikes
  • The measured number of calls to curl_multi_exec is a lot lower.

TysonAndre avatar Oct 23 '19 21:10 TysonAndre

@TysonAndre Hiya! Sorry for the long wait, but we're finally trying to get through the list of open PRs.

We're currently planning two triage sessions for Requests 2.0 and would like to invite you to join us in one or both of these sessions to talk us through the PR and discuss it.

The triage sessions are currently planned for:

  • Friday August 20, 07:00 - 11:00 UTC
  • Friday September 3, 07:00 - 11:00 UTC

Would you be available to join us during those times on one of those days ?

Also: would you be willing and able to add some tests for these changes to the PR ?

jrfnl avatar Aug 13 '21 10:08 jrfnl

PR #493 is possibly a partial duplicate of this PR.

jrfnl avatar Aug 13 '21 11:08 jrfnl