GraphCollectionRequest getPage stuck
I have this strange issue with the GraphCollectionRequest getPage method. 99% of the time it works correctly, but every once in a while, it gets "stuck". The script keeps running, but it does not progress, until I forcefully close it.
The code in question:
$fetcher = $this->client
    ->createCollectionRequest("GET", "/me/drives/$driveId/items/$currentId/children")
    ->setPageSize(999)
    ->setReturnType(DriveItem::class);
trigger_error("getFolder8", E_USER_NOTICE);
$children = [];
while (!$fetcher->isEnd()) {
    trigger_error("getFolder9", E_USER_NOTICE);
    $page = $fetcher->getPage();
    if (is_array($page)) {
        $children = array_merge($children, $page);
    } else {
        break;
    }
}
trigger_error("getFolder10", E_USER_NOTICE);
What I would expect to happen is that I first see getFolder8 in my logging, then getFolder9 (one or more times), and then getFolder10. What actually shows up in my logging is this:
[10-Dec-2020 20:16:45 UTC] PHP Notice: getFolder8 in ...
[10-Dec-2020 20:16:45 UTC] PHP Notice: getFolder9 in ...
and then nothing for ~4 hours, until the PHP process is forcefully killed.
It seems to me that the first $fetcher->getPage() somehow gets stuck.
The log does not contain any other error messages.
I would assume this is caused by the wait on the async request that $fetcher->getPage() does in the background, but I am not certain, and I cannot reliably reproduce the issue either, although it does happen a couple of times per day for different Microsoft accounts.
AB#6919
Additionally, I now have one case where it happened at a different point, namely at
$drives = $this->client->createRequest("GET", "/me/drives")->setReturnType(Drive::class)->execute();
I would assume that this is caused by https://github.com/microsoftgraph/msgraph-sdk-php/blob/dev/src/Http/GraphRequest.php#L265 never returning.
@Koenvh1 I think what may be the issue is that 1) the default timeout is set to 0 which will wait indefinitely, 2) I've witnessed dropped connections with Graph. Between the two, I think this explains this scenario.
Can you start by adding a timeout on the request? Calling setTimeout(30) sets a timeout of 30 seconds (that is probably too long), so at least this fails faster. Then, I think if we expose a getNextLink(), you can use $nextlink = $fetcher->getNextLink() to access the nextlink. You'd then need to handle a GuzzleHttp\Exception\ConnectException and retry with the nextlink. Does this seem reasonable to you?
What we really need is a resumable PageIterator.
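The fail-fast-then-retry idea above can be sketched as a small generic helper. This is only an illustration under stated assumptions: `retryOnFailure` and its parameters are hypothetical names, not part of the SDK, and the helper knows nothing about Graph itself. It just re-runs a callable when it throws a given exception class.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of msgraph-sdk-php): runs $operation and
 * retries it when it throws an exception of class $retryOn.
 * Any other exception, or a failure on the last attempt, is rethrown.
 */
function retryOnFailure(
    callable $operation,
    int $maxAttempts = 3,
    string $retryOn = \RuntimeException::class,
    int $delaySeconds = 0
) {
    for ($attempt = 1; ; $attempt++) {
        try {
            // The attempt number is passed in so the caller can log it.
            return $operation($attempt);
        } catch (\Throwable $e) {
            // Give up on non-retryable errors or once attempts are exhausted.
            if (!($e instanceof $retryOn) || $attempt >= $maxAttempts) {
                throw $e;
            }
            if ($delaySeconds > 0) {
                sleep($delaySeconds);
            }
        }
    }
}

// Simulated flaky call: fails twice, then returns a "page".
$calls = 0;
$page = retryOnFailure(function () use (&$calls) {
    $calls++;
    if ($calls < 3) {
        throw new \RuntimeException("simulated dropped connection");
    }
    return ["item1", "item2"];
}, 5);
```

With the SDK, the callable would presumably wrap `$fetcher->getPage()` on a request that has `setTimeout(30)` applied, with `\GuzzleHttp\Exception\ConnectException::class` passed as `$retryOn`. Whether a retried `getPage()` re-requests the same page after a failure is not confirmed here, which is why access to the nextlink is being discussed.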
That might very well be the case. I have added a timeout to all calls, and I am monitoring whether the issue pops up again. The duration of the timeout is not an issue - it's mainly used for a long-running process that normally takes hours anyway, so an additional 30 seconds is no issue.
Note to self: the short-term action is to expose nextlink. Long-term is to provide a PageIterator.
I'm not sure if this is the same issue or a different one. From time to time, the following code "freezes", "gets stuck", "hangs". The simple GET /me query works fine. When I try to set up an upload session, it hangs, even with a ->setTimeout(10).
$graph = new Graph();
$graph->setAccessToken($access_token);
$user = $graph->createRequest("GET", "/me")
    ->setReturnType(Model\User::class)
    ->execute();
log_info("Logged in as {$user->getGivenName()}."); // THIS WORKS
/** @var Model\UploadSession $uploadSession */
$uploadSession = $graph->createRequest("POST", "/drives/$drive_id/items/root:/$saveAsFileName:/createUploadSession")
    ->addHeaders(["Content-Type" => "application/json"])
    ->setTimeout(10)
    ->attachBody([
        "item" => [
            "@microsoft.graph.conflictBehavior" => "rename",
            "description" => $fileDescription
        ]
    ])
    ->setReturnType(Model\UploadSession::class)
    ->execute();
After looking at this again, I disagree with my earlier self. I think I've witnessed this in other SDKs: many requests are made and at some point, without a consistent repro, the client doesn't return and "hangs" or ends up stuck.
It might be useful to get a dump of the open connections at the time errors like this occur.
We should validate the client's behavior when the service API drops connections, and also what happens if the response is never returned and the connection is left open.
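One way to exercise the "response never returned, connection left open" case locally is sketched below. It does not use the Graph SDK or Guzzle; it uses plain PHP streams as a stand-in, and `readWithTimeout` is a hypothetical name. A listening socket that never answers plays the role of the silent service, and the client relies on a read timeout to get control back.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical probe: connects to $address, sends a request, and waits at
 * most $timeoutSeconds for a reply. Reports whether the read timed out.
 */
function readWithTimeout(string $address, float $timeoutSeconds): array
{
    $client = stream_socket_client($address, $errno, $errstr, $timeoutSeconds);
    if ($client === false) {
        throw new \RuntimeException("connect failed: $errstr ($errno)");
    }
    fwrite($client, "GET / HTTP/1.1\r\nHost: localhost\r\n\r\n");

    // Without a read timeout, a blocking read here could wait far longer.
    $seconds = (int) floor($timeoutSeconds);
    $micros = (int) (($timeoutSeconds - $seconds) * 1000000);
    stream_set_timeout($client, $seconds, $micros);

    $data = fread($client, 1024);
    $meta = stream_get_meta_data($client);
    fclose($client);

    return ['data' => (string) $data, 'timed_out' => $meta['timed_out']];
}

// A server that listens but never accepts or replies: the connection is
// established by the OS and then simply stays open and silent.
$server = stream_socket_server('tcp://127.0.0.1:0', $errno, $errstr);
if ($server === false) {
    throw new \RuntimeException("listen failed: $errstr ($errno)");
}
$address = 'tcp://' . stream_socket_get_name($server, false);

$result = readWithTimeout($address, 0.5);
fclose($server);
```

Pointing the real client at a silent endpoint like this, with and without a timeout configured, would show whether it returns control or blocks.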
Unable to reproduce the client hanging scenario on the v1 SDK.
Our next V2 preview this month will contain a PageIterator that throws on any HTTP exceptions. Should this continue to happen, don't hesitate to reopen this.