php-github-api icon indicating copy to clipboard operation
php-github-api copied to clipboard

ResultPager doesn't merge data correctly for all API's

Open fideloper opened this issue 3 years ago • 9 comments

Hello!

The Problem (Description)

This is technically a duplicate of #597 from 2017, but I wanted to add in details and present some possible solutions.

We're using this library on ChipperCI, some customers noted that we aren't correctly listing all of the repositories our GitHub App is installed into when they have a large number of repositories.

Digging in, we realized only the last page of results were being returned back when using the ResultPager::fetchAll() method on results that had more than one page.

This only happens on certain API calls (I've only seen it on the Apps API so far - listed below).

The Issue

This is due to the use of array_merge() to merge resulting arrays back:

https://github.com/KnpLabs/php-github-api/blob/2.x/lib/Github/ResultPager.php#L92-L96:

while ($this->hasNext()) {
    $next = $this->fetchNext();

    if ($isSearch) {
        $result = array_merge($result, $next['items']);
    } else {
        $result = array_merge($result, $next); // <-- Specifically here
    }
}

Here's an example call into the App's api that results in the issue:

$github = new Client(new Builder, 'machine-man-preview');
$github->authenticate($userOAuthToken, Client::AUTH_HTTP_TOKEN);
$resultFetcher = new ResultPager($github);
$installations = $resultFetcher->fetchAll($github->currentUser(), 'installations');

A result object from one API call is (I set it to 1 per page to test this out, but it happens when there are more than 100 results for any API with results like this) - converted from JSON to PHP:

// API call number one, 1 per page, when there are 2 results total:
$result = [
    'total_count' => 2,
    'installations' => [
        ['id' => 12345 /* and so on */],
    ]
]

// API call number two, again using 1 per page, when there are 2 results total:
$next = [
    'total_count' => 2,
    'installations' => [
        ['id' => 678910 /* and so on */],
    ]
]

If you array_merge() these, PHP will over-write the the first array with the second, as their associative keys match!

$finalResult = array_merge($result, $next);

// since the arrays have matching keys, the $next array over-writes everything in the $result array
// and we effectively only get the last page of results for any set of results over the 100 results per page 
// limit set by GitHub's API
$finalResult === $next; // true
array_diff($finalResult, $next); // yields an empty result [], they are the same

Possible Solutions

A first idea I had:

The solution may be API class extending AbstractAPI to know if there is a key to be used there.

So far I can only find examples from the App's api, and from searching the source of that page, only these 3 endpoints appear to use total_count (and the other key for repositories / installations) in the returned results:

  1. https://docs.github.com/en/rest/reference/apps#list-repositories-accessible-to-the-user-access-token
    • e.g. GET /user/installations/{installation_id}/repositories
  2. https://docs.github.com/en/rest/reference/apps#list-repositories-accessible-to-the-app-installation
    • e.g. GET /installation/repositories
  3. https://docs.github.com/en/rest/reference/apps#list-app-installations-accessible-to-the-user-access-token
    • e.g. GET /user/installations

However, I've checked some GitHub issues here that hint that this may be an issue elsewhere (and actually, this bug is open too: https://github.com/KnpLabs/php-github-api/issues/597).

The keys in the $result / $next array are repositories or installations.

So perhaps something like this:

// File: AbstractAPI

protected $resultKey;

public function getResultKey()
{
    return $this->resultKey;
}

And then in ResultPager:

while ($this->hasNext()) {
    $next = $this->fetchNext();

    if ($isSearch) {
        $result = array_merge($result, $next['items']);
    } elseif (! $this->getResultKey()) {
        $result = array_merge($result, $next[$this->getResultKey()]);
    } else {
        $result = array_merge($result, $next); 
    }
}

This would likely need to be used not just for certain API's (e.g. the App's api) but the specific endpoints that need it 😅


Hopefully these extra details can help pin-point the issue! In the meantime, I'll do a similar work-around as the others in this (duplicate but less detailed) issue.

Let me know if I can clarify anything!

fideloper avatar Jul 23 '20 01:07 fideloper

@GrahamCampbell This could effect https://github.com/KnpLabs/php-github-api/pull/907 as well. Let me know what you think!

fideloper avatar Jul 23 '20 02:07 fideloper

I think we should report this as a bug to GitHub actually. It looks like the developer who implemented the installations API was not aware of their conventions.

GrahamCampbell avatar Jul 23 '20 08:07 GrahamCampbell

Inline hack that I used to get fetchAll working for the listRepositories command.

Mileage may vary.

$pager = new class($client) extends ResultPager {
    protected function callApi(ApiInterface $api, $method, array $parameters)
    {
        return parent::callApi($api, $method, $parameters)['repositories'];
    }
    public function fetchNext()
    {
        return parent::fetchNext()['repositories'];
    }
};

rtek avatar Aug 03 '20 18:08 rtek

Right on. I did the following since I have one call that needs repositories and one that needs installations:

<?php

namespace App\Api\GitHub;


use Github\Api\ApiInterface;
use Github\Api\Search;
use Github\ResultPager as BaseResultPager;

class ResultPager extends BaseResultPager
{
    protected $resultKey;

    /**
     * @param $key
     * @return $this
     */
    public function setResultKey($key)
    {
        $this->resultKey = $key;
        return $this;
    }

    /**
     * {@inheritdoc}
     */
    public function fetchAll(ApiInterface $api, $method, array $parameters = [])
    {
        $isSearch = $api instanceof Search;

        // get the perPage from the api
        $perPage = $api->getPerPage();

        // set parameters per_page to GitHub max to minimize number of requests
        $api->setPerPage(100);

        try {
            $result = $this->callApi($api, $method, $parameters);
            $this->postFetch();

            if ($this->resultKey) {
                $result = $result[$this->resultKey];
            }

            if ($isSearch) {
                $result = isset($result['items']) ? $result['items'] : $result;
            }

            while ($this->hasNext()) {
                $next = $this->fetchNext();

                if ($isSearch) {
                    $result = array_merge($result, $next['items']);
                } elseif ($this->resultKey) {
                    $result = array_merge($result, $next[$this->resultKey]);
                } else {
                    $result = array_merge($result, $next);
                }
            }
        } finally {
            // restore the perPage
            $api->setPerPage($perPage);
        }

        return $result;
    }
}

Example call to this result fetcher:

$github = new Client(new Builder, 'machine-man-preview');
$github->authenticate($oAuthToken, Client::AUTH_HTTP_TOKEN);
$resultFetcher = new GitHubResultPager($github);

$resultFetcher
    ->setResultKey('installations')
    ->fetchAll($github->currentUser(), 'installations');

fideloper avatar Aug 03 '20 18:08 fideloper

@GrahamCampbell I opened a ticket with GitHub but haven't heard a thing back :D

fideloper avatar Aug 03 '20 18:08 fideloper

From GitHub:

Hi there Chris,

Thanks for writing in to ask about this - and I'm sorry for the time we've kept you waiting for a response.

Returning those results in a nested array with a total_count is the intended behaviour for these endpoints. This isn't the only endpoint to behave this way, you'll notice that our Search API also returns a nested array of results.

We've used this format on some endpoints where it's particuarly useful to obtain a total count. It would be a breaking change to roll this out to more endpoints which does mean things can be a bit inconsistent.

I can understand how this could cause trouble with libraries that attempt to treat all REST API responses the same but it won't be possible to change the format of these responses. I hope this answers your question and helps move the discussion forward in the issue you linked.

Best regards, Steve

fideloper avatar Aug 25 '20 13:08 fideloper

Just updated this library and found that the previous solution posted by @fideloper doesn't work anymore.

Here is an updated class for version 3.2.0 in case somebody will need it too:

<?php

namespace App\Api\Github;

use Github\Api\AbstractApi;
use Github\ResultPager as BaseResultPager;

class ResultPager extends BaseResultPager
{
    private $resultKey;

    public function setResultKey($key)
    {
        $this->resultKey = $key;
        return $this;
    }

    public function fetch(AbstractApi $api, string $method, array $parameters = []): array
    {
        return $this->prepareResult(parent::fetch($api, $method, $parameters));
    }

    protected function get(string $key): array
    {
        return $this->prepareResult(parent::get($key));
    }

    private function prepareResult(array $result): array
    {
        if ($this->resultKey && $result) {
            $result['items'] = $result[$this->resultKey] ?? [];
        }

        return $result;
    }
}

Usage:

$repositories = (new ResultPager($client))
    ->setResultKey('repositories')
    ->fetchAll($client->apps(), 'listRepositories');

vovayatsyuk avatar May 06 '21 19:05 vovayatsyuk

Just updated this library and found that the previous solution posted by @fideloper doesn't work anymore.

Here is an updated class for version 3.2.0 in case somebody will need it too:

<?php

namespace App\Api\Github;

use Github\Api\AbstractApi;
use Github\ResultPager as BaseResultPager;

class ResultPager extends BaseResultPager
{
    private $resultKey;

    public function setResultKey($key)
    {
        $this->resultKey = $key;
        return $this;
    }

    public function fetch(AbstractApi $api, string $method, array $parameters = []): array
    {
        return $this->prepareResult(parent::fetch($api, $method, $parameters));
    }

    protected function get(string $key): array
    {
        return $this->prepareResult(parent::get($key));
    }

    private function prepareResult(array $result): array
    {
        if ($this->resultKey && $result) {
            $result['items'] = $result[$this->resultKey] ?? [];
        }

        return $result;
    }
}

Usage:

$repositories = (new ResultPager($client))
    ->setResultKey('repositories')
    ->fetchAll($client->apps(), 'listRepositories');

Thank you! it works like a charm.

Necmttn avatar Jun 03 '21 04:06 Necmttn

This is still an "issue" as of tag 3.5.1

The Problem

The API results (when I get a list of installations and repositories) are:

$github = new Client(new Builder, 'machine-man-preview');
$github->authenticate('<user oauth token>', AuthMethod::ACCESS_TOKEN);
$resultFetcher = new ResultPager($github);

$installations = $resultFetcher->fetchAll($github->currentUser(), 'installations');

// RESULTS LOOK LIKE THIS:
//[
//    'total_count' => 100,
//    'installations' => [...],
//]


// Get GitHub App token for JWT auth
$token = AppsFacade::newInstallationToken($installation['id']);

$github = new Client(new Builder, 'machine-man-preview');
$github->authenticate($token['token'], AuthMethod::JWT);
$resultFetcher = new ResultPager($github);

$response = $resultFetcher->fetchAll($github->apps(), 'listRepositories');

// RESULTS LOOK LIKE THIS:
//[
//    'total_count' => 100,
//    'repositories' => [...],
//]

➡️ The way this iterates over results only returns the last page!

Essentially code return iterator_to_array($this->fetchAllLazy($api, $method, $parameters)); yields those results. Each iteration yielding the results over-write the previous ones, as its return an array with keys total_count and repositories each iteration.

It doesn't seem to merge results!

Latest Hacky Solution

So, I've extended the class once again, here it is!

🔴 This iteration assumes the method setResultKey() is always used

<?php

namespace App\Api\GitHub;

use Generator;
use Github\Api\AbstractApi;
use Github\ResultPager as BasePager;

class ResultPager extends BasePager
{
    private $resultKey;

    /**
     * @param $key
     * @return $this
     */
    public function setResultKey($key)
    {
        $this->resultKey = $key;
        return $this;
    }

    /**
     * {@inheritdoc}
     */
    public function fetchAllLazy(AbstractApi $api, string $method, array $parameters = []): Generator
    {
        $result = $this->fetch($api, $method, $parameters);

        foreach ($result[$this->resultKey] as $key => $item) {
            if (is_string($key)) {
                yield $key => $item;
            } else {
                yield $item;
            }
        }

        while ($this->hasNext()) {
            $result = $this->fetchNext();

            foreach ($result[$this->resultKey] as $key => $item) {
                if (is_string($key)) {
                    yield $key => $item;
                } else {
                    yield $item;
                }
            }
        }
    }
}

Second Solution

Another Laravel-centric solution that does not involve over-riding the ResultPager class is to use a LazyCollection.

use Illuminate\Support\LazyCollection;

$github = new Client(new Builder, 'machine-man-preview');
$github->authenticate('<oauth token here>', AuthMethod::ACCESS_TOKEN);
$resultFetcher = new ResultPager($github);

$allInstallations = LazyCollection::make(function () use ($github, $resultFetcher) {
    yield $resultFetcher->fetch($github->currentUser(), 'installations');

    while ($resultFetcher->hasNext()) {
        yield $resultFetcher->fetchNext();
    }
})->reduce(function ($results, $result) {
    $results['installations'] = array_merge($results['installations'] ?? [], $result['installations']);

    return $results;
}, []);

foreach($allInstallations['installations'] as $installation) {...}

h/t to @tlaverdure 🍻

fideloper avatar Mar 08 '22 19:03 fideloper