go-github icon indicating copy to clipboard operation
go-github copied to clipboard

Feature Request: Pagination Handlers

Open xorima opened this issue 2 years ago • 23 comments

Hey 👋

I notice how most resources already have a way of handling pagination but we leave it to the users to implement the pagination methods themseleves.

Would it be to use to have additional appropriate functions/methods to handle the pagination for the user, so they can just focus on the data they want to get back and not how to handle the pagination. (potentially with some offsets/config for max objects returned)

I'm happy to code this up, just after feedback if this is a feature people want

xorima avatar Jan 05 '23 06:01 xorima

Hi @Xorima !

Yes, there is interest - see #1935 - but note that that issue was closed due to lack of volunteers. :smile:

Also note that many endpoints handle pagination differently due to the organic growth of the GitHub v3 API.

If you could come up with a way to support paginating through all results that supports all the different kinds of pagination schemes, that would be great.

However, there is also one slightly tricky part - spoken from experience - and that is that you MUST respect the rate limits (or risk your access token being blocked for a period of "cool off" time). This means that the rate limit values must be monitored during your pagination and you must ensure that you let your pagination loop cool down if you get close to exceeding the rate limits.

With all that in mind, you are welcome to put together a PR and we can explore this topic further.

Thank you, @Xorima !

gmlewis avatar Jan 05 '23 12:01 gmlewis

Adding complexity to the discussion:

  1. Configurability:
    • One should need to opt into the auto-pagination feature on a per-request basis (perhaps via the opts argument) and optionally also as a default behavior for the client.
    • One should still be able to pass the per-page value on a per-request basis.
  2. Sync vs Async: One might prefer a blocking (wait) or non-blocking (return) approach for pagination. The async approach is essential here because one might want to start processing results before the end of the pagination.
    • either way: should return the results found up to the rate limit, and return a meaningful error to the caller (so the user can use the results gathered so far and also continue the request from the same point if they decide).
  3. Rate limiting: there's currently no cooldown support in the client. If such a behavior is implemented for pagination, it should also be made configurable (block=cooldown or non-block=return).
    • Blocking: one might want to wait only if the wait time is below a threshold. Especially relevant for secondary rate limits (one might not want to wait an hour or so but would wait minutes).
  4. Return value: the pagination-related response fields should reflect the state of the cursor.

IMO #1 is a must-have for the feature (to avoid breaking backward compatibility), whilst #2 and #3 aren't, but would make the built-in pagination feature relevant only for some specific use cases.

Finally, sharing a handler that can be used on a per-request basis (given the current status - without the pagination feature):

func PaginateResults(api func(opts *github.ListOptions) (*github.Response, error)) error {
	var opts github.ListOptions

	for {
		resp, err := api(&opts)

		if err != nil {
			return err
		}

		if resp.NextPage == 0 {
			return nil
		}

		opts.Page = resp.NextPage
	}
}

which is then used like that:

	err := PaginateResults(func(opts *github.ListOptions) (*github.Response, error) {
	        // optionally edit opts to change the per-page value
	        // note that you can change PaginateResults to receive per-page as a parameter and set it there
	        
		res, resp, err := client.SomeCall(A, B, C, opts)
		if err != nil {
			return resp, err
		}

		go func() {
		   // do something with res
		}

		return resp, nil
	})

p.s. @gofri is my personal account.

gal-legit avatar Jan 18 '23 12:01 gal-legit

Thanks for the criteria @gal-legit, I think it's very solid.

My initial idea was to have a separate function to call to have it auto handle pagiantion, similar to how the go-tfe module handles pagination, but this is a much cleaner approach.

I am aiming to go with a version 1 works, version 2 works better approach, so I suspect whatever I include in v1 will change a lot by the time we are calling this ready to go.

I should be able to get something up next week with it being CNY out here so I have plenty of time off to focus on this

xorima avatar Jan 19 '23 06:01 xorima

@xorima hi, just checking to see if you have been able to make any progress on the pagination helpers. i would be interested in helping.

danielhochman avatar Feb 27 '23 17:02 danielhochman

So I know I've been poor at responding on this. So far I'm testing the latest iteration of paginatiors I've made which looks like the below.

Still need more tests on it before I'm happy to contribute it into the codebase but it's here for comments on design and approach.

package pagination

import (
	"context"
	"github.com/google/go-github/v53/github"
)

// ListFunc is an interfact that returns a list of items for the given type
// for example it could be:
//
//	type listFunc struct {
//		client *github.Client
//	}
//
//	func (l *listFunc) List(ctx context.Context, opt *github.ListOptions) ([]*github.Repository, *github.Response, error) {
//		t, r, err := l.client.Apps.ListRepos(ctx, opt)
//		return t.Repositories, r, err
//	}
type ListFunc[T any] interface {
	List(ctx context.Context, opt *github.ListOptions) ([]T, *github.Response, error)
}

// ProcessFunc is a function that processes an item for the given type
// this is optional as the user may not want to process the items so
// they can input a skip function that does nothing
// example:
//
//	type processFunc struct {
//		client *github.Client
//	}
//
//	func (p *processFunc) Process(ctx context.Context, item *github.Repository) error {
//		fmt.Println(item.GetName())
//		return nil
//	}
type ProcessFunc[T any] interface {
	Process(ctx context.Context, item T) error
}

// RateLimitFunc is a function that handles rate limiting
// it returns a bool to indicate if the pagination should continue
// and an error if the users wishes to return more information/errors
// example:
//
//	type rateLimitFunc struct {
//	}
//
//	func (r *rateLimitFunc) RateLimit(ctx context.Context, resp *github.Response) (bool, error) {
//		return true, nil
//	}
type RateLimitFunc interface {
	RateLimit(ctx context.Context, resp *github.Response) (bool, error)
}

func Paginator[T any](ctx context.Context, listFunc ListFunc[T], processFunc ProcessFunc[T], rateLimitFunc RateLimitFunc) ([]T, error) {

	opts := &github.ListOptions{PerPage: 100, Page: 1}

	var allItems []T

	for {
		items, resp, err := listFunc.List(ctx, opts)
		if err != nil {
			return allItems, err
		}
		allItems = append(allItems, items...)
		for _, item := range items {
			if err = processFunc.Process(ctx, item); err != nil {
				return allItems, err
			}
		}

		// Handle rate limits
		shouldContinue, err := rateLimitFunc.RateLimit(ctx, resp)
		if err != nil {
			return allItems, err
		}
		if !shouldContinue {
			break
		}

		if resp.NextPage == 0 {
			break
		}
		opts.Page = resp.NextPage
	}
	return allItems, nil
}

xorima avatar Jul 29 '23 07:07 xorima

Correct me if I'm wrong, but this could easily reside in an external repo, right? I'm thinking that since this would be the first introduction of generics (and therefore require Go 1.18+), maybe it would make more sense to move this to an external helper/support repo. We did the same thing for #1150.

gmlewis avatar Aug 01 '23 04:08 gmlewis

that's what I'm thinking, making it an external repo people can use and then a conversation about joining it into the main repo in future when go 1.18 is the base requirements for other reasons

xorima avatar Aug 03 '23 14:08 xorima

Repository is up now: https://github.com/xorima/go-api-pagination

note: the repo itself may be moved in a few weeks while I go through some work processes to have it as part of the main foss offerings from $dayjob

xorima avatar Aug 19 '23 04:08 xorima

Hey, looks great, @xorima !

I'll leave this issue open for you to create a PR that will update the README.md file in this repo with a section that points to your repo with a short blurb about why/how/when you would use it. Your PR description can say Fixes: #2618.

gmlewis avatar Aug 19 '23 22:08 gmlewis

hey, I know it's been a while and the pagination wrapper-function probably work fine for many users, but I just figured that I could solve this in an elegant maaner with a round-tripper. I hope I'll find the time to get down to writing it soon, but am leaving the note here anyway so that anyone can ping me ([email protected]) if you find it interesting / wanna suggest requirements/features for that.

gofri avatar Jan 10 '24 12:01 gofri

I had a few hours today and I made some progress. Mainly, I solved merging the response json arrays/dictionaries in a smooth manner, so I only have to add the pagination headers stuff and glue it together to get a basic round-tripper going.

In that context, I have a sort of a legal question. I'd love to use the code from populatePageValues() with slight modifications, because it's already proven to work. I wanted to copy and adjust it (and leave a "shamelessly stolen from" note), but I'm not sure what the exact BSD license instructions/limitations are. Any chance that you can advise on that?

gofri avatar Jan 26 '24 14:01 gofri

NOTE: I am NOT a lawyer and I do NOT speak for Google, so I cannot officially answer this question.

However, in my humble opinion (that you can take with a grain of salt), it seems totally reasonable to me (with my limited understanding of the BSD license) to copy the function with a GitHub permalink comment added that says where you copied it from and also the exact phrase Copyright (c) 2013 The go-github AUTHORS. All rights reserved. along with a full copy of the Google go-github BSD license appended to your own LICENSE file in your repo.

So, for example, it might look like this:

// populatePageValues parses the HTTP Link response headers and populates the
// various pagination link values in the Response.
//
// Copied from: https://github.com/google/go-github/blob/20fc90190626bc9ff64da10ffdbb3be9f1ba85be/github/github.go#L674-L741
// Copyright (c) 2013 The go-github AUTHORS. All rights reserved.
func (r *Response) populatePageValues() {
...
}

...and then concatenate this repo's LICENSE file to the end of your own repo's LICENSE file ... thereby your LICENSE file would actually contain two complete license declarations.

Does that make sense?

gmlewis avatar Jan 26 '24 15:01 gmlewis

Reading my reply, maybe // Adapted from: ... might be better since you may have to change the function signature like the receiver or maybe change some of the internals to refer to structs properly, of course. In other words, you may have to edit it, but I would think "Adapted from" might be more appriopriate than "Copied from", but your call... I think either is probably fine as long as you have the copyright message in there.

gmlewis avatar Jan 26 '24 15:01 gmlewis

@gmlewis makes total sense, that's exactly what I was thinking would be appropriate. Thanks!

gofri avatar Jan 26 '24 17:01 gofri

So I have a first version going and it seems to work well: https://github.com/gofri/go-github-pagination Please do check it out! I'm gonna add some improvements soon (see the known-limitation section in the readme).

p.s. @gmlewis, it turned out that I couldn't use the discussed code after all. The implementation here sort of relies on the user to pick the right field from the response and place it in the relevant field for the specific request. For my implementation I needed it to be solved automatically, so I had to be more careful about it. I'd be glad to contribute this logic to this repo, but I'm afraid that it's gonna be hard to do it without breaking compatability. Anyway, let me know if you think there's anything I can do to improve that part here.

p.s.2. now that I'm deep in the ocean of pagination, I think I can add some useful explanation to the readme (e.g., for the "since" param). do you think it's worth a PR?

p.s.3. I'd be glad to add a reference to go-github-pagination to the README here, once I finish the extra bits I want the round tripper to have. let me know if that sounds ok.

gofri avatar Jan 31 '24 20:01 gofri

Hey @gofri - looks great! (I might try to reduce the stutter by rearranging the directories a bit and put unit tests within the same dir as the code, but those are really nit issues, so take that with a grain of salt.)

p.s. ...

No problem.

p.s.2. ...

Sure, improvements to our documentation are always welcome. If you are talking about your repo's README, then it probably wouldn't hurt.

p.s.3. ...

Sure, improvements to our documentation are always welcome.

Thank you, @gofri !

gmlewis avatar Jan 31 '24 20:01 gmlewis

@gmlewis thanks ! :) thanks for the comment about the tests too! tbh I always go this extra mile with the unit tests to make sure that they only use the public API, instead of testing implementation details.

Sure, improvements to our documentation is always welcome. If you are talking about your repo's README, then it probably wouldn't hurt.

Actually I was referring to the README of this repo. Specifically, there's the since=NextPageToken thing (for list organizations) which I found most confusing as a user.

gofri avatar Jan 31 '24 20:01 gofri

tbh I always go this extra mile with the unit tests to make sure that they only use the public API, instead of testing implementation details.

You know that you can keep it in the same directory by changing the package from "package mypackage" to "package mypackage_test", right?

gmlewis avatar Jan 31 '24 21:01 gmlewis

@gmlewis I do now 😅 (and feel real stupid lol, I didn't know about the {pkg_name}_test exception for package name collisions)

gofri avatar Jan 31 '24 21:01 gofri

No worries! I learn new stuff every day, and hopefully will continue that trend. 😄

gmlewis avatar Jan 31 '24 22:01 gmlewis

Hey all (@jnewland , @xorima , @danriedl) I just published v1.0.0 for go-github-pagination. It now supports:

  • Sync pagination (seamless).
  • Async pagination using a helper function (designed for go-github's users).
  • General and per-request configurations.

I attached a sample for using the async wrapper with go-github below. @gmlewis I'll soon open a PR in this repo to reference it as we disucssed before :)

  handler := func(resp *http.Response, repos []*github.Repository) error {
    fmt.Printf("found repos: %+v\n", repos)
    return nil
  }
  ctx := githubpagination.WithOverrideConfig(context.Background(),
    githubpagination.WithMaxNumOfPages(3), // e.g, limit number of pages for this request
  )
  async := githubpagination.NewAsync(handler)
  err := async.Paginate(client.Repositories.ListByUser, ctx, "gofri", nil)
  if err != nil {
    panic(err)
  }

gofri avatar Mar 05 '24 17:03 gofri

Fantastic @danielhochman

I've also finally moved my variant to the final location: https://github.com/hsbc/go-api-pagination

@gmlewis are you happy for an update to the readme pointing to the two different pagination handler options out there for this repository?

xorima avatar Mar 13 '24 12:03 xorima

@gmlewis are you happy for an update to the readme pointing to the two different pagination handler options out there for this repository?

Yes, absolutely, I think that makes sense.

gmlewis avatar Mar 13 '24 13:03 gmlewis