octokit.rb icon indicating copy to clipboard operation
octokit.rb copied to clipboard

Suggested Pagination Traversal Strategy results in infinite loop

Open s2t2 opened this issue 8 years ago • 5 comments

The suggested pagination traversal strategy is not working. It results in an infinite loop.

@client = Octokit::Client.new(:access_token => ACCESS_TOKEN)

user_response = @client.user
username = user_response[:login]

events_response = @client.user_events(username, :per_page => 100)
page_number = 1
latest_event_created_at = events_response.last[:created_at]
puts "PAGE #{page_number} -- EVENTS SINCE #{latest_event_created_at}"

while @client.last_response.rels[:next] # while there is a next page to get ...
  events_response = @client.last_response.rels[:next].get.data # get the next page
  page_number += 1
  latest_event_created_at = events_response.last[:created_at]
  puts "PAGE #{page_number} -- EVENTS SINCE #{latest_event_created_at}"
end

=>

PAGE 1 -- EVENTS SINCE 2016-02-28 23:02:38 UTC
PAGE 2 -- EVENTS SINCE 2016-02-13 20:04:08 UTC
PAGE 3 -- EVENTS SINCE 2016-02-13 20:04:08 UTC
PAGE 4 -- EVENTS SINCE 2016-02-13 20:04:08 UTC
PAGE 5 -- EVENTS SINCE 2016-02-13 20:04:08 UTC
PAGE 6 -- EVENTS SINCE 2016-02-13 20:04:08 UTC
PAGE 7 -- EVENTS SINCE 2016-02-13 20:04:08 UTC
PAGE 8 -- EVENTS SINCE 2016-02-13 20:04:08 UTC
PAGE 9 -- EVENTS SINCE 2016-02-13 20:04:08 UTC
PAGE 10 -- EVENTS SINCE 2016-02-13 20:04:08 UTC
PAGE 11 -- EVENTS SINCE 2016-02-13 20:04:08 UTC
PAGE 12 -- EVENTS SINCE 2016-02-13 20:04:08 UTC
PAGE 13 -- EVENTS SINCE 2016-02-13 20:04:08 UTC
PAGE 14 -- EVENTS SINCE 2016-02-13 20:04:08 UTC
... etc

I think the issue is happening because the call to get the next page of data (@client.last_response.rels[:next].get.data) does not count as a request and resulting response known to the @client.last_response method. Let me know if there is something I'm missing or misunderstanding.


For googlers looking for a way to traverse the pagination, try the following:

@client = Octokit::Client.new(:access_token => ACCESS_TOKEN)

user_response = @client.user
username = user_response[:login]

page_number = 1
events_response = @client.user_events(username, :page => page_number, :per_page => 100)
latest_event_created_at = events_response.last[:created_at]
puts "PAGE #{page_number} -- EVENTS SINCE #{latest_event_created_at}"

while latest_event_created_at > 1.year.ago
  page_number += 1

  begin
    events_response = @client.user_events(username, :page => page_number, :per_page => 100)
  rescue Octokit::UnprocessableEntity => e
    exit if e.message.include?("pagination is limited for this resource")
  end

  latest_event_created_at = events_response.last[:created_at]
  puts "PAGE #{page_number} -- EVENTS SINCE #{latest_event_created_at}"
end

=>

PAGE 1 -- EVENTS SINCE 2016-02-28 23:02:38 UTC
PAGE 2 -- EVENTS SINCE 2016-02-13 20:04:08 UTC
PAGE 3 -- EVENTS SINCE 2016-01-29 03:31:03 UTC

... however, note there is still a restriction on the number of pages of events returned by the service.

s2t2 avatar Mar 16 '16 00:03 s2t2

Octokit::Connection.paginate() accepts a block, which is yields data for each page:

def paginate(url, options = {}, &block)
  (...)
  yield(data, @last_response)
  (...)
end

In order to pass a block when calling user_events, it would need to be passed explicitly to paginate():

def user_events(user, options = {})
  # original:
  # paginate "#{User.path user}/events", options
  paginate "#{User.path user}/events", options, &Proc.new
end

I guess adding &Proc.new as an argument for paginate calls would make dealing with pagination much easier:

@client.user_events(...) do |data, last_response|
  # 100 events here...
  puts last_response.data
}

Please correct me if I am wrong.

stmllr avatar Apr 21 '16 14:04 stmllr

Does this paginate method sound like a good candidate for this section? http://octokit.github.io/octokit.rb/#Pagination

I ended up looking into the internal autopaginate code to figure out how pagination was happening, because the sample code never got past page 2. It turns out that Octokit.last_response is only set if you call a lib method directly, but not when following rels.

# From docs. This will work...
issues = Octokit.issues 'rails/rails', :per_page => 100
issues.concat Octokit.last_response.rels[:next].get.data
# ...but the following will give you the second page again
issues.concat Octokit.last_response.rels[:next].get.data

# Instead you have to keep track of the last response yourself
issues = Octokit.issues 'rails/rails', :per_page => 100
last_response = Octokit.last_response

while true do
  # ... process the last_response.data ...
  break unless last_response.rels[:next]
  last_response = last_response.rels[:next].get
end

Also, if you set a particular header in the original request for pagination, e.g. Accept: application/vnd.github.v3.star+json it doesn't seem to get respected when following the relation, and has to be passed in again, but this might be a separate issue.

In the end, if the paginate method is the way to go, I'd be happy to help update the docs =)

xevix avatar Aug 05 '16 08:08 xevix

Thanks, everyone for the detailed discussions. We'll work on getting this documented, and also for the next major release we have plans for improving the pagination user interface.

kytrinyx avatar Sep 14 '17 17:09 kytrinyx

I stumbled across this while trying to work out why I couldn't get a list of all repositories even when I set per_page: 500 (is the limit of 100 documented?).

In the example above from @xevix I believe that the break needs to be after the new version of last_response is set, so it should be:

issues = Octokit.issues 'rails/rails', :per_page => 100
last_response = Octokit.last_response

while true do
  # ... process the last_response.data ...
  last_response = last_response.rels[:next].get
  break unless last_response.rels[:next]
end

I am actually using github.organization_repositories instead of Octokit.issues, where github is an instance of Octokit::Client, but I wouldn't think that this should make a difference.

As the break is now at the end, the loop can be further simplified and my version looks like:

all_repos = github.organization_repositories 'rails'
last_response = github.last_response

while last_response.rels[:next] do
  all_repos.concat last_response.rels[:next].get.data
  last_response = last_response.rels[:next].get
end

jrmhaig avatar Nov 21 '18 11:11 jrmhaig

1

rmlogistic avatar Nov 28 '18 22:11 rmlogistic

👋 Hey Friends, this issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Please add the Status: Pinned label if you feel that this issue needs to remain open/active. Thank you for your contributions and help in keeping things tidy!

github-actions[bot] avatar Jul 27 '23 01:07 github-actions[bot]