feat(gitlab): implemented keyset pagination for gitlab #8529

Open moulivashisth opened this issue 6 months ago • 4 comments

⚠️ Pre Checklist

Please complete ALL items in this checklist, and remove before submitting

  • [x] I have read through the Contributing Documentation.
  • [x] I have added relevant tests.
  • [x] I have added relevant documentation.
  • [x] I will add labels to the PR, such as pr-type/bug-fix, pr-type/feature-development, etc.

Summary

Fix GitLab Users collection hitting offset pagination limits by adding keyset pagination.

This PR updates the GitLab CollectAccounts subtask to avoid max offset errors when collecting large user sets:

  • Self-managed GitLab instances now use keyset pagination on /api/v4/users
    (pagination=keyset&order_by=id&sort=asc&per_page=N&id_after=<last_id>) and do not send page.
  • gitlab.com / jihulab.com keep existing behavior on project member endpoints
    (/projects/:id/members[/all]) which typically remain under offset caps per project.
  • Retains existing API-version fallback (/members/all vs /members/ for < v13.11).
  • Response parser now tracks the last item’s id to advance the keyset cursor safely.
  • No breaking changes to task wiring or raw table schema (gitlab_api_users).
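
As a rough illustration of the keyset flow described above (not the actual collector code), the query for each page and the cursor update could look like the sketch below. The helper names `keysetQuery` and `nextCursor` are hypothetical, and the sketch assumes the response body is a JSON array of objects with a numeric `id` field.

```go
package main

import (
	"encoding/json"
	"net/url"
	"strconv"
)

// keysetQuery (hypothetical helper) builds the query parameters for one
// page of /api/v4/users. lastID is the id of the last user already seen;
// 0 means "first page", so no id_after is sent.
func keysetQuery(pageSize int, lastID int) url.Values {
	q := url.Values{}
	q.Set("pagination", "keyset")
	q.Set("order_by", "id")
	q.Set("sort", "asc")
	q.Set("per_page", strconv.Itoa(pageSize))
	if lastID > 0 {
		q.Set("id_after", strconv.Itoa(lastID))
	}
	// Deliberately no "page" parameter: keyset requests carry no offset.
	return q
}

// nextCursor (hypothetical helper) reads one response page and returns the
// id of its last item. The boolean is false when the page is empty, which
// signals that collection is finished; the previous cursor is kept then.
func nextCursor(body []byte, lastID int) (int, bool, error) {
	var users []struct {
		ID int `json:"id"`
	}
	if err := json.Unmarshal(body, &users); err != nil {
		return lastID, false, err
	}
	if len(users) == 0 {
		return lastID, false, nil
	}
	return users[len(users)-1].ID, true, nil
}
```

A caller would loop: build the query with the current cursor, fetch the page, call `nextCursor`, and stop once it reports no more items. Because results are sorted by ascending `id`, the last item of each page is always a safe cursor for the next request.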

Why: Some instances enforce strict offset caps (e.g., 50k), causing "offset pagination is restricted" errors when fetching Users. Keyset pagination removes the offset and enables full retrieval.

Risk/Compatibility:

  • Backward compatible; only changes query parameters and cursor handling.
  • If a project’s members list exceeds offset caps and the endpoint lacks keyset, users should collect site users (self-managed path) or shard by project/group—unchanged from current guidance.

Does this close any open issues?

Closes 8529 ([Bug][GitLab] Pagination not working Again)

Screenshots

N/A

Other Information

moulivashisth, Sep 12 '25 11:09

Is keyset and api/v4 available in Community Edition 11+ ?

klesh, Sep 17 '25 08:09

Thanks for bringing this up! Keyset pagination for /api/v4/users was introduced in 16.5, and 17.0+ requires it for large responses.

I’ve updated the code to gate keyset by server version (≥16.5) and fall back to offset otherwise. Project members endpoints keep their existing offset behavior.

This preserves compatibility with CE 11–16.4 while avoiding offset-cap failures on newer instances.
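
For illustration only, the version gate could be sketched roughly as below. `supportsKeyset` is a hypothetical name, not the function in this PR, and the sketch assumes the server reports a version string such as `16.5.1-ee`. The 16.5 threshold is the one stated above.

```go
package main

import (
	"strconv"
	"strings"
)

// supportsKeyset (hypothetical helper) reports whether a GitLab version
// string such as "16.5.1-ee" is at least 16.5. Anything that cannot be
// parsed returns false, so the caller falls back to offset pagination.
func supportsKeyset(version string) bool {
	version = strings.TrimPrefix(strings.TrimSpace(version), "v")
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return false
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false
	}
	// Drop any non-digit suffix on the minor part (e.g. "5-ee" becomes "5").
	minorStr := parts[1]
	if i := strings.IndexFunc(minorStr, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minorStr = minorStr[:i]
	}
	minor, err := strconv.Atoi(minorStr)
	if err != nil {
		return false
	}
	return major > 16 || (major == 16 && minor >= 5)
}
```

Treating unparseable versions as "no keyset" keeps the older offset path as the default, which matches the compatibility goal for CE 11–16.4.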

moulivashisth, Sep 18 '25 05:09

Thanks for the review.

I hit a typecheck error: plugins/gitlab/tasks/account_collector.go:69:55: undefined: apiVersion

Fix: Declared and reused a single apiVersion variable sourced from the client.

moulivashisth, Sep 22 '25 05:09

Hi, you may run the following command to fix the golangci-lint error:

gofmt -s -w -l plugins/gitlab/tasks/account_collector.go

klesh, Sep 24 '25 04:09