soba
Ability to specify the GitLab instance URL
Can we define the GitLab instance URL? Would be really useful for making backups of private GitLab repos.
It's currently hard-coded as https://gitlab.com/api/v4.
I don't think it'd take much to introduce an API URL override. Will take a look.
Awesome, thanks so much in advance! 😀
I'm not able to test this myself as I don't have my own GitLab setup, but I've just introduced an undocumented option to specify an API URL override for each provider:
- GITHUB_APIURL
- GITLAB_APIURL
- BITBUCKET_APIURL
The value needs to include the https:// prefix.
All I've tested is that no existing functionality is impacted, so please give it a go and let me know if it works.
https://github.com/jonhadfield/soba/releases/tag/1.1.3-beta
Awesome, it connected flawlessly - thanks for the quick response!
Soba has backed up 26 of the 128 repos, which at first glance is a bit weird, as a single page (of seven in total) holds 20 repositories. The repos soba backed up seem to come from across multiple pages - any idea how to get the remaining repos to show up?
Thanks for the feedback. I'm fairly certain this is linked to an existing issue I've not got around to: https://github.com/jonhadfield/soba/issues/10. I'll try and take a look tomorrow.
I've just put out a new release: https://github.com/jonhadfield/soba/releases/tag/1.1.3-beta.1
This adds pagination for group projects, which is where I suspect the issue is.
I may not be bringing back all groups though. How many do you have defined?
Thanks for this - I have 12 groups in total within this GitLab instance. Sadly, the number of backed-up repositories hasn't noticeably changed.
I did make two observations though:
- Only those Groups where I am the Owner get backed up
- Subgroups are skipped
Ah, subgroups aren't automatically returned when requesting all groups. It's a separate API call. That shouldn't take long to add though.
For 'Owner only Groups', that was kind of intentional. As I don't use GitLab myself, I was concerned it could return ones that were loosely related, but not ones you cared about. For example, in GitHub, a user in your Organisation could own many repos in their personal GitHub setup, but you wouldn't want to back those up. Anyhow, I'll see what's involved in adding it.
That makes total sense, thank you!
Sorry for the delay.
I've discovered there are various ways to retrieve Groups, and my above comment on sub-Groups was incorrect. In 1.1.3-beta.1 I was attempting to retrieve 'all available' and 'all owned by user', but the result was that the latter overrode the former.
I've settled on a new way of retrieving Groups: retrieving based on the user's minimum access level to the Group. Along with group pagination being added to 1.1.3-beta.2, this should return all Groups and sub-Groups you have at least Guest access to. More details on the release page, including a way to override the minimum access level.
Please let me know how you get on.
No worries, thanks for getting back to me on this!
I've pulled beta.2 and it is somewhat better: it now pulls 26 projects out of the 130 I have access to. The ones that get pulled have me as their Owner; anything below that level does not get pulled. I've also tried manually setting GITLAB_GROUP_ACCESS_LEVEL_FILTER to 10 in the env vars, with the same result. Then I raised it to 30, still no change. Any suggestions on what I should try?
It turns out retrieving Groups by minimum access level doesn't work across the whole of GitLab, but only local ones. The alternative is to retrieve Projects by minimum access level, so I've switched the behaviour in my code and proven this works as I can now clone repositories from another user where I have a Project access level of at least Reporter (I spent a lot of time wondering why Guest wasn't sufficient, until I RTFM).
The previous env var is now replaced with: GITLAB_PROJECT_MIN_ACCESS_LEVEL as the filter, where the value is an integer as follows:
20: "Reporter"
30: "Developer"
40: "Maintainer"
50: "Owner"
If unset, the default is 20.
As always, please shout if it works or doesn't.
Thanks!
I've tried without the env var and then with it, set to 20, 30 and finally 40 - all with the same result (the access levels were reported correctly across each run):
soba: 2022/10/02 11:43:30 gitlab.go:135: GitLab project minimum access level set to Reporter (20)
soba: 2022/10/02 11:43:30 gitlab.go:177: json: cannot unmarshal object into Go value of type githosts.gitLabGetProjectsResponse
It seems the response you get from the API doesn't match the bit of code I use to store it. It's difficult to debug without knowing the response, so I've just pushed a new release that outputs the GitLab API response if you have an environment variable set as: SOBA_LOG=trace
Would you mind trying that and sending the output? It's the structure, more than the content, so anonymising any repo names, urls, etc. is fine. If easier, just email me at [email protected].
If it helps, the structure I'm expecting is a json list of records:
- path
- path_with_namespace
- http_url_to_repo
- ssh_url_to_repo
- id
- name
- created_at
Not sure all those fields are still needed in order to clone, so I'll make a note to review them.
This is an example response (with only the relevant fields kept) triggered by a test. You can see the response is JSON: it starts with a [ to open the array/list, followed by a number of records (GitLab Projects), each surrounded by { and }. After the final one there should be a closing ].
[
{
"id": 39877738,
"name": "bourbon",
"path": "bourbon",
"path_with_namespace": "biscuits2/bourbon",
"created_at": "2022-10-01T20:28:50.042Z",
"ssh_url_to_repo": "git@gitlab.com:biscuits2/bourbon.git",
"http_url_to_repo": "https://gitlab.com/biscuits2/bourbon.git",
...
},
...
]
I suspect your response may be malformed in some way, or that I'm triggering an API call that's invalid.
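The `json: cannot unmarshal object into Go value of type ...` error in the earlier log is the kind of error Go's encoding/json reports when the target is a slice but the body is a JSON object. A minimal sketch with a hypothetical `gitLabProject` type covering the fields listed above (not soba's own `gitLabGetProjectsResponse`): an array like the example decodes cleanly, while an object such as GitLab's typical error body `{"message": "401 Unauthorized"}` fails in the same way.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// gitLabProject holds the fields listed above; the type name is hypothetical.
type gitLabProject struct {
	ID                int    `json:"id"`
	Name              string `json:"name"`
	Path              string `json:"path"`
	PathWithNamespace string `json:"path_with_namespace"`
	HTTPURLToRepo     string `json:"http_url_to_repo"`
	SSHURLToRepo      string `json:"ssh_url_to_repo"`
	CreatedAt         string `json:"created_at"`
}

// decodeProjects expects a JSON array of projects. If the API returns a JSON
// object instead, json.Unmarshal fails with a "cannot unmarshal object" error.
func decodeProjects(body []byte) ([]gitLabProject, error) {
	var projects []gitLabProject
	if err := json.Unmarshal(body, &projects); err != nil {
		return nil, err
	}
	return projects, nil
}

func main() {
	ok := []byte(`[{"id": 39877738, "name": "bourbon", "path_with_namespace": "biscuits2/bourbon"}]`)
	projects, err := decodeProjects(ok)
	fmt.Println(len(projects), err) // 1 <nil>

	bad := []byte(`{"message": "401 Unauthorized"}`)
	_, err = decodeProjects(bad)
	fmt.Println(err)
}
```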
Thanks, the trace option immediately showed me that I'd used the wrong access token. I'd also switched tokens between runs to make sure I was starting fresh, and unfortunately copied over the wrong one, so apologies for that!
Anyhow, a few more projects now get loaded: 35 out of the 130 I have access to.
It seems projects where I'm Owner are the only ones present in the response JSON, aside from a special case where a small fraction of one Group (where I'm Maintainer) is also present: 4 of that Group's 37 projects are there, and they're not the first four in the Group's web frontend listing.
Let me know if I can try anything else, your quick responses are much appreciated! 👍
Re "I've used the wrong access token": ah, that's a use-case I've not checked for. Will add it to the list.
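A minimal sketch of such a check, run before decoding the project list so a bad token is reported as an auth problem rather than a JSON type mismatch. It assumes GitLab error bodies are usually JSON objects with a `message` field (as in `{"message":"401 Unauthorized"}`); `checkResponse` is a hypothetical helper, not soba's code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// checkResponse is a hypothetical guard: any non-2xx status becomes an error,
// with 401 called out explicitly as a token problem.
func checkResponse(statusCode int, body []byte) error {
	if statusCode >= 200 && statusCode < 300 {
		return nil
	}
	// Fall back to the standard status text if the body carries no message.
	msg := http.StatusText(statusCode)
	var apiErr struct {
		Message string `json:"message"`
	}
	if json.Unmarshal(body, &apiErr) == nil && apiErr.Message != "" {
		msg = apiErr.Message
	}
	if statusCode == http.StatusUnauthorized {
		return fmt.Errorf("GitLab rejected the access token: %s", msg)
	}
	return fmt.Errorf("GitLab API returned status %d: %s", statusCode, msg)
}

func main() {
	fmt.Println(checkResponse(http.StatusUnauthorized, []byte(`{"message":"401 Unauthorized"}`)))
}
```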
Is it possible your token doesn't have the necessary access to the other projects?
The API call I make is simply specifying a page size, i.e. number of results to return with each call (20 by default), and the minimum access level mentioned above. In theory, that should return everything, regardless of ownership.
https://docs.gitlab.com/ee/api/projects.html#list-all-projects
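A sketch of building that request with Go's net/url. `min_access_level`, `per_page` and `page` are documented parameters of the list-all-projects endpoint, and GitLab caps `per_page` at 100, so the hypothetical helper `projectsPageURL` clamps it. The token is deliberately left out of the query string: GitLab also accepts it in a `PRIVATE-TOKEN` request header, which keeps it out of URLs and logs.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// projectsPageURL is a hypothetical helper that builds the URL for one page of
// GitLab's "list all projects" endpoint.
func projectsPageURL(apiURL string, minAccessLevel, perPage, page int) (string, error) {
	u, err := url.Parse(apiURL + "/projects")
	if err != nil {
		return "", err
	}
	// GitLab's documented maximum page size is 100.
	if perPage > 100 {
		perPage = 100
	}
	q := url.Values{}
	q.Set("min_access_level", strconv.Itoa(minAccessLevel))
	q.Set("per_page", strconv.Itoa(perPage))
	q.Set("page", strconv.Itoa(page))
	u.RawQuery = q.Encode() // Encode sorts keys alphabetically
	return u.String(), nil
}

func main() {
	s, _ := projectsPageURL("https://gitlab.example.com/api/v4", 20, 20, 1)
	fmt.Println(s) // https://gitlab.example.com/api/v4/projects?min_access_level=20&page=1&per_page=20
}
```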
Please could you check the response you get from Gitlab (enabled with SOBA_LOG=trace) to see if a missing project (one you expected to be retrieved) is in the json output? I'm trying to work out if GitLab's API is providing the detail and I'm not acting upon it properly, or if GitLab is simply not returning them.
I've issued this request using CocoaRestClient based on the docs you've linked:
https://<gitlab_url>/api/v4/projects?private_token=<token>&per_page=150
And it returned a JSON response that is 12,313 lines long. It seems to contain most of the projects I have access to, although right off the bat, the second project visible on the web frontend is missing from the response, while the very first one is present (I'm Developer in both projects).
The Access Token has all the tickboxes enabled:
- api
- read_api
- read_user
- read_repository
- write_repository
- read_registry
- write_registry
Is there any other URL param I should add to the request? Or should I try a different endpoint?
I worked out that the pagination I'd added when retrieving projects via the groups endpoint was lost when I switched to making requests to the projects endpoint.
I've just pushed a new release that should now work: https://github.com/jonhadfield/soba/releases/tag/1.1.3-beta.5.
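A sketch of the kind of loop that was missing, assuming pagination is driven by GitLab's `X-Next-Page` response header, which is empty on the last page. The HTTP call is abstracted into a hypothetical `pageFetcher` so the loop runs without a live instance; `fetchAllPages` is not soba's actual function. The demo fakes 128 projects at 20 per page (seven pages), the numbers reported earlier in this issue.

```go
package main

import (
	"fmt"
	"strconv"
)

// pageFetcher is a hypothetical stand-in for one HTTP request: it returns the
// project paths on a page and the value of the X-Next-Page header.
type pageFetcher func(page int) (items []string, nextPage string, err error)

// fetchAllPages requests page 1, 2, ... until X-Next-Page comes back empty.
func fetchAllPages(fetch pageFetcher) ([]string, error) {
	var all []string
	page := 1
	for {
		items, next, err := fetch(page)
		if err != nil {
			return nil, fmt.Errorf("page %d: %w", page, err)
		}
		all = append(all, items...)
		if next == "" {
			return all, nil // last page reached
		}
		n, err := strconv.Atoi(next)
		if err != nil || n <= page {
			// Guard against a malformed or non-advancing header looping forever.
			return nil, fmt.Errorf("unexpected X-Next-Page %q after page %d", next, page)
		}
		page = n
	}
}

func main() {
	const total, perPage = 128, 20
	fake := func(page int) ([]string, string, error) {
		start := (page - 1) * perPage
		end := start + perPage
		if end > total {
			end = total
		}
		var items []string
		for i := start; i < end; i++ {
			items = append(items, fmt.Sprintf("group/repo-%d", i+1))
		}
		next := ""
		if end < total {
			next = strconv.Itoa(page + 1)
		}
		return items, next, nil
	}
	all, err := fetchAllPages(fake)
	fmt.Println(len(all), err) // 128 <nil>
}
```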
Hats off to you, it now works as expected and has bundled up all the repos I have access to - awesome work! 🍻 This is resolved with beta.5, so the issue can be closed.
On a somewhat related note: I was wondering whether it's possible to fetch the latest commit from the previous bundle to speed up subsequent soba runs (so the git clone and bundle steps could be skipped), but as far as I can tell, the only way would be to restore the bundle back to a repo and then compare the latest commits...?
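One possible way to avoid restoring the bundle, sketched under assumptions: `git bundle list-heads <file>` prints the refs stored in a bundle and `git ls-remote <url>` prints the refs on the remote, both as a commit SHA followed by a ref name, so the two can be compared directly. Running those commands is left out here; `parseRefs` and `branchesUnchanged` are hypothetical helpers, and comparing only `refs/heads/*` assumes the bundle stores branches under the same names as the remote, which depends on how the bundle was created.

```go
package main

import (
	"fmt"
	"strings"
)

// parseRefs turns "<sha> <ref>" lines, as printed by `git bundle list-heads`
// (space-separated) or `git ls-remote` (tab-separated), into a ref -> sha map.
func parseRefs(out string) map[string]string {
	refs := make(map[string]string)
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue // skip blank or unexpected lines
		}
		refs[fields[1]] = fields[0]
	}
	return refs
}

// branchesUnchanged compares only refs/heads/* entries and reports whether the
// bundle's branches match the remote's exactly.
func branchesUnchanged(bundle, remote map[string]string) bool {
	heads := func(m map[string]string) map[string]string {
		out := make(map[string]string)
		for ref, sha := range m {
			if strings.HasPrefix(ref, "refs/heads/") {
				out[ref] = sha
			}
		}
		return out
	}
	b, r := heads(bundle), heads(remote)
	if len(b) != len(r) {
		return false
	}
	for ref, sha := range r {
		if b[ref] != sha {
			return false
		}
	}
	return true
}

func main() {
	bundleOut := "1111111 refs/heads/main\n2222222 refs/heads/dev\n"
	remoteOut := "1111111\tHEAD\n1111111\trefs/heads/main\n3333333\trefs/heads/dev\n"
	fmt.Println(branchesUnchanged(parseRefs(bundleOut), parseRefs(remoteOut))) // false: dev has moved
}
```

If the branches match, the clone and bundle steps could in principle be skipped for that repo.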
Great news. Thanks for the feedback.
I'll copy your request into another issue so I can close this one. Will respond on there.