nbviewer
nbviewer copied to clipboard
404 for new Github repos with `main` branch instead of `master` if only repo URL is provided
Putting, e.g., https://github.com/willirath/example_repo_with_main_branch into https://nbviewer.jupyter.org landing page leads to https://nbviewer.jupyter.org/github/willirath/example_repo_with_main_branch/tree/master/ which reports 404 as the correct URL would be https://nbviewer.jupyter.org/github/willirath/example_repo_with_main_branch/tree/main/.
Seeing this issue as well
Looks like the github api supports a way to query the "default_branch" See: https://docs.github.com/en/free-pro-team@latest/rest/reference/repos#get-a-repository
this shows up in a couple of different places in the nbviewer code so it might be worth discussing what the right approach is. cc: @krinsman
To clarify, is main the only branch of the repo? And the repos have no master branch? @shreddd @willirath
Are there standard naming conventions for principal branches of repositories which don't have master branches?
I'm sure it would be possible to put in some re-direct logic manually:
https://github.com/jupyter/nbviewer/blob/master/nbviewer/providers/github/handlers.py#L144
https://github.com/jupyter/nbviewer/blob/master/nbviewer/providers/github/handlers.py#L521
(GitHub enterprise) https://github.com/jupyter/nbviewer/blob/master/nbviewer/providers/github/handlers.py#L542
i.e. some try/except clause redirecting to main instead of master if master doesn't exist.
My concern is that this would complicate the code and/or make it difficult to maintain if there are no standardized conventions for the name of the principal branch of repository which does not have a master branch.
E.g. Is this for Mercurial repositories using hg-git or something like that? https://github.com/stephenmcd/hg-github
So I don't think you want to assume the name at all. Ideally it should look up the "default_branch" field via the github API and use that to generate the redirect.
eg.
curl -H "Accept: application/vnd.github.v3+json" https://api.github.com/repos/octocat/hello-world
{
"id": 1296269,
"node_id": "MDEwOlJlcG9zaXRvcnkxMjk2MjY5",
"name": "Hello-World",
"full_name": "octocat/Hello-World",
...
"default_branch": "master",
...
}
This way you don't have to make assumption about the default branch name. In other words, before each of these calls, just check the API for the right default_branch name and plug that in.
Oh OK yeah that makes sense, I didn't know that was a feature/possibility. I agree with you, that definitely sounds like the way to do it.
This might use up some additional GitHub API requests (of the 60 per hour available to those using NBViewer without GitHub authentication set up), but I would guess that's probably acceptable for anyone who wants to view a repo like this.
https://github.com/jupyter/nbviewer/blob/master/nbviewer/providers/github/client.py#L106
^ Using the github_api_request method of the AsyncGitHubClient class, and the API call you gave above @shreddd , I imagine it shouldn't be too difficult (admittedly famous last words) to create a new method for the AsyncGitHubClient class (get_default_branch or something like that). E.g. see any of the get_foo methods of that class for a template/example
https://github.com/jupyter/nbviewer/blob/master/nbviewer/providers/github/client.py#L120
And then making a PR which uses this new AsyncGitHubClient method, and a try/except clause or something similar for handling a 404 error,
e.g. maybe except web.HTTPError(404):, compare https://github.com/jupyter/nbviewer/blob/master/nbviewer/providers/github/handlers.py#L409
and then modify the GitHub provider to use this new method of the GitHub client to get the default branch when necessary? or maybe even to always get the default branch (even though it's usually master), to have a guarantee of always avoiding this kind of 404 error? (e.g. maybe modify GitHubRepoHandler to always call get_default_branch and then define the redirect URL using that? https://github.com/jupyter/nbviewer/blob/master/nbviewer/providers/github/handlers.py#L144)
e.g. compare these lines of code https://github.com/jupyter/nbviewer/blob/master/nbviewer/providers/github/handlers.py#L122 https://github.com/jupyter/nbviewer/blob/master/nbviewer/providers/github/handlers.py#L204 https://github.com/jupyter/nbviewer/blob/master/nbviewer/providers/github/handlers.py#L344
I probably don't have time to implement such a PR myself, especially since I forgot how to set up a testing setup for modified versions of the NBViewer source code. Although maybe it wouldn't be too hard for me to try to follow the directions I wrote for myself here? https://github.com/krinsman/examples/tree/master/docker_examples/jupyterLabhubImages/nbviewer I'm not sure.
In any case I hope the above information is enough to make writing a PR for someone else to be much easier/quicker? (Compared to if they had to find the most likely relevant sections of code themselves?)
If you have any questions let me know, and I can review any submitted PR and maybe also merge it if we can verify it doesn't break anything (which it probably wouldn't).
OK - see PR #968 for an initial fix. This is simpler than what you suggested but it seems to work for me.
I also couldn't figure out whether uri_rewrites ever gets called for the user/repo case - I don't think it is, in which case you might drop that rule.
@krinsman - let me know if there are updates here
I seem to be having the same problem on my repo where main is the main branch. Was there any progress ?
Seems this was fixed https://github.com/jupyter/nbviewer/commit/299359739ba0a2a6aaaeedeb7ac4079511c8613b And https://nbviewer.org is redirecting correctly with the OP's repo. This issue can probably be closed?