Add symlink support
Describe the bug
If a symlink exists in the repository, it does not seem possible to resolve it using the current API methods.
To Reproduce
Steps to reproduce the behavior:
- create a repo
- create a directory dir
- create a file file in dir with contents hello world
- create a symlink dir-link pointing to dir
- create a symlink file-link pointing to dir/file
- use the API to try and resolve dir-link or file-link
Expected behavior
There should be an API that can be used to resolve symbolic links to the actual files.
Desktop (please complete the following information):
- OS: N/A
- Browser N/A
- Version N/A
Additional context
https://issues.jenkins-ci.org/browse/JENKINS-62922
target is the underlying location...
https://developer.github.com/v3/repos/contents/#response-if-content-is-a-symlink
Note: if the symlink points to a file in the same directory, then getContent works. If the path contains a symlinked directory, then it does not.
The problem here is that the path ghcontent-ro/a-symlink-to-a-dir/entry-one does not actually exist in the repo. It can be reached by traversing the symlink, but as far as git and github are concerned, the path is not there.
The behavior of the API for getting repository contents is kind of all over the place:
- If the path you request points to a file, you get that file's record.
- If the path you request points to a directory, you get an ARRAY of file records for the files in that directory. It would be much better to get the directory's record with a "children" field containing an array.
- If the path you request points to a symlink AND the target is a file, you get the target file's record.
- If the path you request points to a symlink AND the target is not a file, you get the symlink's record.
The only way to use directory symlinks would be to take the path and traverse it one element at a time looking for symlinks and changing the path to request the targeted path. That would result in one request per path element which is painfully costly.
Thinking about the least costly way to do this:
1. Request a path. If success, return.
2. If 404:
   a. If parent directory is \, return 404.
   b. Request the parent directory.
   c. If parent directory is symlink, replace with target and goto 1.
   d. If parent directory is directory, return 404.
   e. If 404, set parent directory to parent of current parent, goto 2.a.
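The steps above can be sketched roughly as follows. This is only an illustration against an in-memory stand-in for the contents API, not code from the library: the class SymlinkWalkSketch, its Entry type, and the request and resolve methods are all hypothetical. It also assumes symlink targets are given relative to the repository root, which real symlinks need not be, and it caps the number of rewrites to avoid looping on cyclic links.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory stand-in for the contents API; not part of any real library. */
class SymlinkWalkSketch {
    enum Kind { FILE, DIR, SYMLINK }

    /** One simulated contents record; target is only set for symlinks. */
    static final class Entry {
        final Kind kind;
        final String target; // assumption: relative to the repository root
        Entry(Kind kind, String target) { this.kind = kind; this.target = target; }
    }

    private final Map<String, Entry> repo = new HashMap<>();
    int requests = 0; // counts simulated API calls

    void put(String path, Kind kind, String target) { repo.put(path, new Entry(kind, target)); }

    /** Simulates one request; an empty result stands for a 404. */
    Optional<Entry> request(String path) {
        requests++;
        return Optional.ofNullable(repo.get(path));
    }

    /** Returns the path that actually exists, or empty for a genuine 404. */
    Optional<String> resolve(String path) {
        for (int hops = 0; hops < 16; hops++) { // guard against symlink loops
            if (request(path).isPresent()) return Optional.of(path); // the path exists: done
            String parent = path;
            String rewritten = null;
            while (rewritten == null) {
                int slash = parent.lastIndexOf('/');
                if (slash < 0) return Optional.empty(); // reached the root: genuine 404
                parent = parent.substring(0, slash);
                Optional<Entry> entry = request(parent);
                if (!entry.isPresent()) continue; // parent is missing too: go one level up
                if (entry.get().kind != Kind.SYMLINK) return Optional.empty(); // real directory or file
                // Parent is a symlink: swap it for its target and start over.
                rewritten = entry.get().target + path.substring(parent.length());
            }
            path = rewritten;
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SymlinkWalkSketch api = new SymlinkWalkSketch();
        api.put("dir", Kind.DIR, null);
        api.put("dir/file", Kind.FILE, null);
        api.put("dir-link", Kind.SYMLINK, "dir");
        System.out.println(api.resolve("dir-link/file").orElse("404")); // prints dir/file
        System.out.println(api.requests); // prints 3 (simulated requests)
    }
}
```

In this toy setup the successful traversal costs three requests: the initial 404, the parent lookup, and the retry against the rewritten path.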
This could be done in a bisecting fashion to keep the number of requests down. Even so, the cost for any 404 would go from one request to log(n) requests - there's no way to tell if the 404 is real or caused by a symlink. That would mean every file content 404 would suddenly start causing multiple requests.
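One way to picture the bisecting variant: a prefix of the path can only exist if every shorter prefix exists, so the deepest existing ancestor can be found by binary search over the path's segments. The sketch below is a guess at what that could look like, again against a hypothetical in-memory map rather than the real API. BisectSketch, request, and rewriteThroughSymlink are made-up names, and the map stores a symlink's target (assumed repo-root relative) or an empty string for a plain file or directory.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the bisecting lookup; not part of any real library. */
class BisectSketch {
    // Stand-in for the API: path -> symlink target, or "" for a plain file or directory.
    final Map<String, String> repo = new HashMap<>();
    int requests = 0; // counts simulated API calls

    /** Simulates one request; null stands for a 404. */
    String request(String path) {
        requests++;
        return repo.get(path);
    }

    static String join(String[] parts, int from, int to) {
        return String.join("/", Arrays.copyOfRange(parts, from, to));
    }

    /** Call after path itself returned 404. Returns the rewritten path, or null for a genuine 404. */
    String rewriteThroughSymlink(String path) {
        String[] parts = path.split("/");
        int lo = 0;            // the prefix with lo segments exists (0 = repository root)
        int hi = parts.length; // the prefix with hi segments is known to be missing
        String found = null;   // record returned for the prefix with lo segments
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            String record = request(join(parts, 0, mid));
            if (record != null) {
                lo = mid;
                found = record;
            } else {
                hi = mid;
            }
        }
        // The deepest existing ancestor must be a symlink, otherwise the 404 is real.
        if (lo == 0 || found.isEmpty()) return null;
        return found + "/" + join(parts, lo, parts.length);
    }

    public static void main(String[] args) {
        BisectSketch api = new BisectSketch();
        api.repo.put("dir", "");
        api.repo.put("dir/sub", "");
        api.repo.put("dir/sub/file", "");
        api.repo.put("link", "dir");
        System.out.println(api.rewriteThroughSymlink("link/sub/file")); // prints dir/sub/file
    }
}
```

The rewritten path still has to be requested again, and it may itself contain another symlink, so a caller would repeat the process until a request succeeds or a genuine 404 is confirmed.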
We could make it an optional behavior, maybe a new API method.
@jtnord
Hm, doing a bit more searching, I see there's a "trees API". It doesn't traverse directory symlinks either, but it could be used to find symlinks with fewer requests. It has a recursive option that would let us quickly get a flat list of the directory tree.
https://api.github.com/repos/hub4j-test-org/GHContentIntegrationTest/git/trees/cc7e26f850339a8e8427fa2d983ca6006ad1a78c?recursive=1
That query can return a large number of records which might make it slow. It may truncate if there are too many. Truncation could be handled by querying again inside a subtree. Looks like symlinks are blobs just like files, but they have a different mode.
This would reduce the added cost for general 404s to a much smaller number of queries, probably only 1 or 2 in most cases. That wouldn't be so bad. Traversing to symlinks would be the same cost - an initial 404 followed by a remapping. The tree could be cached on the GHRepository instance maybe?
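As a rough illustration of the tree-based idea: once a recursive tree listing is in hand, finding a symlinked ancestor of a 404'd path needs no further requests, because git marks symlink blobs with mode 120000. The sketch below assumes the listing has already been fetched and flattened into a map from path to mode. TreeSymlinkSketch and symlinkAncestor are hypothetical names, and fetching the symlink's blob to learn its target (one more request) is not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper over a flattened recursive tree listing; not part of any real library. */
class TreeSymlinkSketch {
    static final String SYMLINK_MODE = "120000"; // git's file mode for symbolic links

    /** Returns the shortest proper prefix of path that the tree lists as a symlink, if any. */
    static Optional<String> symlinkAncestor(Map<String, String> modeByPath, String path) {
        int slash = path.indexOf('/');
        while (slash >= 0) {
            String prefix = path.substring(0, slash);
            if (SYMLINK_MODE.equals(modeByPath.get(prefix))) return Optional.of(prefix);
            slash = path.indexOf('/', slash + 1);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> modeByPath = new HashMap<>();
        modeByPath.put("dir", "040000");       // tree (directory)
        modeByPath.put("dir/file", "100644");  // regular file blob
        modeByPath.put("dir-link", "120000");  // symlink blob
        modeByPath.put("file-link", "120000"); // symlink blob
        System.out.println(symlinkAncestor(modeByPath, "dir-link/file").orElse("none")); // prints dir-link
        System.out.println(symlinkAncestor(modeByPath, "dir/file").orElse("none"));      // prints none
    }
}
```

If the listing came back truncated, a missing prefix would not prove anything, so a real implementation would need to fall back to querying the relevant subtree.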
Still probably not something we'd want turned on by default, but I'm open to discussion.
> If the path you request points to a symlink AND the target is a file, you get the target file's record.
iff the target file exists within the bounds of the repo, otherwise the symlink record :)
I concur that doing this by default in GHRepository.getFileContent(String) is probably not the best use of API token calls, given rate limiting. But I think another function that callers can use, GHRepository.getFileContent(String path, boolean traverseSymLinks), could be useful. The former could call the latter with a default of false, maybe set by a global static/system property?
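The shape of that proposal might look something like the sketch below. It is not the library's code: the class is a standalone stand-in, the method bodies are placeholders that return a string instead of issuing requests, and the system property name is invented here since the thread does not name one.

```java
/** Hypothetical stand-in showing the proposed overload pair; not the library's GHRepository. */
class GetFileContentSketch {
    // Assumed property name, made up for this example.
    static final String TRAVERSE_PROPERTY = "example.github.traverseSymlinks";

    /** Existing-style entry point: delegates using the configured default (false when unset). */
    static String getFileContent(String path) {
        return getFileContent(path, Boolean.getBoolean(TRAVERSE_PROPERTY));
    }

    /** Proposed overload: callers opt in to the extra requests that symlink traversal costs. */
    static String getFileContent(String path, boolean traverseSymLinks) {
        // Placeholder body: a real implementation would issue API requests here.
        return (traverseSymLinks ? "traverse:" : "direct:") + path;
    }

    public static void main(String[] args) {
        System.out.println(getFileContent("dir-link/file"));       // direct:dir-link/file unless the property is set
        System.out.println(getFileContent("dir-link/file", true)); // prints traverse:dir-link/file
    }
}
```

Keeping the one-argument method as a thin delegate means existing callers see no change in request count unless they, or the property, ask for traversal.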
@jtnord That sounds reasonable.
Also, instead of caching at the object level, we could depend on okhttp caching to reduce rate limit usage while also accurately updating if the tree updates.
See #878 for some related discussion around GHTree interactions.
Sorry for the bother but perhaps you can help me out here. Do you guys know if symlinks can be made to work on GitHub?
That is, could they forward resource requests to raw.githubusercontent.com to the symlink target? I.e. if I have a repo with a folder called images as well as a symlink called logos pointing to images and some images with unchangeable src:
<img src="https://raw.githubusercontent.com/<user>/<repo>/main/logos/<some-file>.jpg" />
Currently the URL returns "404 not found". Is it possible to return the symlinked file?
@janosh Based on the discussion above, yes, it can be done, but it would need to be a separate method/option because the behavior requires multiple API calls.
PRs welcome. I'd be happy to answer any questions you have about how to implement the solution suggested above.
Sorry, I'm not a Java dev. I contacted GitHub support about the possibility of making this symlink forwarding native functionality. Will report back if anything comes of that.