Committing a bigger file fails
I get an error when I upload a 60 MB file. This is my code:
```java
GHRepository repo = github.getRepository(user + "/" + GithubUploader.repo);
GHRef masterRef = repo.getRef("heads/master");

// Build a new tree on top of master's current tree, adding the file's bytes.
String masterTreeSha = repo.getTreeRecursive("master", 1).getSha();
GHTreeBuilder treeBuilder = repo.createTree().baseTree(masterTreeSha);
treeBuilder.add(uploadPath + "/" + file.getName(), FileUtil.readBytes(file), true);
String treeSha = treeBuilder.create().getSha();

// Commit the new tree and move master to the new commit.
String commitSha = repo.createCommit()
        .message(username + " updates")
        .tree(treeSha)
        .parent(masterRef.getObject().getSha())
        .create()
        .getSHA1();
masterRef.updateTo(commitSha);
```
And the error:
```
java.lang.OutOfMemoryError: Java heap space
	at java.util.Base64$Encoder.encode(Base64.java:262) ~[na:1.8.0_262]
	at java.util.Base64$Encoder.encodeToString(Base64.java:315) ~[na:1.8.0_262]
	at org.kohsuke.github.GHBlobBuilder.binaryContent(GHBlobBuilder.java:39) ~[github-api-1.115.jar!/:na]
	at org.kohsuke.github.GHTreeBuilder.add(GHTreeBuilder.java:136) ~[github-api-1.115.jar!/:na]
```
Refer to this link: https://github.com/hub4j/github-api/issues/878#issuecomment-655047530
> I get an error when I upload a 60 MB file.
Sounds like a basic Java memory limitation. Try adding `-Xmx1g` and/or `-Xms1g` to your Java options. If that doesn't work, use `2g` instead of `1g`. That should mitigate the issue.
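For example, when launching the uploader as a standalone jar (the jar name here is just a placeholder):

```
java -Xms1g -Xmx1g -jar github-uploader.jar
```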
The problem here is that the GitHub Blob API takes JSON with a `content` field containing the contents of the new blob as Base64-encoded characters. For simplicity, we currently eagerly convert the bytes to a Base64 string, add that string to a JSON object (which we then also eagerly convert to a string), and then send that JSON string.
In this case, that results in us allocating a 60M `byte[]`, then an 80M Base64 `String`, then a slightly-more-than-80M `String` of the final JSON to be sent. These are all discarded immediately afterward, but the heap needs enough space to hold them all at once.
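To make the allocation chain concrete, here is a rough standalone sketch of the eager conversion described above; it is illustrative only, not the library's actual code:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

public class EagerBlobSketch {
    public static void main(String[] args) throws Exception {
        // 1. Read the whole file: ~60M byte[] for a 60M file.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        // 2. Base64-encode it eagerly: ~80M-character String (4/3 expansion).
        String base64 = Base64.getEncoder().encodeToString(bytes);
        // 3. Embed it in the JSON body: another slightly-larger-than-80M String.
        String json = "{\"content\":\"" + base64 + "\",\"encoding\":\"base64\"}";
        System.out.println("JSON body length: " + json.length());
    }
}
```

All three allocations are live at the same time just before the request is sent, which is why a 60M upload can exhaust a small heap.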
One solution would be to add a new GHTreeBuilder.add(InputStream) method (along with GHBlobBuilder.binaryContent(InputStream)). That way we would at least skip the first 60M allocation: we could read directly into the Base64 string. The next step might be to add a GitHubRequest.Builder.with(String name, InputStream value); we might then avoid the second, 80M allocation. Finally, GHBlobBuilder could be changed to use the GitHubRequest.Builder.with(InputStream) method and stream the entire JSON object directly instead of building it beforehand.
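For illustration, here is a minimal sketch of the streaming direction using the JDK's built-in streaming encoder; note that GHTreeBuilder.add(InputStream), GHBlobBuilder.binaryContent(InputStream), and GitHubRequest.Builder.with(InputStream) above are proposals, not existing API:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;

public class StreamingBase64Sketch {
    /**
     * Streams the input straight into a Base64-encoding view of the output
     * (e.g. an HTTP request body), so neither a 60M byte[] nor an 80M
     * String is ever materialized; only the 8K buffer is held at a time.
     */
    static void writeBase64(InputStream in, OutputStream rawOut) throws IOException {
        OutputStream b64 = Base64.getEncoder().wrap(rawOut);
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            b64.write(buf, 0, n); // encoded incrementally, never all at once
        }
        // close() flushes the final Base64 padding; it also closes rawOut,
        // so a real implementation would shield the request body with a
        // non-closing wrapper first.
        b64.close();
    }
}
```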
All of these are good ideas that would reduce memory allocation, but they would also take time to implement and add complexity to the code. That last step would be particularly involved. However, maybe this could be combined with #903 - give the consumers of GitHubRequest.Builder more of the responsibility for deciding how parameters and data are sent.
Still, this won't happen any time soon unless someone else wants to spend the time on it. Until then, the workaround above is the way to go.
Thanks.
@onedrive-x Does the workaround work for you?