gitea
Extremely slow performance with larger repositories
Description
My department has a repository about 500M in size (bare) with around 40000 commits in a flat history (we don't use branches in that specific repository). It contains about 87000 files. Viewing is generally okay, but an attempt to create a pull request results in a 2-minute wait between pressing the button and the new pull request being loaded. In the background I see git rev-parse being run a lot.
I tried to reproduce the issue with the Linux Kernel repository (and I get it, THAT one is massive, with more than 4G of data and millions of commits). It took me a few minutes to wait for the repository page to even load, let alone create any pull requests.
I'm not sure it can be fixed, but maybe it is possible to load the pull request page right after the button press and let the pull request be created in the background? It's especially annoying since another press of the button will create another identical pull request and will take as long as the first one.
Gitea Version
1.16.5
Can you reproduce the bug on the Gitea demo site?
No
Log Gist
No response
Screenshots
No response
Git Version
No response
Operating System
No response
How are you running Gitea?
From gitea.io
Database
SQLite
For a big repository, viewing the repository may be slow the first time, but it will be faster the second time if the cache is enabled.
For big pull request, how many files changed in that PR?
We often commit less than 10 files, the issue persists regardless of the amount of files involved.
What Operating System are you using?
If you're using Windows try the gogit variant.
What version of git are you using?
Upgrade to the latest git. If you're using old versions of git they're slow.
How did you add this repository?
Was it a mirror or a push? - this matters...
Have you made sure that the commit-graphs are written?
Go into gitea's repository on the disk and type git commit-graph write
What are the logs showing?
This report is completely inadequate for helping us to help you.
FYI, there is a linux repo on gitea.com .
And:
$ time curl https://gitea.com/marktsai0316/linux/commits/branch/master/scripts/clang-tools
curl 0.02s user 0.02s system 0% cpu 54.956 total
(update: above problem has been optimized)
Maybe it's possible to reproduce the problem there (still 504)
https://gitea.com/wxiaoguang/linux/compare/v5.9-rc8...wxiaoguang/linux:wxiaoguang-patch-2
I have a feeling that that repository does not have a commit-graph. I don't have access to the repository on the filesystem to check though.
I think the two urls actually represent different problems.
Let's look at the first one:
- https://gitea.com/marktsai0316/linux/commits/branch/master/scripts/clang-tools
This is essentially:
git rev-list --count $REVISION -- $FILE_PATH
followed-by:
git log $REVISION --follow --pretty=format:%H -- $FILE_PATH
Now both of these would be sped up by commit-graphs. The first call could actually be done entirely within the commit-graph too.
But the interesting thing is that the second one of these is so much slower than the first.
In fact they can actually produce different results due to the --follow on the second call (which appears to be the cause of most of the slowdowns).
Now if it were not for --follow we could actually use git rev-list for both of these calls, and the skip and max-count would be free (in contrast to the current system, where the skip doesn't work).
Looking at the history for this line I don't think there was reasoning behind adding the follow except that I would guess that it was nice to add.
So... a simple speed improvement here is to drop the follow and switch to rev-list for these calls.
An additional speed improvement is to add a context timeout like in the git log --name-only
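The rev-list idea above can be tried from a shell before anything changes in Gitea. This is only a rough sketch under assumptions: count_path_commits and list_path_commits are hypothetical helper names, the 30-second limit is arbitrary, and the coreutils timeout command stands in for the context timeout mentioned above. Without --follow, history is not tracked across renames, so the results can differ from the current git log call.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the 30s limit are assumptions, not Gitea code.

# Total number of commits reachable from REVISION that touch FILE_PATH.
count_path_commits() {
  local repo=$1 revision=$2 file_path=$3
  git -C "$repo" rev-list --count "$revision" -- "$file_path"
}

# One page of commit hashes for FILE_PATH.
# --skip and --max-count let git do the paging itself.
# `timeout` (GNU coreutils) aborts the call if it runs longer than 30 seconds.
list_path_commits() {
  local repo=$1 revision=$2 file_path=$3 skip=$4 page_size=$5
  timeout 30 git -C "$repo" rev-list \
    --skip="$skip" --max-count="$page_size" "$revision" -- "$file_path"
}
```

For example, list_path_commits /path/to/repo.git master scripts/clang-tools 50 50 would print the second page of 50 hashes (the repository path is a placeholder).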
I have left the company where this installation was deployed, so I can't provide more context at this point. I've let my former colleagues know about this issue; hopefully they will join the discussion and provide more info or test the suggested solutions.
I'm not really sure why I haven't provided more info on the matter in terms of installation in question, so here it is.
OS Version
Centos 7 (reproduced on ArchLinux)
Git Version
something old, along the lines of 1.8 (reproduced on 2.35 iirc)
Push or mirror?
The repository was initialized with git --bare push --mirror (same when I tried to reproduce the issue on my laptop). (I might have that command wrong, but I was mirroring a bare repo to the remote installation of gitea.)
This is all the info I can provide since I no longer have any access to the servers in question.
OK @vdrandom, upgrade git to the latest version, go into gitea's copy of the repo on the disk and run git commit-graph write. (If you're on dev/1.18 you can just run gitea doctor --run check-commit-graphs --fix.)
git 1.8 is really old and will always be slow on big repos.
Big repos also need commit-graphs written. They should be being written now but old repos will not have them written.
It would be helpful to know exactly which urls are slow after you've done that. (But you must ensure you're running a new version of git.)
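A quick way to confirm the git version on the server before testing again is sketched below. The helper names are hypothetical, sort -V assumes GNU coreutils (or a compatible sort), and the 2.27 threshold in the usage note is an assumption based on the release that, as far as I know, introduced changed-path filters for commit-graphs.

```shell
#!/usr/bin/env bash
# Sketch only: version_at_least and installed_git_version are hypothetical names.

# Succeeds when version HAVE is greater than or equal to version NEED.
# Relies on `sort -V` (version sort) placing the smaller version first.
version_at_least() {
  local have=$1 need=$2
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Prints the third word of `git --version`, e.g. "2.35.1".
installed_git_version() {
  git --version | awk '{print $3}'
}
```

Usage: version_at_least "$(installed_git_version)" 2.27 || echo "git is too old". Run it as the same user and in the same environment as Gitea, since that is the git binary that matters.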
If the slow urls are related to diffs you should adjust:
[git]
MAX_GIT_DIFF_LINES = 100 ; 1000: Max number of lines allowed of a single file in diff view.
MAX_GIT_DIFF_LINE_CHARACTERS = ; 5000: Max character count per line highlighted in diff view.
MAX_GIT_DIFF_FILES = 20 ; 100: Max number of files shown in diff view.
but tell us exactly which urls are slow.
Actually, it looks like we need git commit-graph write --changed-paths
After upgrading gitea.com, https://gitea.com/marktsai0316/linux/commits/branch/master/scripts/clang-tools now takes 13 seconds on the first load and less than 4 seconds on later loads.
After manually running git commit-graph write --changed-paths on the server, it now takes less than 5 seconds on the first load and less than 1 second on later loads.
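For anyone who wants to repeat that manual step on their own server, here is a minimal sketch. has_commit_graph and write_commit_graph are hypothetical helper names, and adding --reachable (so the graph covers every commit reachable from the refs) is my assumption rather than something stated above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are assumptions, not part of git or Gitea.

# Succeeds if the repository already has a commit-graph file
# or a split commit-graph chain directory.
has_commit_graph() {
  local git_dir
  git_dir=$(git -C "$1" rev-parse --absolute-git-dir) || return 2
  [ -f "$git_dir/objects/info/commit-graph" ] ||
    [ -d "$git_dir/objects/info/commit-graphs" ]
}

# (Re)writes the commit-graph, including changed-path filters.
write_commit_graph() {
  git -C "$1" commit-graph write --reachable --changed-paths
}
```

Point both helpers at the bare repository as Gitea stores it on disk (the exact location depends on your installation), and run them as the user that owns the repository files.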
OK, so it looks like that page could still benefit from a deferrable/asynchronous request, so I can dust off the code I was working on for that.
I don't think it's worth caching these results as I expect that they should not be being requested often.
I guess I should look at the other URL too
We also experience slow performance. https://hub.openingdesign.com/
Hi all. I've read through the three issues I can find about slow performance on pulls/diffs. I see that the proposed solution is to shift the diffing to an async request. Are there any tips/tricks to use in the meantime, or partial work on the async change we could take a look at, please (@zeripath)? Any guidance/hints/workarounds would be much appreciated 💙
- Ensure that commit-graphs are written:
  gitea doctor --run check-commit-graphs --fix
- Could you please replicate the issue on try.gitea.io and give us an example of what is slow?
- Could you tell me if changing these settings in your app.ini improves things?
[git]
MAX_GIT_DIFF_LINES = 100 ; 1000: Max number of lines allowed of a single file in diff view.
MAX_GIT_DIFF_LINE_CHARACTERS = ; 5000: Max character count per line highlighted in diff view.
MAX_GIT_DIFF_FILES = 20 ; 100: Max number of files shown in diff view.
Without logs or some concrete test case we're blind and we don't know what needs to be improved.
Thank you mate. Sorry for the misunderstanding, I assumed since the issue was still open that work was ongoing. The slowness in our case is just due to the scale of some very large PRs, and we're interested in loading the diffs async (ala Github). If there's no code (incomplete or otherwise) to poke at, please consider my reply here withdrawn, and sorry again for the misunderstanding :blue_heart:
@garymoon - there's already partial async load - we just load a certain number by default. And that amount is IMO too large.
If you change the [git] settings I mentioned above, that should improve things a lot.
If you're after an infinite scroll type thing, that would be great, and practically all of the basic API is there - we'd just have to sort out the JS.
For what it's worth, I have a repo with 10k commits where loading the commit graph takes mere seconds, and I have repositories with barely 1.6k commits where loading the commit graph takes over two minutes. It doesn't seem to depend on repository size alone.
Can you test how long it takes to load the second time?
After upgrading gitea.com, https://gitea.com/marktsai0316/linux/commits/branch/master/scripts/clang-tools now takes 13 seconds on the first load and less than 4 seconds on later loads.
After manually running git commit-graph write --changed-paths on the server, it now takes less than 5 seconds on the first load and less than 1 second on later loads.
From what I can see, it takes only about 490ms on the current version. I think this can be closed soon if there is no further information.
We close issues that need feedback from the author if there were no new comments for a month. :tea: