Extremely slow performance with larger repositories

Open vdrandom opened this issue 2 years ago • 19 comments

Description

My department has a repository about 500M in size (bare), with around 40000 commits in a flat history (we don't use branches in that specific repository). It contains about 87000 files. Viewing is generally okay, but an attempt to create a pull request results in a 2-minute wait between the button press and the new pull request being loaded. In the background I see git rev-parse being run a lot.

I tried to reproduce the issue with the Linux Kernel repository (and I get it, THAT one is massive, with more than 4G of data and millions of commits). I had to wait a few minutes for the repository page to even load, let alone create any pull requests.

I'm not sure it can be fixed, but maybe it is possible to load the pull request page right after the button press and let the pull request be created in the background, at least? It's especially annoying since another press of the button will create another identical pull request and will take as long as the first one.

Gitea Version

1.16.5

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

No response

Operating System

No response

How are you running Gitea?

From gitea.io

Database

SQLite

vdrandom avatar May 26 '22 16:05 vdrandom

For a big repository, viewing the repository may be slow the first time, but it will be faster the second time if the cache is enabled.

For a big pull request: how many files changed in that PR?

lunny avatar May 26 '22 17:05 lunny

We often commit fewer than 10 files; the issue persists regardless of the number of files involved.

vdrandom avatar May 26 '22 18:05 vdrandom

What Operating System are you using?

If you're using Windows try the gogit variant.

What version of git are you using?

Upgrade to the latest git. If you're using old versions of git they're slow.

How did you add this repository?

Was it a mirror or a push? - this matters...

Have you made sure that the commit-graphs are written?

Go into gitea's repository on the disk and type git commit-graph write

What are the logs showing?

This report is completely inadequate for helping us to help you.

zeripath avatar Aug 02 '22 07:08 zeripath

FYI, there is a linux repo on gitea.com .

And:

$ time curl https://gitea.com/marktsai0316/linux/commits/branch/master/scripts/clang-tools
curl   0.02s user 0.02s system 0% cpu 54.956 total

(update: above problem has been optimized)

Maybe it's possible to reproduce the problem there (still 504)

https://gitea.com/wxiaoguang/linux/compare/v5.9-rc8...wxiaoguang/linux:wxiaoguang-patch-2

wxiaoguang avatar Aug 02 '22 08:08 wxiaoguang

I have a feeling that that repository does not have a commit-graph. I don't have access to the repository on the filesystem to check though.
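
For anyone who does have filesystem access, a minimal sketch of that check follows. It assumes git 2.13 or newer (for `rev-parse --absolute-git-dir`). The helper name `has_commit_graph` is made up for illustration, and it only looks at the two on-disk locations git uses for commit-graph data (a single file, or a split chain).

```shell
# has_commit_graph REPO
# Hypothetical helper: succeeds if REPO already has commit-graph data on disk.
has_commit_graph() {
    gitdir=$(git -C "$1" rev-parse --absolute-git-dir) || return 2
    # Single-file layout, or the chain file used by `commit-graph write --split`.
    [ -f "$gitdir/objects/info/commit-graph" ] ||
        [ -f "$gitdir/objects/info/commit-graphs/commit-graph-chain" ]
}
```

It could be called as `has_commit_graph /path/to/owner/repo.git || echo "no commit-graph"`, where the path is a placeholder for the repository's location under Gitea's repository root.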

zeripath avatar Aug 05 '22 20:08 zeripath

I think the two urls actually represent different problems.

Let's look at the first one:

  • https://gitea.com/marktsai0316/linux/commits/branch/master/scripts/clang-tools

This is essentially:

git rev-list --count $REVISION -- $FILE_PATH

followed by:

git log $REVISION --follow --pretty=format:%H -- $FILE_PATH

Now both of these would be sped up by commit-graphs. The first call could actually be done entirely within the commit-graph too.

But the interesting thing is that the second one of these is so much slower than the first.

In fact they can produce different results because of the --follow on the second call (which appears to be the cause of most of the slowdowns).

Now if it were not for --follow we could actually use git rev-list for both of these calls and the skip and max-count will be free (in contrast to the current system where the skip doesn't work.)

Looking at the history for this line, I don't think there was any reasoning behind adding the --follow, except that I would guess it seemed nice to have.

So... a simple speed improvement here is to drop the --follow and switch to rev-list for these calls.

An additional speed improvement is to add a context timeout, like the one used for git log --name-only.
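
As a rough illustration of the rev-list approach (a sketch only, not Gitea's actual code), both the count and a page of commits can come from `git rev-list`, with `--skip` and `--max-count` doing the paging. The function names and the page/page-size arguments are hypothetical.

```shell
# count_path_commits REPO REVISION FILE_PATH
# Number of commits reachable from REVISION that touch FILE_PATH.
count_path_commits() {
    git -C "$1" rev-list --count "$2" -- "$3"
}

# list_path_commits REPO REVISION FILE_PATH PAGE PAGE_SIZE
# One page of commit hashes for FILE_PATH (PAGE starts at 1).
# No --follow, so renames are not tracked.
list_path_commits() {
    git -C "$1" rev-list \
        --skip="$(( ($4 - 1) * $5 ))" \
        --max-count="$5" \
        "$2" -- "$3"
}
```

On systems with GNU coreutils, prefixing the call with something like `timeout 30` would be a crude shell analogue of the context timeout mentioned above.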

zeripath avatar Aug 07 '22 20:08 zeripath

I have left the company where this installation was deployed, so I can't provide more context at this point. I've let my former colleagues know about this issue; hopefully they will join the discussion and provide more info or test the suggested solutions.

I'm not really sure why I haven't provided more info on the matter in terms of installation in question, so here it is.

OS Version

Centos 7 (reproduced on ArchLinux)

Git Version

something old, along the lines of 1.8 (reproduced on 2.35 iirc)

Push or mirror?

The repository was initialized with git --bare push --mirror (same when I tried to reproduce the issue on my laptop). (I might get that command wrong, but I was mirroring a bare repo to the remote installation of gitea.)

This is all the info I can provide since I no longer have any access to the servers in question.

vdrandom avatar Aug 07 '22 21:08 vdrandom

OK @vdrandom, upgrade git to the latest version, then go into gitea's copy of the repo on the disk and run git commit-graph write. (If you're on dev/1.18 you can just run gitea doctor --run check-commit-graphs --fix.)

git 1.8 is really old and will always be slow on big repos.

Big repos also need commit-graphs written. They should be being written now but old repos will not have them written.

It would be helpful to know exactly which urls are slow after you've done that. (But you must ensure you're running a new version of git.)

If the slow urls are related to diffs you should adjust:

[git]
MAX_GIT_DIFF_LINES = 100 ; 1000: Max number of lines allowed of a single file in diff view.
MAX_GIT_DIFF_LINE_CHARACTERS = ; 5000: Max character count per line highlighted in diff view.
MAX_GIT_DIFF_FILES = 20 ; 100: Max number of files shown in diff view.

but tell us exactly which urls are slow.

zeripath avatar Aug 08 '22 18:08 zeripath

actually it looks like we need git commit-graph write --changed-paths
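
A hedged sketch of applying that across a whole installation: it assumes git 2.27 or newer (where `--changed-paths` is available) and that the bare repositories live in directories ending in `.git` under one root. The function name is hypothetical, and `--reachable` is an addition here (not something stated in this thread) so that the graph is built from all refs.

```shell
# write_commit_graphs ROOT
# Hypothetical helper: (re)write the commit-graph, including changed-path
# Bloom filters, for every *.git directory found under ROOT.
write_commit_graphs() {
    find "$1" -type d -name '*.git' -prune -print |
        while IFS= read -r repo; do
            git -C "$repo" commit-graph write --reachable --changed-paths ||
                echo "commit-graph write failed: $repo" >&2
        done
}
```

It would be run as the user that owns the repositories, for example `write_commit_graphs /path/to/gitea-repositories` (placeholder path).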

zeripath avatar Aug 08 '22 19:08 zeripath

After upgrading gitea.com, it now takes 13 seconds on the first load and less than 4 seconds on subsequent loads when visiting https://gitea.com/marktsai0316/linux/commits/branch/master/scripts/clang-tools .

After manually executing git commit-graph write --changed-paths on the server side, it now takes less than 5 seconds on the first load and less than 1 second on subsequent loads.
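
For anyone wanting to take the same kind of first-load versus repeat-load measurement on their own instance, a small sketch assuming curl is installed (the function names are made up for illustration):

```shell
# time_url URL
# Prints the total request time in seconds, discarding the response body.
time_url() {
    curl -s -o /dev/null -w '%{time_total}\n' "$1"
}

# compare_loads URL
# Requests the same URL twice so a cold load can be compared with a repeat load.
compare_loads() {
    echo "first:  $(time_url "$1")"
    echo "second: $(time_url "$1")"
}
```

Calling `compare_loads` with the slow page's URL prints the two timings one after the other.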

lunny avatar Aug 15 '22 04:08 lunny

OK, so it looks like that page could still benefit from a deferrable/asynchronous request, so I can dust off the code I was working on for that.

I don't think it's worth caching these results as I expect that they should not be being requested often.

I guess I should look at the other URL too

zeripath avatar Aug 15 '22 05:08 zeripath

We also experience slow performance. https://hub.openingdesign.com/

theoryshaw avatar Oct 26 '22 22:10 theoryshaw

Hi all. I've read through the three issues I can find about slow performance on pulls/diffs. I see that the proposed solution is to shift the diffing to an async request. Are there any tips/tricks to be used in the meantime, or partial work on the async change we could take a look at please (@zeripath?)? Any guidance/hints/workarounds would be much appreciated 💙

garymoon avatar Nov 29 '22 14:11 garymoon

  • Ensure that commit-graphs are written
    • gitea doctor --run check-commit-graphs --fix
  • Could you please replicate the issue on try.gitea.io and give us an example of what is slow?
  • Could you tell me if changing these settings in your app.ini improves things?
[git]
MAX_GIT_DIFF_LINES = 100 ; 1000: Max number of lines allowed of a single file in diff view.
MAX_GIT_DIFF_LINE_CHARACTERS = ; 5000: Max character count per line highlighted in diff view.
MAX_GIT_DIFF_FILES = 20 ; 100: Max number of files shown in diff view.

Without logs or some concrete test case we're blind and we don't know what needs to be improved.

zeripath avatar Nov 30 '22 20:11 zeripath

Thank you mate. Sorry for the misunderstanding, I assumed since the issue was still open that work was ongoing. The slowness in our case is just due to the scale of some very large PRs, and we're interested in loading the diffs async (à la GitHub). If there's no code (incomplete or otherwise) to poke at, please consider my reply here withdrawn, and sorry again for the misunderstanding :blue_heart:

garymoon avatar Dec 05 '22 16:12 garymoon

@garymoon - there's already partial async load - we just load a certain number by default. And that amount is IMO too large.

If you change the [git] settings I mentioned above, that should improve things a lot.

If you're after an infinite-scroll type of thing, that would be great, and practically all of the basic API is there - we'd just have to sort out the JS.

zeripath avatar Dec 08 '22 19:12 zeripath

For what it's worth, I have a repo with 10k commits where loading the commit graph takes mere seconds, and I have repositories with barely 1.6k commits where loading the commit graph takes over two minutes. It doesn't necessarily seem dependent on repository size alone.

Sebazzz avatar Oct 30 '23 06:10 Sebazzz

Can you test the loading time on the second visit?

lunny avatar Oct 30 '23 07:10 lunny

After upgrading gitea.com, it now takes 13 seconds on the first load and less than 4 seconds on subsequent loads when visiting https://gitea.com/marktsai0316/linux/commits/branch/master/scripts/clang-tools .

After manually executing git commit-graph write --changed-paths on the server side, it now takes less than 5 seconds on the first load and less than 1 second on subsequent loads.

As far as I can see, it now takes only 490ms on the current version. I think this can be closed soon if there is no further information.

lunny avatar Jan 26 '24 10:01 lunny

We close issues that need feedback from the author if there were no new comments for a month. :tea:

GiteaBot avatar Feb 25 '24 12:02 GiteaBot