gitea
gitea copied to clipboard
[1.17.0] Context deadline exceeded (timeout) cloning large repo
Description
Starting with v1.17.0, one of our large repos (70+ GB, using LFS) is failing trying to clone on some connections.
When the clone fails, the client reports some variation of:
error: 24576 bytes of body are still expected0 MiB | 1.83 MiB/s
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output
and the server logs show, e.g.:
2022/08/05 03:27:25 ...ers/web/repo/http.go:484:serviceRPC() [E] [62ec8cb5-3] Fail to serve RPC(upload-pack) in /mnt/gitea/data/repositories/org/repo.git: context deadline exceeded -
I tracked this down to a change in commit 35fdefc1ff253522f101ffb1337437b59676c302 for #18363 where serviceRPC() was no longer allowed to run indefinitely, but carries a context with an unchangeable default timeout of 360 seconds (once I found this, I confirmed that our clones were dying around the 6 minute mark).
We are locally using a custom build with this quick patch in order to set an extremely large timeout to allow us to work around the issue:
diff --git a/modules/setting/setting.go b/modules/setting/setting.go
index f5e624946..585c4589c 100644
--- a/modules/setting/setting.go
+++ b/modules/setting/setting.go
@@ -125,6 +125,7 @@ var (
StartupTimeout time.Duration
PerWriteTimeout = 30 * time.Second
PerWritePerKbTimeout = 10 * time.Second
+ ServiceRpcTimeout time.Duration
StaticURLPrefix string
AbsoluteAssetURL string
@@ -723,6 +724,7 @@ func loadFromConf(allowEmpty bool, extraConfig string) {
StartupTimeout = sec.Key("STARTUP_TIMEOUT").MustDuration(0 * time.Second)
PerWriteTimeout = sec.Key("PER_WRITE_TIMEOUT").MustDuration(PerWriteTimeout)
PerWritePerKbTimeout = sec.Key("PER_WRITE_PER_KB_TIMEOUT").MustDuration(PerWritePerKbTimeout)
+ ServiceRpcTimeout = sec.Key("SERVICE_RPC_TIMEOUT").MustDuration(ServiceRpcTimeout)
defaultAppURL := string(Protocol) + "://" + Domain
if (Protocol == HTTP && HTTPPort != "80") || (Protocol == HTTPS && HTTPPort != "443") {
diff --git a/routers/web/repo/http.go b/routers/web/repo/http.go
index 6a85bca16..82835cb62 100644
--- a/routers/web/repo/http.go
+++ b/routers/web/repo/http.go
@@ -474,11 +474,12 @@ func serviceRPC(ctx gocontext.Context, h serviceHandler, service string) {
cmd := git.NewCommand(h.r.Context(), service, "--stateless-rpc", h.dir)
cmd.SetDescription(fmt.Sprintf("%s %s %s [repo_path: %s]", git.GitExecutable, service, "--stateless-rpc", h.dir))
if err := cmd.Run(&git.RunOpts{
- Dir: h.dir,
- Env: append(os.Environ(), h.environ...),
- Stdout: h.w,
- Stdin: reqBody,
- Stderr: &stderr,
+ Dir: h.dir,
+ Env: append(os.Environ(), h.environ...),
+ Stdout: h.w,
+ Stdin: reqBody,
+ Stderr: &stderr,
+ Timeout: setting.ServiceRpcTimeout,
}); err != nil {
if err.Error() != "signal: killed" {
log.Error("Fail to serve RPC(%s) in %s: %v - %s", service, h.dir, err, stderr.String())
I can make a PR out of this if this is desired, but I'd kind of rather this go back to the old behavior instead of adding a new 6-minute timeout footgun where, even if targeted logging was added to point people to correcting the failure, admins would need to find this config variable and change it for any decently-large repo. 1.16.x's behavior was much preferred.
Gitea Version
1.17.0
Can you reproduce the bug on the Gitea demo site?
No
Log Gist
No response
Screenshots
No response
Git Version
2.37.1
Operating System
Ubuntu 20.04 aarch64
How are you running Gitea?
Linux arm64 release on an AWS instance
Database
PostgreSQL
Yes, please. And I think you can give a default timeout value.
IMO there is no need to introduce another config option.
There is a context for serviceRPC, the timeout should be controlled by it. It might be possible to make Git's Run respect the context.Deadline() or have a long-enough timeout.
IMO there is no need to introduce another config option.
I agree.
There is a context for
serviceRPC, the timeout should be controlled by it. It might be possible to make Git'sRunrespect thecontext.Deadline()
I added some logging to Run(), and, in my limited testing, only a few contexts come in with a deadline set on them (serviceRPC's context has no deadline). Would it be better here to fall back to the context deadline whenever the given Timeout is <= 0 and skip setting a timeout at all if both of those are unspecified? I guess that wouldn't work since then we'd never be using this fallback default value. I think no matter what, serviceRPC() should be able to indicate that it doesn't want to be cut off just because a timeout expired. Maybe a fourth Run() variant that is timeout-free is appropriate?
or have a long-enough timeout.
To me, the problem with this is, as always, "how long is long enough?" 6 minutes may have seemed long enough, but when the scenario is "however long upload-pack takes to deliver an arbitrarily-sized repository through an arbitrarily-fast network connection," I'm not sure there can ever be a "long enough." It's especially problematic in this case because even if you choose something that seems "long enough," if it fails for someone there is no way for the admin to make changes to allow it to complete. I've set our timeout here to 24 hours, for example, but I can tell you that if my clone fails 24 hours in for no other reason than "there was a timeout set" and I have to start over again, depending on what part of the clone it's at, I'd be less-than-thrilled as a result.
Yup, at the moment, the git.Run will use default timeout if Timeout<=0.
IMO there might be 2 methods:
- Method 1: If there is a deadline of the use serviceRPC context, use it. If no, use a long-enough timeout (for example, 1 year) timeout.
- Method 2: Introduce a new const like
UseContextTimeout = -2. ThenRun(Timeout: UseContextTimeout), if there is no deadline in the context, then do not use timeout when running the git command.
So... what do I do if I'm getting this error?
Clone: context deadline exceeded
https://docs.gitea.io/en-us/config-cheat-sheet/#git---timeout-settings-gittimeout
Adjust/set [git.timeout] MIGRATE appropriately.