go-git
go-git copied to clipboard
change map to sync.map to support concurrency
Hey @calvin-f, why did you close this? I also tried to solve this race with https://github.com/go-git/go-git/pull/175
Did you find something out that made it unnecessary?
Hey @dustin-decker
No, it works well. Creating this pull request on the other hand was just by accident. I intended to create it in my own fork.
I came up with this solution as this was proposed already earlier in the old repository but got never merged. https://github.com/src-d/go-git/pull/1270
I see. Could you re-open this then? I'm trying to get it fixed upstream and prefer this fix over mine.
I would be very happy if this PR got merged in. Being able to work concurrently on repositories would be amazing! 🙏
The two main problems to merges this are:
- The full library needs to support concurrency, and this requires an in-depth review of the code base.
- Adding concurrency will speed down some operations (or maybe not much), but this requires do some benchmarks.
Sadly now I don't have time for either of those tasks :/
My two cents:
* The full library needs to support concurrency, and this requires an in-depth review of the code base.
True, but progress toward that goal is better than nothing. That said, fixing the concurrency issues reported by go test -race ./...
would do wonder already.
* Adding concurrency will speed down some operations (or maybe not much), but this requires do some benchmarks.
At the moment, the alternative it to use a massive mutex around the whole repository, including for operations that don't require any locking. Everything is going to be better performance wise than that.
I agree 100% with @MichaelMure. Anything in the direction of this PR is better than the current state.
At the moment, the alternative it to use a massive mutex around the whole repository, including for operations that don't require any locking. Everything is going to be better performance wise than that.
This when your use case is processed only one repository at a time. SInce my use cause is completely the opposite, I process hundreds of repositories at the same time, your PR will degrade the performance of my deployment. So is clear that your affirmation is not true.
I will be ok merging something that is something pluggable or optional. And not a change in the internal behavior of the library.
@mcuadros I meant that from my project point of view, I can either add this global mutex or get panics due to concurrent access. Which is indeed the worst way to solve that problem.
I will be ok merging something that is something pluggable or optional. And not a change in the internal behavior of the library.
Does that means that you don't want to make go-git thread-safe? Or is that about not merging an ugly global mutex?
Does that means that you don't want to make go-git thread-safe? Or is that about not merging an ugly global mutex?
This question is really important. Thanks, @MichaelMure.
@mcuadros: because (a) if you "don't want to make go-git thread-safe", then it's useless to discuss anything in this PR, since everyone is wasting their precious time. If it's because the code quality of this PR is not satisfactory, and (b) "you don't want to merge an ugly global mutex", then it gives a chance to the community to propose something better.
If it's (b), could you please point to some directions here on where we should pay attention when changing the code to make it thread-safe (and ultimately, propose something better)?
I don't want to make go-git thread-safe, by default for several reason:
- Break compatibility requires a major version upgrade
- Make slower the library, and not everyone has your use case.
So if you make a PR where this functionality is disabled by default and could be enabled, I will be happy to merge it.
BTW I already wrote how should be 15days ago:
I will be ok merging something that is something pluggable or optional. And not a change in the internal behavior of the library.
@mcuadros Would you be open to merge something that 1) doesn't break compat and 2) doesn't affect performance?
I believe that should be possible as:
- there seems to be fairly few places where concurrency is an issue
- go-git is not massively parallel to begin with and most of the performance cost boils down to disk access, data processing and networking, not mutexes and locks
The first step would be to have proper benchmark to measure this performance cost, if any. Is there some scenario you care about especially, beside the benchmark that already exist in go-git?
I will only merge it if it's an optional funcionallity, so we don't need to do benchmarks. So you can include the lock function in a function with a boolean.
Hi,
Thanks for this library! Are there recommended alternatives to use go-git in a thread safe way?
Thanks!
I will argue for merging this as is, without benchmarks because, IMO, a race condition is a bug, and fixing it is not a feature request nor enhancement. It's simply making the code function correctly.
If this package were implemented in a language where concurrency isn't a native language feature, a disclaimer like "this package is not safe to use concurrently" is probably valid. But concurrency is a feature of Go. So much so, that the runtime itself detects this race condition. Which means, this bug effectively forces me to not be able to use parts of the Go language. It's not forcing me to consider a different design for my code. It's like having a bug that breaks the use of for
loops.
To me, this is a serious bug that deserves a patch.
I would fully expect concurrent access to a single repository to be safe. I have been using go-git for a couple years and just found out it isn't, because we had panics in production that I still cannot reproduce synthetically. To me, and I suspect the vast majority of users, that is a bug, and quite a bad one (thankfully this issue was not hard to find).
The phrase is "make it work, make it right, make it fast" and right now it isn't right. Usually "thread unsafe" is an optional setting for specific use cases, and the default is whatever is correct.
It seems that having benchmarks would be valuable. I was reading the code and wondering e.g. whether calling FindHash
is a win, and not having benchmarks makes that difficult to answer.
I couldn't find an issue describing the benchmarks needed. Is there one? What are the key scenarios? I think there's an obvious scenario where we build a big git repo with lots of files and lots of commits, then reopen the repo to flush caches and visiting every file in every commit. But that doesn't cover parallel processing of repos, although if we avoid global mutexes and make the N=1 case faster, I would expect it to speed up the N=100 case also. So would that be a sufficient benchmark for your use case @mcuadros ?
There is quite a bit on this thread, so I will try to unpack a few points.
Data races
That said, fixing the concurrency issues reported by
go test -race ./...
would do wonder already.
This PR fixes no data races reported via go test -race
on amd64 Linux.
Based on latest versions, all the data races reported originate from go-billy
, and not go-git
. I will propose a PR for that separately.
Performance impact
These changes have a 7x performance hit in terms of execution time. There is also an allocation impact which over time will increase GC pressure.
Below is the before and after (rebased) of the existing benchmarks, considering only the relevant deltas.
goos: linux goarch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
name old time/op new time/op delta
FindOffset-16 279ns ± 4% 2120ns ± 2% ~ (p=0.100 n=3+3)
name old alloc/op new alloc/op delta
FindOffset-16 0.00B 416.00B ± 0% ~ (p=0.100 n=3+3)
name old allocs/op new allocs/op delta
FindOffset-16 0.00 25.00 ± 0% ~ (p=0.100 n=3+3)
But, I agree some of the points in the thread:
-
go-git
should just work, any expection or optimisation should be caveated/documented. Everyone likes surprise-free experience. :smile: - High index concurrency should not force a performance penalty on users needing High repository concurrency without a way to opt-out.
I have also benchmarked #175, which fixes the same problem. Based on the performance results (refer to https://github.com/go-git/go-git/pull/175#issuecomment-1336265106), I would prefer that over this PR.