frostdb
frostdb copied to clipboard
frostdb: add compactorPool to limit compaction goroutines
This commit adds a worker pool of goroutines (runtime.NumCPU by default) per database that will iterate over all the tables in that database and attempt to perform a compaction every defined sweep interval (100ms by default).
The compaction logic has not been modified.
This PR also adds a compacting atomic boolean to the granule so that no two goroutines attempt to compact the same granule. Previously, I think this was the case because the only atomic that was checked was pruned, which is only set once the granule has been pruned from the index.
Closes #119
Hey @asubiotto, what about using runtime.GOMAXPROCS(0) instead of runtime.NumCPU() so users like Parca can use the environment variable to configure it, and potentially automatically match it with the cgroup settings with go.uber.org/automaxprocs?
Demo:
go mod init main
go get go.uber.org/automaxprocs@latest
cat >main.go <<EOF
package main
import (
"fmt"
"runtime"
_ "go.uber.org/automaxprocs"
)
func main() {
fmt.Println(runtime.NumCPU(), runtime.GOMAXPROCS(0))
}
EOF
go run main.go
# 2022/10/13 11:22:23 maxprocs: Leaving GOMAXPROCS=8: CPU quota undefined
# 8 8
docker run --rm --cpus=1 --volume="${PWD}:/go/src" --workdir=/go/src \
docker.io/golang:1.19-alpine go run main.go
# go: downloading go.uber.org/automaxprocs v1.5.1
# 2022/10/13 18:23:06 maxprocs: Updating GOMAXPROCS=1: determined from CPU quota
# 8 1
I think that's a great idea @maxbrunet, thanks for the suggestion.
I think overall I'm fine with this change, my only question is if we considered instead of doing a sweep, we do a notify? Where after adding to a granule and determining it requires compaction we send that granule pointer over on a channel to the compaction pool? Notify might be more responsive, obviously does have the drawback of potentially blocking on the channel writes.
Yep! #119 has some discussion on this and the main takeaway is that we want to keep the insert path non-blocking. Another option is to use condition variables, but those still acquire locks to signal and it'd be to trigger a sweep rather than sending the pointer anywhere. I think an interval sweep style is probably a good approach for now since it's tunable.
I think overall I'm fine with this change, my only question is if we considered instead of doing a sweep, we do a notify? Where after adding to a granule and determining it requires compaction we send that granule pointer over on a channel to the compaction pool? Notify might be more responsive, obviously does have the drawback of potentially blocking on the channel writes.
Yep! #119 has some discussion on this and the main takeaway is that we want to keep the insert path non-blocking. Another option is to use condition variables, but those still acquire locks to signal and it'd be to trigger a sweep rather than sending the pointer anywhere. I think an interval sweep style is probably a good approach for now since it's tunable.
Sounds great to me