frostdb icon indicating copy to clipboard operation
frostdb copied to clipboard

frostdb: add compactorPool to limit compaction goroutines

Open asubiotto opened this issue 3 years ago • 2 comments

This commit adds a worker pool of goroutines (runtime.NumCPU by default) per database that will iterate over all the tables in that database and attempt to perform a compaction every defined sweep interval (100ms by default).

The compaction logic has not been modified.

This PR also adds a compacting atomic boolean to the granule so that no two goroutines attempt to compact the same granule. Previously, I think this was the case because the only atomic that was checked was pruned, which is only set once the granule has been pruned from the index.

Closes #119

asubiotto avatar Oct 13 '22 10:10 asubiotto

Hey @asubiotto, what about using runtime.GOMAXPROCS(0) instead of runtime.NumCPU() so users like Parca can use the environment variable to configure it, and potentially automatically match it with the cgroup settings with go.uber.org/automaxprocs?

Demo:

go mod init main
go get go.uber.org/automaxprocs@latest

cat >main.go <<EOF
package main

import (
	"fmt"
	"runtime"

	_ "go.uber.org/automaxprocs"
)

func main() {
	fmt.Println(runtime.NumCPU(), runtime.GOMAXPROCS(0))
}
EOF

go run main.go
# 2022/10/13 11:22:23 maxprocs: Leaving GOMAXPROCS=8: CPU quota undefined
# 8 8

docker run --rm --cpus=1 --volume="${PWD}:/go/src" --workdir=/go/src \
  docker.io/golang:1.19-alpine go run main.go
# go: downloading go.uber.org/automaxprocs v1.5.1
# 2022/10/13 18:23:06 maxprocs: Updating GOMAXPROCS=1: determined from CPU quota
# 8 1

maxbrunet avatar Oct 13 '22 18:10 maxbrunet

I think that's a great idea @maxbrunet, thanks for the suggestion.

asubiotto avatar Oct 14 '22 07:10 asubiotto

I think overall I'm fine with this change, my only question is if we considered instead of doing a sweep, we do a notify? Where after adding to a granule and determining it requires compaction we send that granule pointer over on a channel to the compaction pool? Notify might be more responsive, obviously does have the drawback of potentially blocking on the channel writes.

Yep! #119 has some discussion on this and the main takeaway is that we want to keep the insert path non-blocking. Another option is to use condition variables, but those still acquire locks to signal and it'd be to trigger a sweep rather than sending the pointer anywhere. I think an interval sweep style is probably a good approach for now since it's tunable.

asubiotto avatar Oct 14 '22 14:10 asubiotto

I think overall I'm fine with this change, my only question is if we considered instead of doing a sweep, we do a notify? Where after adding to a granule and determining it requires compaction we send that granule pointer over on a channel to the compaction pool? Notify might be more responsive, obviously does have the drawback of potentially blocking on the channel writes.

Yep! #119 has some discussion on this and the main takeaway is that we want to keep the insert path non-blocking. Another option is to use condition variables, but those still acquire locks to signal and it'd be to trigger a sweep rather than sending the pointer anywhere. I think an interval sweep style is probably a good approach for now since it's tunable.

Sounds great to me

thorfour avatar Oct 14 '22 15:10 thorfour