Optimize the number of queries to the database
We see a large number of queries (823,502/day) of this type:
SELECT id, created, type, blob, status, status_desc, sha, repository, branch, event, source_branch, target_branch, tag, artifact_id
FROM events
WHERE status = 'new' ORDER BY created ASC LIMIT 10;
This should be optimized a bit so they don't show up in the top query metrics.
Yeah, we poll the DB every 100 milliseconds to get new events, process them, and do policy-based deploys, etc.
https://github.com/gimlet-io/gimlet/blob/main/pkg/gimletd/worker/gitops.go#L71
This is a rather naive implementation.
I could decrease polling to 500ms, or make it configurable. But the long-term solution is either a real queue, or a DB-backed in-memory queue in the Go code.
Introducing a new component is not desired in Gimlet, therefore, to remove the polling from the code, we should create a queue implementation in Go: one that is persistent and survives restarts.
Maybe there is a persistent queue library for Golang, or we can be inspired by Drone's / Woodpecker CI's persistent queue implementation: https://github.com/woodpecker-ci/woodpecker/blob/master/server/queue/queue.go#L135
This is where the persistent queue is created upon start: https://github.com/woodpecker-ci/woodpecker/blob/3f2af06bbcc2f69776bb4b58bc3d6883027315f7/cmd/server/setup.go#L160
Good First Issue
This could be a good first issue as all it requires is Golang skills.
Hi @laszlocph, can you point me to the directory where the changes need to be made?
This is the point where we call the DB: https://github.com/gimlet-io/gimlet/issues/117#issuecomment-1080583212
The design, of course, is a bit larger. DM me on Twitter and we can jump on a call.
I guess this could be greatly improved with Postgres using the LISTEN/NOTIFY feature coupled with triggers; no need for persistent queues: https://www.postgresql.org/docs/current/sql-notify.html
Example: https://medium.com/launchpad-lab/postgres-triggers-with-listen-notify-565b44ccd782
With SQLite it's a bit different, but also doable: you can listen to all update events happening in the process, and you need to handle them in the client: https://www.sqlite.org/c3ref/update_hook.html
Example: https://gist.github.com/bruth/bfd84fdfc855e79c9e79
I'd imagine the eventing could work in a similar way to the kubernetes informer pattern, which also doesn't use a persistent queue.
How it could look:
- On a new database connection, perform the full query to see all 'new' items, and put them in a Go channel which your event-processing goroutine will be consuming
- For every subsequent change you get notified by the database (Postgres: LISTEN/NOTIFY, SQLite: update_hook); those can also land in the same channel
- If the database connection breaks, you start from the beginning: fetch all 'new' items, then start listening for change events
Anyway, just a wild idea.
Not a bad one at all. If the integration is easy, could be even less code than the persistent queue approach. 👍
Perhaps the neoq lib could be of use here, or act as a guideline for implementing this on top of PG: https://github.com/acaloiaro/neoq#postgres
After looking at this again, I actually think the underlying problem is that we share our production database with Gimlet. Having a separate dedicated database would allow Gimlet to do whatever, without affecting its surroundings.
> Introducing a new component is not desired in Gimlet
I'm not familiar with all the requirements, but I would imagine that Redis is a better choice for storing and polling this kind of data, especially if we just want a small internal data store. If replacing a component in Gimlet is an option, then that can also be considered.
> After looking at this again, I actually think the underlying problem is that we share our production database with Gimlet. Having a separate dedicated database would allow Gimlet to do whatever, without affecting its surroundings.
We are running 100 instances of Gimlet on our cloud platform. This just has to be done ;D Apparently there are multiple good solutions here. So this is still something we want to do.
> Introducing a new component is not desired in Gimlet
> I'm not familiar with all the requirements, but I would imagine that Redis is a better choice for storing and polling this kind of data, especially if we just want a small internal data store. If replacing a component in Gimlet is an option, then that can also be considered.
Redis would work. However, for the sake of operational experience, the no-new-component principle still takes priority.
I will deliver this in January if there are no takers in the meantime.
Tried the SQLite hooks / Postgres triggers based approach. Unfortunately, SQLite's update hook does not notify about out-of-process updates to the DB, e.g. manual edits to the data. This was disappointing. Plus, I needed to extend the SQLite driver to make it work.
So I settled on a naive, channel based notification approach in #836