
Optimize number of queries to the database

Open madss opened this issue 2 years ago • 6 comments

We see a large number of queries (823502/day) of this type:

SELECT id, created, type, blob, status, status_desc, sha, repository, branch, event, source_branch, target_branch, tag, artifact_id
FROM events
WHERE status='new' ORDER BY created ASC LIMIT 10;

This should be optimized so these queries stop dominating the top query metrics.

madss avatar Mar 28 '22 12:03 madss

Yeah, we poll the DB every 100 milliseconds for new events, then process them to do policy-based deploys, etc.

https://github.com/gimlet-io/gimlet/blob/main/pkg/gimletd/worker/gitops.go#L71

This is a rather naive implementation.

laszlocph avatar Mar 28 '22 12:03 laszlocph

I could reduce the polling frequency to every 500ms, or make it configurable. But the long-term solution is either a real queue, or a DB-backed in-memory queue in the Go code.

laszlocph avatar Mar 28 '22 12:03 laszlocph

But the long-term solution is either a real queue, or a DB-backed in-memory queue in the Go code.

Introducing a new component is not desired in Gimlet, therefore to remove the polling from the code we should create a queue implementation in Go: a persistent queue that survives restarts.

Maybe there is a persistent queue library for Go, or we can take inspiration from Drone's / Woodpecker CI's persistent queue implementation: https://github.com/woodpecker-ci/woodpecker/blob/master/server/queue/queue.go#L135

This is where the persistent queue is created upon start: https://github.com/woodpecker-ci/woodpecker/blob/3f2af06bbcc2f69776bb4b58bc3d6883027315f7/cmd/server/setup.go#L160
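In case it helps a contributor, here is a minimal sketch of such a DB-backed in-memory queue, loosely in the spirit of the Woodpecker code linked above. The `Store` interface and its method names are hypothetical stand-ins for Gimlet's SQL store; an in-memory implementation keeps the sketch runnable.

```go
package main

import (
	"fmt"
	"sync"
)

// Store abstracts persistence; in Gimlet this would be backed by the
// existing SQL store. Method names here are hypothetical.
type Store interface {
	SaveTask(id, payload string) error
	DeleteTask(id string) error
	PendingTasks() (map[string]string, error)
}

// memStore is an in-memory Store so the sketch runs without a database.
type memStore struct {
	mu    sync.Mutex
	tasks map[string]string
}

func newMemStore() *memStore { return &memStore{tasks: map[string]string{}} }

func (m *memStore) SaveTask(id, payload string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.tasks[id] = payload
	return nil
}

func (m *memStore) DeleteTask(id string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.tasks, id)
	return nil
}

func (m *memStore) PendingTasks() (map[string]string, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := map[string]string{}
	for k, v := range m.tasks {
		out[k] = v
	}
	return out, nil
}

// Queue is a DB-backed in-memory queue: Push persists before handing
// the task to consumers, Ack removes it from the store, and Reload
// repopulates the channel after a restart.
type Queue struct {
	store Store
	ch    chan [2]string // [id, payload]
}

func NewQueue(store Store, size int) *Queue {
	return &Queue{store: store, ch: make(chan [2]string, size)}
}

func (q *Queue) Push(id, payload string) error {
	if err := q.store.SaveTask(id, payload); err != nil {
		return err
	}
	q.ch <- [2]string{id, payload}
	return nil
}

func (q *Queue) Poll() (id, payload string) {
	t := <-q.ch
	return t[0], t[1]
}

func (q *Queue) Ack(id string) error { return q.store.DeleteTask(id) }

// Reload re-enqueues tasks that were still pending at shutdown.
func (q *Queue) Reload() error {
	pending, err := q.store.PendingTasks()
	if err != nil {
		return err
	}
	for id, payload := range pending {
		q.ch <- [2]string{id, payload}
	}
	return nil
}

func main() {
	store := newMemStore()
	q := NewQueue(store, 16)
	q.Push("e1", "deploy")
	id, payload := q.Poll()
	fmt.Println(id, payload) // e1 deploy
	q.Ack(id)

	// Simulate a restart: unacked tasks survive in the store.
	q.Push("e2", "rollback")
	q2 := NewQueue(store, 16)
	q2.Reload()
	id2, _ := q2.Poll()
	fmt.Println(id2) // e2
}
```

Consumers would block on `Poll` instead of polling the database, and only acknowledged work is removed, so a crash between `Poll` and `Ack` redelivers the task on restart.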

laszlocph avatar Sep 02 '22 11:09 laszlocph

Good First Issue

This could be a good first issue, as it only requires Go skills.

laszlocph avatar Sep 02 '22 11:09 laszlocph

Hi @laszlocph, can you point me to the directory where the changes need to be made?

ashiskumarnaik avatar Oct 12 '22 08:10 ashiskumarnaik

Hi @laszlocph, can you point me to the directory where the changes need to be made?

This is the point where we call the DB: https://github.com/gimlet-io/gimlet/issues/117#issuecomment-1080583212

The design of course is a bit larger. DM me on twitter and we can jump on a call.

laszlocph avatar Oct 18 '22 12:10 laszlocph

I guess this could be greatly improved with Postgres using the LISTEN/NOTIFY feature coupled with triggers; no need for persistent queues: https://www.postgresql.org/docs/current/sql-notify.html Example: https://medium.com/launchpad-lab/postgres-triggers-with-listen-notify-565b44ccd782

With SQLite it's a bit different, but also doable: you can listen to all update events happening in the server, and handle them in the client: https://www.sqlite.org/c3ref/update_hook.html Example: https://gist.github.com/bruth/bfd84fdfc855e79c9e79

I'd imagine the eventing could work in a similar way to the kubernetes informer pattern, which also doesn't use a persistent queue.

What it could look like:

  • On a new database connection, perform the full query for all 'new' items and put them in a Go channel that your event-processing goroutine consumes.
  • For every subsequent change you get notified by the database (Postgres: LISTEN/NOTIFY, SQLite: update_hook); those can also land in the same channel.
  • If the database connection breaks, you start from the beginning: fetch all 'new' items, then start listening for change events.
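The steps above can be sketched informer-style in Go. Here `fetchAll` and the `notifications` channel are stand-ins: in a real integration the channel would be fed by Postgres LISTEN/NOTIFY (e.g. lib/pq's `Listener`) or an SQLite update hook, neither of which is wired up in this runnable sketch.

```go
package main

import "fmt"

// watchEvents merges an initial full fetch with subsequent change
// notifications into one stream, the way a Kubernetes informer does
// list-then-watch. On reconnect the caller would simply call
// watchEvents again, which re-runs the full query first.
func watchEvents(fetchAll func() []string, notifications <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		// Step 1: full query on (re)connect for all 'new' items.
		for _, id := range fetchAll() {
			out <- id
		}
		// Step 2: every later change arrives as a notification.
		for id := range notifications {
			out <- id
		}
	}()
	return out
}

func main() {
	notifs := make(chan string, 2)
	notifs <- "e3"
	notifs <- "e4"
	close(notifs)

	events := watchEvents(func() []string { return []string{"e1", "e2"} }, notifs)
	for id := range events {
		fmt.Println(id) // e1, e2, then e3, e4
	}
}
```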

Anyway, just a wild idea.

reegnz avatar Dec 13 '23 12:12 reegnz

Anyway, just a wild idea.

Not a bad one at all. If the integration is easy, could be even less code than the persistent queue approach. 👍

laszlocph avatar Dec 13 '23 19:12 laszlocph

Perhaps the neoq lib could be of use here or act as guideline on implementing this on top of PG. https://github.com/acaloiaro/neoq#postgres

smiklos avatar Dec 18 '23 11:12 smiklos

After looking at this again, I actually think the underlying problem is that we share our production database with Gimlet. Having a separate dedicated database would allow Gimlet to do whatever, without affecting its surroundings.

Introducing a new component is not desired in Gimlet

I'm not familiar with all the requirements, but I would imagine that Redis is a better choice for storing and polling this kind of data, especially if we just want a small internal data store. If replacing a component in Gimlet is an option, then that can also be considered.

madss avatar Dec 18 '23 12:12 madss

After looking at this again, I actually think the underlying problem is that we share our production database with Gimlet. Having a separate dedicated database would allow Gimlet to do whatever, without affecting its surroundings.

We are running 100 Gimlet instances on our cloud platform. This just has to be done ;D Apparently there are multiple good solutions here. So this is still something we want to do.

Introducing a new component is not desired in Gimlet

I'm not familiar with all the requirements, but I would imagine that Redis is a better choice for storing and polling this kind of data, especially if we just want a small internal data store. If replacing a component in Gimlet is an option, then that can also be considered.

Redis would work. However, for operational simplicity, the no-new-component principle still takes priority.

I will deliver this in January if there are no takers in the meantime.

laszlocph avatar Dec 18 '23 12:12 laszlocph

Tried the SQLite hooks / Postgres triggers based approach. Unfortunately SQLite hooks do not notify about out-of-process updates to the DB, e.g. manual edits to the data. This was disappointing. Plus I needed to extend the SQLite driver to make it work.

So I settled on a naive, channel-based notification approach in #836

laszlocph avatar Mar 12 '24 20:03 laszlocph