profiling with small overhead

Open sergey-safarov opened this issue 3 months ago • 3 comments

Summary

From time to time, we see high CPU load like this:

[Image]

During this time, CouchDB returns error responses. To resolve this, we restart all CouchDB nodes in the cluster. To troubleshoot the issue, it would be helpful to get a CouchDB profile showing the busiest functions at that moment, as was done for RabbitMQ: https://www.rabbitmq.com/blog/2022/05/31/flame-graphs

Could you implement a similar feature for CouchDB?

Desired Behaviour

Attach the performance profiling tool to the running CouchDB process and record function performance metrics. Then use these metrics to draw a graph of the busiest functions.

Possible Solution

This can probably be done the same way as in RabbitMQ: https://www.rabbitmq.com/blog/2022/05/31/flame-graphs

sergey-safarov avatar Sep 06 '25 08:09 sergey-safarov

That would be neat to have @sergey-safarov, thanks for the link to https://www.rabbitmq.com/blog/2022/05/31/flame-graphs

Our Erlang version is high enough and we can set +JPperf true and limit scheduler threads (+S 4) in vm.args easily.

We used this a few times to find various bottlenecks in the past. I also have various snippets for using eprof and fprof, but the Linux perf stack is so much nicer, of course.

One missing part is generating load. There is an internal fabric_bench, but it's a very simple, sequential cluster benchmark, just to "kick the tires" as they say. I have a few k6 scripts I sometimes use for concurrent load, but those may need some cleanup and k6 may not be everyone's favorite.

nickva avatar Sep 23 '25 15:09 nickva

We can provide anonymized database dumps of different sizes (1 million, 16 million, 64 million documents) and a script to upload this dump to CouchDB. This can be used as a load generation tool. This will trigger an index rebuild and view compactions during docs upload.
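For illustration only, an upload script of this kind could look roughly like the Python sketch below. It is not the script offered above: the function names, the batch size, and the use of the `_bulk_docs` endpoint are assumptions made for the example, and authentication and error handling are left out.

```python
import json
from itertools import islice
from typing import Iterable, Iterator
from urllib import request


def batches(docs: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive lists of at most `size` documents."""
    it = iter(docs)
    while chunk := list(islice(it, size)):
        yield chunk


def bulk_docs_request(base_url: str, db: str, docs: list[dict]) -> request.Request:
    """Build (but do not send) a POST to the database's _bulk_docs endpoint."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return request.Request(
        f"{base_url}/{db}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def upload(base_url: str, db: str, docs: Iterable[dict], batch_size: int = 1000) -> int:
    """Send all documents in batches; return the number of documents sent."""
    sent = 0
    for chunk in batches(docs, batch_size):
        # Hypothetical setup: no credentials are attached to the request.
        with request.urlopen(bulk_docs_request(base_url, db, chunk)) as resp:
            resp.read()
        sent += len(chunk)
    return sent
```

Calling `upload("http://localhost:5984", "bench", docs)` would then push the dump in batches of 1000; the URL and database name here are placeholders.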

sergey-safarov avatar Sep 24 '25 08:09 sergey-safarov

That could work, but it may be simpler to start with some synthetic benchmarks. For example, I found a few k6 scripts floating around:

https://gist.github.com/nickva/d5312248635534315e76d25b31a86faa : a simple get,put,post for individual docs script. Not as configurable

https://gist.github.com/nickva/cd6679595146c3cb83fe4368cda89c3f : a more complete one with configurable test types (via env variables or command-line options). This one also tries to make requests arrive at a constant rate.
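To make the constant-rate idea concrete, here is a minimal Python sketch. It is not taken from the gists above (those are k6 scripts), and every name in it is hypothetical. Each request gets a scheduled start time derived from the target rate, so a slow response does not push back the schedule of the ones after it.

```python
import time
from typing import Callable, Iterator


def send_times(rate_per_sec: float, duration_sec: float, start: float = 0.0) -> Iterator[float]:
    """Yield the scheduled start time of each request at a fixed rate."""
    interval = 1.0 / rate_per_sec
    total = int(rate_per_sec * duration_sec)
    for i in range(total):
        yield start + i * interval


def run_constant_rate(
    do_request: Callable[[], None],
    rate_per_sec: float,
    duration_sec: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call do_request() on a fixed schedule; return the number of calls made."""
    issued = 0
    for due in send_times(rate_per_sec, duration_sec, start=clock()):
        delay = due - clock()
        if delay > 0:
            # Ahead of schedule: wait until this request is due.
            sleep(delay)
        # Behind schedule: issue immediately to catch up.
        do_request()
        issued += 1
    return issued
```

This sketch issues requests sequentially, so it only holds the target rate while each call finishes within one interval. A real load tool would run requests concurrently.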

nickva avatar Sep 25 '25 05:09 nickva