
Add netdata for infrastructure monitoring

Status: Open. andrewm4894 opened this issue 2 years ago (4 comments).

[DRAFT PR] Needs discussion. I'm just sharing this as a POC or example for others to try out.

This PR adds a netdata monitoring agent running alongside the other containers.

The end result is a local dashboard where you can see CPU, memory and network usage for each container:

[Screenshot: per-container CPU, memory and network charts in the local netdata dashboard]

I have also enabled the Redis and Postgres collectors, so you can see a lot of Redis and Postgres metrics:

[Screenshots: Redis and Postgres collector charts]

You can also claim the node to Netdata Cloud, which is a bit easier to work with and has some extra features, but that is optional: you can also just keep it running in local mode.

Once Open-Assistant is running, you can run

docker compose up --build netdata

to start netdata. The local netdata dashboard and charts are then served on port 19999.
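To check from a script that the agent is up and collecting per-container data, a minimal Python sketch can read netdata's chart index from its REST API (`/api/v1/charts`). It assumes the dashboard is reachable at `http://localhost:19999` and that per-container charts use ids of the form `cgroup_<container>.<metric>`; both are assumptions to verify against your own agent, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: the netdata container publishes its web port on localhost:19999.
NETDATA_URL = "http://localhost:19999"


def fetch_charts(base_url: str = NETDATA_URL, timeout: float = 5.0) -> dict:
    """Fetch the chart index (JSON) from netdata's /api/v1/charts endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/v1/charts", timeout=timeout) as resp:
        return json.load(resp)


def container_charts(charts_index: dict) -> dict[str, list[str]]:
    """Group chart ids by container name.

    Assumption: per-container charts are named 'cgroup_<container>.<metric>';
    check the ids your agent actually reports. Non-cgroup charts are skipped.
    """
    grouped: dict[str, list[str]] = {}
    for chart_id in charts_index.get("charts", {}):
        if not chart_id.startswith("cgroup_"):
            continue
        # Split on the last dot, since container names may themselves contain dots.
        container, _, metric = chart_id[len("cgroup_"):].rpartition(".")
        grouped.setdefault(container, []).append(metric)
    return {name: sorted(metrics) for name, metrics in grouped.items()}
```

Calling `container_charts(fetch_charts())` should return one entry per container netdata has discovered; an empty result suggests either that the agent cannot see the containers' cgroups or that the chart ids differ from the assumed prefix.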

This could be extended with custom metrics via something like StatsD or Prometheus, and alerts could easily be configured to go to Discord, email, etc.
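For the StatsD route, application code only has to emit plain-text StatsD lines over UDP. A minimal sketch, assuming netdata's built-in StatsD listener is on its default port 8125 and is reachable from the sending process (its bind address may need adjusting in `netdata.conf` for that); the metric names are hypothetical.

```python
import socket
from typing import Optional

# Assumption: netdata's StatsD listener is on UDP port 8125 and reachable from here.
STATSD_ADDR = ("localhost", 8125)

# Standard StatsD types: counter, gauge, timer (ms), histogram, set.
VALID_TYPES = {"c", "g", "ms", "h", "s"}


def format_statsd(name: str, value: float, metric_type: str = "c",
                  sample_rate: Optional[float] = None) -> str:
    """Build one StatsD line: '<name>:<value>|<type>[|@<sample_rate>]'."""
    if metric_type not in VALID_TYPES:
        raise ValueError(f"unsupported StatsD metric type: {metric_type!r}")
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_statsd(line: str, addr: tuple = STATSD_ADDR) -> None:
    """Send one metric line over UDP; fire-and-forget, no delivery guarantee."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), addr)
```

For example, `send_statsd(format_statsd("oasst.tasks_created", 1))` could be called after each (hypothetical) task creation. UDP keeps the call cheap for the app, at the cost of silently dropping metrics if nothing is listening.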

andrewm4894, Feb 08 '23 17:02

This looks cool. I'm tagging this as testing since it's related to reliability and introspection.

fozziethebeat, Feb 09 '23 02:02

@andrewm4894, are you already a member of our OA Discord server? I created a new #docs channel. If you join the server, please ping me.

andreaskoepf, Feb 09 '23 22:02

Since #1426 is merged, I also configured netdata to scrape the Prometheus-formatted /metrics endpoints of the backend and the inference-server.

This gives you charts like the one below, one for each Prometheus endpoint (the inference server was not running, so it does not appear), where you can see counts by API endpoint, etc.:

[Screenshot: netdata charts built from a Prometheus /metrics endpoint, with counts by API endpoint]

It would probably need a bit more processing logic in prometheus.conf, but this mainly illustrates the flow.
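Before deciding what extra processing belongs in prometheus.conf, it can help to look at the raw `/metrics` text yourself. A stdlib-only Python sketch that parses the Prometheus text exposition format and totals a counter by one label; it is a simplified parser (no `}` inside label values, histograms treated as plain samples), and the metric and label names in the test data are assumptions, not necessarily what the backend exposes.

```python
import re
from collections import defaultdict

# One sample line of the Prometheus text format:
#   metric_name{label="value",...} 123.0 [optional_timestamp]
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+\S+)?$'
)
# key="value" pairs; allows backslash-escaped characters inside the value.
_LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')


def parse_samples(text: str):
    """Yield (name, labels, value) for each sample line, skipping # comments."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = _SAMPLE_RE.match(line)
        if m is None:
            continue  # simplified parser: skip lines it does not understand
        labels = {lm["key"]: lm["val"] for lm in _LABEL_RE.finditer(m["labels"] or "")}
        yield m["name"], labels, float(m["value"])


def counts_by_label(text: str, metric: str, label: str) -> dict[str, float]:
    """Sum all samples of `metric`, grouped by the value of `label`."""
    totals: dict[str, float] = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric and label in labels:
            totals[labels[label]] += value
    return dict(totals)
```

For example, `counts_by_label(text, "http_requests_total", "handler")` would sum a (hypothetical) request counter per API route, which is roughly the per-endpoint breakdown the chart shows.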

andrewm4894, Feb 10 '23 16:02

Just wanna pipe up and say that this looks really cool! I wasn't aware of netdata but used Prometheus in the early days. Love it! :)

bitplane, Feb 20 '23 02:02

Closing, as this was done in https://github.com/LAION-AI/Open-Assistant/pull/1563

andrewm4894, Mar 08 '23 18:03