docs: Add dashboard links
coverage: 99.908%. remained the same when pulling 5bd2e1b21a49726c577483b9a12aee9ff5ce0d97 on kibertoad:docs/dashboards into 3da860f0e6f0650dcb95f62e5b71af6dfbeb44f1 on timgit:master.
Ok, I'm having 2nd thoughts on this. Including these links prominently in the readme is going to be seen as a recommendation and not quite the "use at your own risk" that I would prefer.
First of all, none are current with v12. pg-bossman at least is compatible with v11, and I like that he's using Hono, but after a quick review of the integration, there are some queries that are a bit concerning. Additionally, the other 2 packages run queries against the job table that I would specifically recommend against for performance reasons.
The reason I have left out a dashboard in the past is the increased time commitment that a UI always brings along with it. Who knows, maybe 2026 is "the right time" to create one that I feel comfortable recommending, now that one can simply plug all of these guidelines into an agent prompt and be done with it in days.
@timgit I can probably put something together quickly if you have some guidelines you would like to provide on how it is supposed to work, do's and don'ts.
Oh boy, you had to ask that. ;) I'm going to just write down my thoughts. This is not meant to be "you need to do this".
The stats that are most useful for a dashboard are cached in the queue table. The goal would be to avoid issuing arbitrary `select count(*) from job` queries, since this is already being done by the workers during monitoring. These stats are what I considered an MVP for useful metrics, and they are not exhaustive. If and when gaps are found, we'd likely extend the metrics that are stored in the queue table. This puts all the runtime query pressure against the queue table, which is great in comparison to the partitioned job table.
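To make the "read only the cached stats" idea concrete, here is a minimal sketch of how a dashboard backend might summarize rows that have already been read from the queue table. The `QueueStats` shape, its field names, and the `summarize` helper are all assumptions for illustration, not pg-boss's actual schema or API; the point is only that the aggregation never touches the job table.

```typescript
// Hypothetical shape of one row of cached stats read from the queue table.
// Field names are assumptions for illustration, not pg-boss's real columns.
interface QueueStats {
  name: string;
  queuedCount: number;
  activeCount: number;
  deferredCount: number;
  totalCount: number;
}

interface DashboardSummary {
  queues: number;
  queued: number;
  active: number;
  deferred: number;
  total: number;
  largestQueue: string | null;
}

// Aggregate the cached per-queue counters for an overview panel.
// No count(*) against the job table is needed for any of this.
function summarize(rows: QueueStats[]): DashboardSummary {
  const summary: DashboardSummary = {
    queues: rows.length,
    queued: 0,
    active: 0,
    deferred: 0,
    total: 0,
    largestQueue: null,
  };
  let largest = -1;
  for (const row of rows) {
    summary.queued += row.queuedCount;
    summary.active += row.activeCount;
    summary.deferred += row.deferredCount;
    summary.total += row.totalCount;
    // Track which queue currently holds the most jobs.
    if (row.totalCount > largest) {
      largest = row.totalCount;
      summary.largestQueue = row.name;
    }
  }
  return summary;
}
```

If a metric the dashboard needs is not in those cached rows, that would be a signal to extend what gets cached rather than to query the job table directly.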
In order to make a dashboard super useful, the warning events emitted whenever a query takes longer than 30s to execute or a backlog is forming should also be displayed. These are not currently being cached into the queue table, but this thought exercise helps me realize that is the next step on the list of things to do for better monitoring. Most of the time I've had issues with pg-boss, it was because of a backlog or a very large queue that needed maintenance.
Monitoring slow queries is the easiest way to detect that a problem is forming, so perhaps caching the most recent monitoring or maintenance execution time would be useful. It could be snapshotted and exported to be used by a first-class monitoring and viz platform like Grafana, which is what I've used in the past. Other tools like this provide visibility into health over time, showing correlations between queue size and query execution time, adding annotations when warnings are emitted, etc. I'm not especially excited about trying to build something like a metrics table into pg-boss to offer a stripped-down version of this, because this has the potential to become a maintenance burden, both for the queue operator and myself.
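As a rough sketch of how a dashboard could surface those two warning conditions from a cached snapshot: the 30s threshold comes from the comment above, while the `MonitorSnapshot` shape, the `lastMonitorMs` field, and the default backlog limit are hypothetical and would depend on what actually ends up cached.

```typescript
// Hypothetical snapshot of one queue's most recent monitoring run.
// None of these field names are confirmed pg-boss columns.
interface MonitorSnapshot {
  queue: string;
  queuedCount: number;
  lastMonitorMs: number; // assumed: duration of the latest monitoring query
}

type QueueWarning = {
  queue: string;
  kind: "slow_query" | "backlog";
  value: number;
};

const SLOW_QUERY_MS = 30_000; // from the thread: queries over 30s warn
const DEFAULT_BACKLOG_LIMIT = 10_000; // assumed value, purely illustrative

// Derive displayable warnings from snapshots without running new queries.
function detectWarnings(
  snapshots: MonitorSnapshot[],
  backlogLimit: number = DEFAULT_BACKLOG_LIMIT
): QueueWarning[] {
  const warnings: QueueWarning[] = [];
  for (const snap of snapshots) {
    if (snap.lastMonitorMs > SLOW_QUERY_MS) {
      warnings.push({ queue: snap.queue, kind: "slow_query", value: snap.lastMonitorMs });
    }
    if (snap.queuedCount > backlogLimit) {
      warnings.push({ queue: snap.queue, kind: "backlog", value: snap.queuedCount });
    }
  }
  return warnings;
}
```

The same snapshot records could also be exported on an interval to an external tool like Grafana, which keeps the history and correlation work out of pg-boss itself.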
If you want UI tech stack opinions, the only thing I could recommend would be React and something lightweight that is compatible with most backend runtimes (Node, Deno, Bun), like Hono. The tech in this space is still changing rapidly, so making a decision means it's obsolete in a year, lol. I also jumped into Remix years ago and still work with the React Router framework. I'm not a fan of Next.js.
Hope this helps and I'm always open to feedback and your thoughts
Thank you, that helps! I'm a big fan of Remix/RR too! On the BE side I am very much a Fastify person these days, but Hono is nice too.
Could you bootstrap a repo for the dashboard and come up with a name? Or would you prefer it as a subfolder in the same repo?
@timgit So how should we proceed in terms of dashboard repo?
I'd prefer it as a subfolder in this repo