ArchiveBot
Poor dashboard performance scaling
Following #367 and #383, the control node can now easily handle a couple hundred parallel jobs, but the dashboard JS cannot keep up. Earlier today, when we were reaching our current pipeline capacity, the dashboard used nearly 100% of one CPU core on a reasonably modern machine of mine. An older machine of mine hasn't been able to run the dashboard for months. I suspect that #378 is also a performance issue. The beta dashboard seems to be worse in this respect than the standard one.
The dashboard needs some optimisation to stay usable on slower machines and as we scale the whole system up further in the future.
@JustAnotherArchivist is this fixed for you?
@ivan It's definitely much better thanks to #558, but I wouldn't consider this solved. If we scale up (there are still at least three big pipelines out of operation), we will run into similar problems again. I suspect the only way to really fix it is to reduce the amount of data the dashboard client has to process in the first place. That would mean reworking the entire WebSocket communication into a pubsub scheme in which the client subscribes to the visible jobs and receives only regular stats updates for the rest.
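
To make the pubsub idea concrete, here is a minimal sketch of the server-side routing it implies. Everything in it is hypothetical and not taken from ArchiveBot's code: the `PubSubRouter` class, its method names, and the `items`/`bytes` counters are assumptions chosen for illustration, and the actual WebSocket transport is left out entirely.

```python
from collections import defaultdict


class PubSubRouter:
    """Hypothetical router: full log lines go only to subscribed clients."""

    def __init__(self):
        # client id -> set of job ids that client currently has visible
        self.subscriptions = defaultdict(set)
        # job id -> running counters, broadcast to everyone as a summary
        self.stats = defaultdict(lambda: {"items": 0, "bytes": 0})

    def subscribe(self, client_id, job_id):
        self.subscriptions[client_id].add(job_id)

    def unsubscribe(self, client_id, job_id):
        self.subscriptions[client_id].discard(job_id)

    def publish(self, job_id, line, size):
        """Count one log line and return (client_id, line) pairs to deliver."""
        counters = self.stats[job_id]
        counters["items"] += 1
        counters["bytes"] += size
        return [
            (client_id, line)
            for client_id, jobs in sorted(self.subscriptions.items())
            if job_id in jobs
        ]

    def stats_snapshot(self):
        """Compact per-job summary, meant to be sent on a timer to all clients."""
        return {job_id: dict(counters) for job_id, counters in self.stats.items()}
```

With this shape, a client that shows only a handful of jobs receives full log lines for those jobs alone, and learns about all other jobs through the periodic snapshot, so client-side work grows with the number of visible jobs rather than with the total number of running jobs.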