coder icon indicating copy to clipboard operation
coder copied to clipboard

chore: remove stats batcher and workspace usage tracker

Open f0ssel opened this issue 1 year ago • 7 comments

It's become clear that only workspacestats.ReportAgentStat() is the only caller of StatsBatcher.Add(). Using the stats batcher saves us a DB call, but only 1 call is saved and many are still called outside of the batcher for reporting a stat. I don't think the StatsBatcher caching abstraction is serving us anymore, so I think it should be removed in favor of writing directly to the DB when called.

f0ssel avatar May 29 '24 15:05 f0ssel

I was under the impression that batchstats buffers up to a certain amount of updates, so let's say 100 workspaces all report their agent stats at the same time, all 100 stats would be written in one go. Is this not the case? (Please note, I'm not making a case for or against the batcher here, just wondering if the premise is correct?)

mafredri avatar May 29 '24 18:05 mafredri

I was under the impression that batchstats buffers up to a certain amount of updates, so let's say 100 workspaces all report their agent stats at the same time, all 100 stats would be written in one go. Is this not the case? (Please note, I'm not making a case for or against the batcher here, just wondering if the premise is correct?)

Yep, up to 1024 stats updates.

johnstcn avatar May 29 '24 19:05 johnstcn

The batcher is totally valid, but my concern is that we are only saving 1 db call when reporting the stat already takes 7-8 db calls to get to that point, and continues to use the DB afterwards too. In return we lose the ability to return an error on failed writes, and a good bit of code to manage this buffer.

f0ssel avatar May 29 '24 21:05 f0ssel

Going to continue removing some batchers on this branch and going to scaletest the impact to see if these changes are safe to merge

f0ssel avatar May 30 '24 14:05 f0ssel

The batcher is totally valid, but my concern is that we are only saving 1 db call when reporting the stat already takes 7-8 db calls to get to that point, and continues to use the DB afterwards too. In return we lose the ability to return an error on failed writes, and a good bit of code to manage this buffer.

I see, that's fair reasoning. I believe writing to this table (agent stats) used to be more expensive as we kept 6 months of data in it. Now we only keep about 1 day.

Edit: Still, there's a big difference between read and write queries. I assume most of those 8 DB calls are read-only?

mafredri avatar May 30 '24 17:05 mafredri

@mafredri Thanks for bringing this up, I do share your concerns since it's a write operation, so I've went ahead and made an alternate PR (https://github.com/coder/coder/pull/13418) where we keep these structures in place under the workspacestats.Reporter. I'd also like your perspective on scale testing: If we do a scale test and find that these batchers are not showing any impact (that we measure), would you still want to keep them for posterity or would you be in favor of removing them?

f0ssel avatar May 30 '24 18:05 f0ssel

@mafredri Thanks for bringing this up, I do share your concerns since it's a write operation, so I've went ahead and made an alternate PR (#13418) where we keep these structures in place under the workspacestats.Reporter.

👍🏻

I'd also like your perspective on scale testing: If we do a scale test and find that these batchers are not showing any impact (that we measure), would you still want to keep them for posterity or would you be in favor of removing them?

If we find that they're not beneficial, I'm fine with removing them entirely. We can always dig in the git history if we realize we need them back.

mafredri avatar May 30 '24 20:05 mafredri

Thanks for this data, glad to close and happy to know these are working as intended!

f0ssel avatar May 31 '24 16:05 f0ssel