Deploy templates for unmonitored queues

Open engyebrahim opened this issue 11 months ago • 10 comments

https://dev.azure.com/dnceng/internal/_workitems/edit/7373

Release Note Category

  • [ ] Feature changes/additions
  • [ ] Bug fixes
  • [x] Internal Infrastructure Improvements

Release Note Description

Keep scale set templates up-to-date for all available VM-based queues.

engyebrahim avatar Feb 03 '25 17:02 engyebrahim

resetting this to Dev since the change blocked CI and is not rolling out, but we have a starting point in Enji's prior PR. not sure who should finish the work b/c Enji is OOF and we're a bit low on people. @ilyas1974 this seems like a very useful improvement for future rollouts. should we add it to the Ops backlog and mark it as P1 or something❓

dougbu avatar Mar 19 '25 00:03 dougbu

Into the ops backlog is fine. Here is the guidance we put together for prioritizing issues - https://dev.azure.com/dnceng/internal/_wiki/wikis/DNCEng%20Services%20Wiki/1225/Issue-Triage-Guidance

ilyas1974 avatar Mar 19 '25 21:03 ilyas1974

triage doc doesn't mention rollout at all, let alone the engineering impact of broken rollouts. I chose P2 based on the "Higher impact non-customer-facing service issues" bullet there. of course, there may also be a customer impact in the future if we don't fix this

dougbu avatar Mar 19 '25 21:03 dougbu

hoping @engyebrahim can get back to this since it is a rollout gotcha and time-waster

dougbu avatar Apr 02 '25 01:04 dougbu

hmm, I didn't mean to self-assign this one

dougbu avatar Apr 11 '25 19:04 dougbu

@engyebrahim you said you were looking at this recently. any progress to report❓ should the issue be assigned to you❓

dougbu avatar Apr 11 '25 19:04 dougbu

@dougbu I'm still working on it. I now know the area in the code that causes the problem, but I'm still working out how to fix it

engyebrahim avatar Apr 11 '25 21:04 engyebrahim

> @dougbu I'm working still on it, i know now the area in code that causes the problem but still working on how to fix it

I think that means this issue should be assigned to you. please remove your assignment if this is incorrect from your perspective

dougbu avatar Apr 11 '25 21:04 dougbu

fix in !48927

dougbu avatar May 01 '25 22:05 dougbu

I need to drop this one. current branch contains an approximate fix but

  1. it updates existing entries in the global table far too frequently. that should only happen when building the 'production' branch and an entry is found w/ the correct definition id. everywhere else, should instead add new entries
  2. it repeats the search for deployed queues in multiple subscriptions. since queues reference images in their own subscriptions (after things are copied around IIRC), it should instead do that only for the HelixStaging sub. that's where everything in the two image-tracking tables finds images to reuse
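
For whoever picks this up, the two points above could be sketched roughly as follows. This is only an illustration of the intended rules, not the code in the branch: `TableEntry`, `upsert_entry`, and `subscriptions_to_search` are hypothetical names, the global table is modeled as a plain list, and matching on queue name plus definition id is an assumption about how entries are keyed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableEntry:
    # Hypothetical shape of a row in the global table.
    queue_name: str
    definition_id: int
    image_version: str


def upsert_entry(table: List[TableEntry], candidate: TableEntry, branch: str) -> str:
    """Update an existing row only for 'production' builds that find a row
    with the matching definition id; in every other case add a new row."""
    if branch == "production":
        for i, existing in enumerate(table):
            # Assumption: rows are matched on queue name and definition id.
            if (existing.queue_name == candidate.queue_name
                    and existing.definition_id == candidate.definition_id):
                table[i] = candidate
                return "updated"
    table.append(candidate)
    return "added"


def subscriptions_to_search(all_subscriptions: List[str],
                            staging: str = "HelixStaging") -> List[str]:
    """Limit the deployed-queue search to the staging subscription
    instead of repeating it for every subscription."""
    return [sub for sub in all_subscriptions if sub == staging]
```

With this shape, a non-production build never overwrites a row, and a production build overwrites only when the definition id lines up; anything else becomes a new row.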

dougbu avatar May 13 '25 18:05 dougbu

We're not going to do this work. Closing

missymessa avatar Oct 02 '25 18:10 missymessa