Drop role sometimes fails in compute_ctl

Open save-buffer opened this issue 1 year ago • 3 comments

Steps to reproduce

Needs investigation, but somehow reassignment didn't work in this incident: https://neondb.slack.com/archives/C07BB3NHUUX/p1720584945416209

Expected result

Compute startup should never fail due to a bad role drop.

Actual result

Sometimes it does.

Environment

Logs, links

save-buffer avatar Jul 10 '24 20:07 save-buffer

I think a quick fix is to have the compute startup process log a warning in this case instead of failing, so that users can at least start their compute.
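A minimal sketch of that idea, not the actual compute_ctl code: every name here (`try_drop_role`, `drop_roles_lenient`, `DropOutcome`) is hypothetical, and the database call is replaced by an in-memory stand-in that fails the way Postgres does when a role still owns objects. The point is only that a failed drop gets logged and recorded rather than aborting startup.

```rust
use std::collections::HashMap;

/// Hypothetical result of attempting to drop a single role.
#[derive(Debug, PartialEq)]
pub enum DropOutcome {
    Dropped,
    /// The drop failed; startup continues and the reason is kept for later.
    Skipped { reason: String },
}

/// Stand-in for the real `DROP ROLE` call (assumption: no database here).
/// Fails when the role still owns objects, mirroring the Postgres behavior.
fn try_drop_role(owned_objects: &HashMap<String, usize>, role: &str) -> Result<(), String> {
    match owned_objects.get(role) {
        Some(n) if *n > 0 => Err(format!(
            "role \"{}\" cannot be dropped because some objects depend on it ({} objects)",
            role, n
        )),
        _ => Ok(()),
    }
}

/// Attempts every drop and never returns an error: failures are logged as
/// warnings and reported in the result so the caller can act on them.
pub fn drop_roles_lenient(
    owned_objects: &HashMap<String, usize>,
    roles: &[&str],
) -> Vec<(String, DropOutcome)> {
    roles
        .iter()
        .map(|role| {
            let outcome = match try_drop_role(owned_objects, role) {
                Ok(()) => DropOutcome::Dropped,
                Err(reason) => {
                    eprintln!("WARN: could not drop role {}: {}", role, reason);
                    DropOutcome::Skipped { reason }
                }
            };
            (role.to_string(), outcome)
        })
        .collect()
}
```

Returning the skipped roles, rather than only logging them, leaves room to surface the failures somewhere more visible than a log line.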

skyzh avatar Jul 15 '24 14:07 skyzh

Yes, in principle I agree, but since we have some synchronization between cplane and compute, ignoring these errors can leave us in an inconsistent state. We should probably just make the compute the source of truth and reconcile back into the cplane if something fails. That seems like a fairly large-scope project.
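To make the reconciliation idea concrete, here is a rough sketch under stated assumptions: the compute's role list is treated as authoritative, and the difference from what the cplane believes is computed so it could be reported back. `RoleDrift` and `role_drift` are hypothetical names, and how the result would actually reach the cplane is left out entirely.

```rust
use std::collections::BTreeSet;

/// Hypothetical report of how the cplane's view differs from the compute's.
#[derive(Debug, Default, PartialEq)]
pub struct RoleDrift {
    /// Roles the cplane expects to exist but the compute does not have.
    pub missing_on_compute: Vec<String>,
    /// Roles the compute still has but the cplane does not know about,
    /// e.g. a role whose drop failed and was only logged as a warning.
    pub unknown_to_cplane: Vec<String>,
}

/// Compares the two role lists; the compute side is taken as ground truth,
/// so the returned drift describes what the cplane would need to correct.
pub fn role_drift(cplane_roles: &[&str], compute_roles: &[&str]) -> RoleDrift {
    let expected: BTreeSet<&str> = cplane_roles.iter().copied().collect();
    let actual: BTreeSet<&str> = compute_roles.iter().copied().collect();
    RoleDrift {
        missing_on_compute: expected.difference(&actual).map(|r| r.to_string()).collect(),
        unknown_to_cplane: actual.difference(&expected).map(|r| r.to_string()).collect(),
    }
}
```

An empty `RoleDrift` would mean both sides agree; anything else is the inconsistent state described above, made explicit instead of silently ignored.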

save-buffer avatar Jul 15 '24 17:07 save-buffer

There's also the risk that if it generates warnings instead of causing an incident, we'll be lazy and just never fix it. Or, put more diplomatically, it won't be "high priority" and we'll constantly have other, higher-priority things to fix. So I'm not sure what the right call is.

save-buffer avatar Jul 15 '24 17:07 save-buffer

Without the context it's hard to tell, but this is likely a duplicate of https://github.com/neondatabase/cloud/issues/13582

ololobus avatar Jul 17 '24 15:07 ololobus

Ah nice, yes, seems like a duplicate.

save-buffer avatar Jul 17 '24 15:07 save-buffer

Closing as a duplicate of https://github.com/neondatabase/cloud/issues/13582

ololobus avatar Aug 16 '24 12:08 ololobus