
Epic: send connection attempt information to the control plane

Open kelvich opened this issue 1 year ago • 3 comments

Motivation

Right now, if an operation (e.g. check_availability, suspend_compute, create_branch) is running or stuck, the client cannot connect to that endpoint. Apart from proxy logs, we have no information about connection attempts made while an operation is in progress. Having this information would help us prioritize which project to fix first when several projects are stuck (those with connection attempts should be fixed first) and would let us calculate user-visible downtime properly (currently we include all stuck operations, so we over-count).

DoD

Operations such as check_availability, suspend_compute, create_branch, etc. carry a mark when someone tried and failed to connect while the operation was running.

Implementation ideas

One way to do this is to add a nullable field, something like connection_attempt_at, to the operations table. On a wake_compute attempt during the operation, set the field if it is null and do nothing if it is already set.
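As a rough illustration of the set-once idea, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the `action` column, and the helper names `create_schema` and `record_connection_attempt` are assumptions made for the example, not the control plane's actual schema or code. The point is only that a single conditional UPDATE keeps the first attempt's timestamp and ignores later ones.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical, simplified operations table; the real one has more columns.
    conn.execute(
        """
        CREATE TABLE operations (
            id INTEGER PRIMARY KEY,
            action TEXT NOT NULL,
            connection_attempt_at TEXT  -- nullable: NULL means no attempt seen yet
        )
        """
    )


def record_connection_attempt(
    conn: sqlite3.Connection,
    operation_id: int,
    now: Optional[datetime] = None,
) -> bool:
    """Mark the operation with the time of the first connection attempt.

    Returns True if this call set the field, False if the field was already
    set or the operation does not exist.
    """
    ts = (now or datetime.now(timezone.utc)).isoformat()
    # The IS NULL condition makes the write set-once: later attempts match
    # zero rows and leave the first timestamp untouched.
    cur = conn.execute(
        "UPDATE operations SET connection_attempt_at = ? "
        "WHERE id = ? AND connection_attempt_at IS NULL",
        (ts, operation_id),
    )
    conn.commit()
    return cur.rowcount == 1
```

Doing the null check inside the UPDATE itself, rather than reading the row first, avoids a race between two concurrent wake_compute attempts; whether that matters in practice depends on how the control plane serializes these writes.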

### Tasks

Other related tasks and Epics

kelvich avatar Jan 08 '24 11:01 kelvich

@khanova is planning to finish it this week

vadim2404 avatar Jan 08 '24 12:01 vadim2404

How can I find a full list of such operations?

AFAIU we cannot connect if there is any operation, should I just save timestamp for all of them?

khanova avatar Jan 08 '24 22:01 khanova

> AFAIU we cannot connect if there is any operation, should I just save timestamp for all of them?

yes, just a field on the operation should be okay. We do wait for some operations to complete, though. E.g. with start_compute, the connection attempt will wait until the operation completes.

kelvich avatar Jan 09 '24 06:01 kelvich

Stas would like to check this and will close afterwards.

stradig avatar Jan 29 '24 12:01 stradig

should we expect the connection attempt information to be logged for most of the start_compute operations?

i am thinking about a situation when a user attempts to connect to their compute, their compute is idle, and so control plane wakes it up. shouldn't we see the connect information timing?

could you clarify the expected behavior, @kelvich @khanova

stepashka avatar Feb 05 '24 12:02 stepashka

> i am thinking about a situation when a user attempts to connect to their compute, their compute is idle, and so control plane wakes it up. shouldn't we see the connect information timing?

that would be the same timestamp as the start operation's creation date

kelvich avatar Feb 05 '24 12:02 kelvich