envoy icon indicating copy to clipboard operation
envoy copied to clipboard

[WIP] listener: support update socket_options

Open soulxu opened this issue 3 years ago • 3 comments

Signed-off-by: He Jie Xu [email protected]

Commit Message: listener: support update socket_options Additional Description: Support to update socket_options through cloning the existing socket or creating a new socket. A new field clone_socket_when_options_updated add to bootstrap config to control that. Also only the STATE_LISTENING state option can be updated when using the cloning.

Risk Level: high Testing: TODO Docs Changes: API doc Release Notes: TODO new feature item Platform Specific Features: n/a Fixes #22634

soulxu avatar Sep 21 '22 01:09 soulxu

As a reminder, PRs marked as draft will not be automatically assigned reviewers, or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

:cat:

Caused by: https://github.com/envoyproxy/envoy/pull/23189 was opened by soulxu.

see: more, trace.

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/). envoyproxy/api-shepherds assignee is @mattklein123 CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

:cat:

Caused by: https://github.com/envoyproxy/envoy/pull/23189 was opened by soulxu.

see: more, trace.

@mattklein123 @ggreenway would you like to give a high-level review on this? I didn't add any tests yet but I did a manual test and passed all existing tests. And there are some checks that should be added for rejecting the update non-STATE_LISTENING socket options when using clone.

@kfaseela if you are interested, you can have a test, and ensure this is what are you looking for.

thanks in advance!

soulxu avatar Sep 21 '22 01:09 soulxu

@mattklein123 we have three listener fields for three specific socket options, they are transparent, freebind and tcp_fast_open_queue_length, I think they should work with an update. Do we want to support updating those fields if they work? If we reject that, but people still can do a workaround by setting those options in the socket_options manually, then they got the ability to update them.

and I will have another PR to reject those option update. https://github.com/envoyproxy/envoy/pull/23259

soulxu avatar Sep 27 '22 03:09 soulxu

@mattklein123 we have three listener fields for three specific socket options, they are transparent, freebind and tcp_fast_open_queue_length, I think they should work with an update. Do we want to support updating those fields if they work? If we reject that, but people still can do a workaround by setting those options in the socket_options manually, then they got the ability to update them.

feel freebind's update is meanless, if people change its value, then the address should be changed also.

soulxu avatar Sep 27 '22 11:09 soulxu

/wait

the duplicate address check isn't right, working on a fix

soulxu avatar Sep 27 '22 11:09 soulxu

@mattklein123 we have three listener fields for three specific socket options, they are transparent, freebind and tcp_fast_open_queue_length, I think they should work with an update. Do we want to support updating those fields if they work? If we reject that, but people still can do a workaround by setting those options in the socket_options manually, then they got the ability to update them.

I would support update if possible so it's not surprising. cc @ggreenway

mattklein123 avatar Sep 27 '22 14:09 mattklein123

/wait

still have test need to fix

soulxu avatar Sep 29 '22 09:09 soulxu

This approach LGTM.

Will this also work for hot-restart? I think socket options changes aren't applied for hot-restart either.

Yes, we doesn't change socket options when hot-restart, the existing socket option already applied on the socket which passed from the parent, the OS suppose ensure there is no change.

Also checked, although we have the same address between new listener and drainning listener, the hot restart always get the socket from the active listener https://github.com/envoyproxy/envoy/blob/ad186c448cf8c5e78454d2adf5f228e9c90a997c/source/server/hot_restarting_parent.cc#L110

ListenerManager::listeners() return active listener default. So hotrestart won't get old socket with old options from the parent.

soulxu avatar Oct 04 '22 00:10 soulxu

Thanks this is great. Flushing out some more comments. For all the new tests, please try to run them many times to look for flakes. https://github.com/envoyproxy/envoy/tree/main/test/integration#reproducing-test-flakes

/wait

thanks for this good call! I saw a segmentation fault in the end of UpdateListenerWithDifferentSocketOptionsWhenReusePortDisabled, let me dig into it.

soulxu avatar Oct 04 '22 03:10 soulxu

Yes, we doesn't change socket options when hot-restart, the existing socket option already applied on the socket which passed from the parent, the OS suppose ensure there is no change.

I meant will it work if the socket options have changed in the listener configuration between the parent and the child process in hot-restart? It should probably have the same semantics as an LDS update; logically it's the same change, even though the mechanism is a bit different.

ggreenway avatar Oct 04 '22 15:10 ggreenway

Yes, we doesn't change socket options when hot-restart, the existing socket option already applied on the socket which passed from the parent, the OS suppose ensure there is no change.

I meant will it work if the socket options have changed in the listener configuration between the parent and the child process in hot-restart? It should probably have the same semantics as an LDS update; logically it's the same change, even though the mechanism is a bit different.

I see now, thanks for explaining to me again! I assume people don't change the config when hot-restart at the beginning since some experience told me it will be easy to have separate steps to do the upgrade, then things become easy. But after second thought, seems like everything should be fine except the config related to socket for envoy hot restart

For this case, I don't think it works perfectly, I can imagine two cases don't work well.

  • if the user removed a socket option after hot-restart, that setup won't be clear after duplicated from the parent. And there is no way to clear the setup since we don't know the platform default value for each socket option.
  • prebind socket option can't ensure works as expected, since the code will apply the prebind socket option again, but the socket is already bound since it is duplicated from the parent. https://github.com/envoyproxy/envoy/blob/a3b59dd87abfe6cc65b3f0af31dda1c42b725796/source/common/network/listen_socket_impl.h#L83-L87 setListenSocketOptions will set the prebind socket options

The simple solution here is to suggest the user don't change the socket option when hot-restart, the user can do another update for socket option after hot-restart. Also the user only can append new socket option for STATE_LISTENING when hot-restart. We can document those somewhere. Does make sense?

soulxu avatar Oct 05 '22 00:10 soulxu

/wait

soulxu avatar Oct 06 '22 04:10 soulxu

The simple solution here is to suggest the user don't change the socket option when hot-restart, the user can do another update for socket option after hot-restart. Also the user only can append new socket option for STATE_LISTENING when hot-restart. We can document those somewhere. Does make sense?

Another option would be, in the hot-restart protocol, for the parent to send the current socket options, and if they're different from the child socket options, invoke the same code path you're writing in this PR to update them. But right now, the child doesn't know the current socket options, so this can't be done without adding that additional information to hot-restart.

ggreenway avatar Oct 06 '22 16:10 ggreenway

The simple solution here is to suggest the user don't change the socket option when hot-restart, the user can do another update for socket option after hot-restart. Also the user only can append new socket option for STATE_LISTENING when hot-restart. We can document those somewhere. Does make sense?

Another option would be, in the hot-restart protocol, for the parent to send the current socket options, and if they're different from the child socket options, invoke the same code path you're writing in this PR to update them. But right now, the child doesn't know the current socket options, so this can't be done without adding that additional information to hot-restart.

Yea, we probably need a separate hot-restart API to get the parent's socket options config. And we need to deal with the old envoy without that new hotrestart API. I a little prefer the simple solution here since I'm not sure if it is worth or not. I think most of the time people doing the hot-restart is for updating the envoy to a new version (correct me if this isn't the major requirement for hot-restart), they needn't mix the config update with the hot restart. @mattklein123 WDYT? do you have preference on which solution?

soulxu avatar Oct 07 '22 00:10 soulxu

Personally I would probably try to keep it simple until someone actively wants this to work for hot restart. If we do want to implement it, passing the old socket options seems like a fine approach but it would be a bunch of work.

mattklein123 avatar Oct 11 '22 15:10 mattklein123

Personally I would probably try to keep it simple until someone actively wants this to work for hot restart. If we do want to implement it, passing the old socket options seems like a fine approach but it would be a bunch of work.

I did a check the hot restart api, we probably need to add a new parameter to this hot restart API https://github.com/envoyproxy/envoy/blob/f3c6c6d250b2387121e113f3ae5c47b81312e05a/source/server/hot_restarting_child.h#L17

The new parameter is used to pass the new socket_options config to the parent. Then the parent will find a socket with the same address and same socket_options, otherwise consider no socket can pass to the child. Then the child found there is invalid fd returned from the parent, then it will create new socket https://github.com/envoyproxy/envoy/blob/f3c6c6d250b2387121e113f3ae5c47b81312e05a/source/server/listener_manager_impl.cc#L248-L249

new parameter should be a compatible API change for the hot restart API, the old envoy just ignore that parameter.

soulxu avatar Oct 12 '22 06:10 soulxu

Yeah I agree ^ will work fine. I just wouldn't implement it until someone actually asks for the support.

mattklein123 avatar Oct 12 '22 14:10 mattklein123

Yeah I agree ^ will work fine. I just wouldn't implement it until someone actually asks for the support.

got it, thanks for the feedback! let me add some doc for the hotrestart.

soulxu avatar Oct 13 '22 01:10 soulxu

Thanks LGTM with small comment. I would suggest we hold this until after the release. @ggreenway is this OK with you regarding hot restart? Any other comments?

/wait

agree with that, it isn't worth rush before release

soulxu avatar Oct 15 '22 00:10 soulxu

/wait

waiting for 1.24 release, then this PR is ready for merge

soulxu avatar Oct 15 '22 01:10 soulxu

/retest

soulxu avatar Oct 15 '22 07:10 soulxu

Retrying Azure Pipelines: Retried failed jobs in: envoy-presubmit

:cat:

Caused by: a https://github.com/envoyproxy/envoy/pull/23189#issuecomment-1279683095 was created by @soulxu.

see: more, trace.

Yeah, hot-restart should be a separate PR if/when someone wants it. I agree with merging this post-release.

ggreenway avatar Oct 17 '22 15:10 ggreenway

:scream_cat: Error while processing event:

internal error: tracking error
error: Get https://storage.googleapis.com/repokitteh-tracking-prod/soulxu%25252Fenvoy%25252Fupdate_socket_opt_2: oauth2: cannot fetch token: 503 Service Unavailable
Response: {
  "error": {
    "code": 503,
    "message": "The service is currently unavailable.",
    "status": "UNAVAILABLE"
  }
}

:cat:

Caused by: a https://github.com/envoyproxy/envoy/pull/23189#pullrequestreview-1149429063 was submitted by @mattklein123.

see: more, trace.

@mattklein123 thanks!

soulxu avatar Oct 20 '22 23:10 soulxu

@ggreenway if no further comments can you approve/merge?

mattklein123 avatar Oct 20 '22 23:10 mattklein123

/retest

soulxu avatar Oct 21 '22 03:10 soulxu

Retrying Azure Pipelines: Retried failed jobs in: envoy-presubmit

:cat:

Caused by: a https://github.com/envoyproxy/envoy/pull/23189#issuecomment-1286404873 was created by @soulxu.

see: more, trace.

@soulxu seems like this needs merge to resolve conflict

@ggreenway hope you are feeling better, can you take a look when you have a chance? Thank you!

/wait

KBaichoo avatar Nov 03 '22 18:11 KBaichoo