
Segmentation Fault after `assert failure: false. Details: attempted to add shared target SdsApi <NAME_OF_SECRET> to initialized init manager Server`


I've emailed [email protected] and was told to raise the issue here.

Please see the Segmentation Fault section further down for the stack trace.

Please also see: https://github.com/envoyproxy/go-control-plane/issues/581.

Issues

I've noticed a couple of things.

  1. When we implement SecretDiscoveryServiceServer from "github.com/envoyproxy/go-control-plane/envoy/service/secret/v3", sometimes StreamSecrets is never called at all.
  2. When StreamSecrets is called, it is called endlessly, with a new stream each time; see the log output further down in this message.

When StreamSecrets is not called, our dynamically supplied secrets remain listed as dynamic_warming_secrets instead of dynamic_active_secrets.

Code

Our code is public, so here are the relevant definitions (a rough sketch of the overall shape of such an SDS server follows the list below):

  1. StreamSecrets: https://github.com/kubeshop/kusk-gateway/blob/mbana-oauth-issue-401-sds/internal/envoy/sds/sds.go#L72
  2. Where we register the SecretDiscoveryServiceServer: https://github.com/kubeshop/kusk-gateway/blob/mbana-oauth-issue-401-sds/internal/envoy/manager/envoy_config_manager.go#L183
  3. Configuring the cache, etc.: https://github.com/kubeshop/kusk-gateway/blob/mbana-oauth-issue-401-sds/internal/envoy/manager/envoy_config_manager.go#L56.
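
For readers who haven't written an SDS server before, below is a minimal, hedged sketch of the overall shape of such a server. It is not the kusk-gateway code: the Server struct, the ClientSecrets slice, the stubbed FetchSecrets/DeltaSecrets methods, and the placeholder values are assumptions made for illustration; only the generated go-control-plane types and method signatures come from the library.

package sds

import (
	"context"
	"fmt"
	"time"

	tlsv3 "github.com/envoyproxy/go-control-plane/envoy/extensions/transport_sockets/tls/v3"
	discoveryv3 "github.com/envoyproxy/go-control-plane/envoy/service/discovery/v3"
	secretv3 "github.com/envoyproxy/go-control-plane/envoy/service/secret/v3"
	"google.golang.org/protobuf/types/known/anypb"
)

// Server is a toy SDS server; ClientSecrets would be populated elsewhere
// (e.g. from Kubernetes Secrets) before Envoy connects.
type Server struct {
	ClientSecrets []*tlsv3.Secret
}

// StreamSecrets answers every DiscoveryRequest with the full secret set.
// A production server should also inspect req.ResponseNonce/ErrorDetail to
// tell ACKs/NACKs apart from fresh requests, otherwise it can loop forever.
func (s *Server) StreamSecrets(stream secretv3.SecretDiscoveryService_StreamSecretsServer) error {
	for {
		req, err := stream.Recv()
		if err != nil {
			return err // Envoy closed or reset the stream
		}

		resources := make([]*anypb.Any, 0, len(s.ClientSecrets))
		for _, secret := range s.ClientSecrets {
			res, err := anypb.New(secret)
			if err != nil {
				return err
			}
			resources = append(resources, res)
		}

		if err := stream.Send(&discoveryv3.DiscoveryResponse{
			VersionInfo: time.Now().UTC().Format(time.RFC3339),
			TypeUrl:     req.GetTypeUrl(),
			Resources:   resources,
		}); err != nil {
			return err
		}
	}
}

// Only the streaming variant is used in this setup; the other two methods of
// the generated interface are stubbed out.
func (s *Server) FetchSecrets(ctx context.Context, req *discoveryv3.DiscoveryRequest) (*discoveryv3.DiscoveryResponse, error) {
	return nil, fmt.Errorf("FetchSecrets is not implemented")
}

func (s *Server) DeltaSecrets(secretv3.SecretDiscoveryService_DeltaSecretsServer) error {
	return fmt.Errorf("DeltaSecrets is not implemented")
}

Registration is the usual generated call, e.g. secretv3.RegisterSecretDiscoveryServiceServer(grpcServer, &Server{...}), on the same gRPC server that serves the other xDS services.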

Log of StreamSecrets being called multiple times (note that the stream pointer, e.g. stream=&{0xc000f19c70}, is different each time):

2022-08-12T09:19:30Z | sds.go:74: SecretDiscoveryServiceServer.StreamSecrets: exiting method
2022-08-12T09:19:30Z | sds.go:92: 
2022-08-12T09:19:30Z | sds.go:93: SecretDiscoveryServiceServer.StreamSecrets: calling stream.Recv - stream=&{0xc000f19380}, len(s.ClientSecrets)=2
2022-08-12T09:19:30Z | sds.go:116: SecretDiscoveryServiceServer.StreamSecrets: request.TypeUrl=type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret, len(s.ClientSecrets)=2
2022-08-12T09:19:30Z | sds.go:153: SecretDiscoveryServiceServer.StreamSecrets: stream.Send(response) sent - responses=[<*>version_info:"2022-08-12T09:19:30Z"  resources:{[type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret]:{name:"hmac_secret"  generic_secret:{secret:{inline_bytes:"f9eckuGEcUNxAqKT0uK8OyM2Se01ukVLPHsiSoTh2X8="}}}}  type_url:"type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret" <*>version_info:"2022-08-12T09:19:30Z"  resources:{[type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret]:{name:"client_secret"  generic_secret:{secret:{inline_string:"Z6MX7NreJumWLmf6unsQ5uiEUrTBxfNtqG9Vy5Kjktnvfj-_fRCBO9EU1mL1YzAJ"}}}}  type_url:"type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret"]
2022-08-12T09:19:30Z | sds.go:154: 
2022-08-12T09:19:30Z | sds.go:74: SecretDiscoveryServiceServer.StreamSecrets: exiting method
2022-08-12T09:19:30Z | sds.go:92: 
2022-08-12T09:19:30Z | sds.go:93: SecretDiscoveryServiceServer.StreamSecrets: calling stream.Recv - stream=&{0xc000f19c70}, len(s.ClientSecrets)=2
2022-08-12T09:19:30Z | sds.go:116: SecretDiscoveryServiceServer.StreamSecrets: request.TypeUrl=type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret, len(s.ClientSecrets)=2
2022-08-12T09:19:30Z | sds.go:153: SecretDiscoveryServiceServer.StreamSecrets: stream.Send(response) sent - responses=[<*>version_info:"2022-08-12T09:19:30Z"  resources:{[type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret]:{name:"client_secret"  generic_secret:{secret:{inline_string:"Z6MX7NreJumWLmf6unsQ5uiEUrTBxfNtqG9Vy5Kjktnvfj-_fRCBO9EU1mL1YzAJ"}}}}  type_url:"type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret" <*>version_info:"2022-08-12T09:19:30Z"  resources:{[type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret]:{name:"hmac_secret"  generic_secret:{secret:{inline_bytes:"f9eckuGEcUNxAqKT0uK8OyM2Se01ukVLPHsiSoTh2X8="}}}}  type_url:"type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret"]

What we expect to see in /config_dump when StreamSecrets is being called:

{
   "@type": "type.googleapis.com/envoy.admin.v3.SecretsConfigDump",
   "dynamic_active_secrets": [
    {
     "name": "client_secret",
     "version_info": "2022-08-12T09:22:18Z",
     "last_updated": "2022-08-12T09:22:18.140Z",
     "secret": {
      "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret",
      "name": "client_secret",
      "generic_secret": {
       "secret": {
        "inline_string": "[redacted]"
       }
      }
     }
    },
    {
     "name": "hmac_secret",
     "version_info": "2022-08-12T09:22:18Z",
     "last_updated": "2022-08-12T09:22:18.386Z",
     "secret": {
      "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret",
      "name": "hmac_secret",
      "generic_secret": {
       "secret": {
        "inline_bytes": "W3JlZGFjdGVkXQ=="
       }
      }
     }
    }
   ]
  }

Don't worry about the exposed secrets; they'll be removed soon.

Logs or Debug Information

In addition, here are dumps taken from the admin endpoint (/config_dump) for the working and the broken case, i.e., when the StreamSecrets gRPC method is invoked on the go-control-plane and when it is not. In the broken case, notice how the secrets are listed under dynamic_warming_secrets.

  1. config_dump-broken.json: https://gist.github.com/mbana/61305292ddb9fd83e260a0125893f6ca
  2. logs-broken.log: https://gist.github.com/mbana/a4636cf5e96a035db618f7c37b8dc275
  3. config_dump-working.json: https://gist.github.com/mbana/e9b7b29ed7c1be032aca867304d86d60
  4. logs-working.log: https://gist.github.com/mbana/2881bc1873090d8d82649ba95625d789

If there's anything further I can do to help get to the cause of this issue, please let me know.


Envoy Compiled From 7af7608b022c5e3ae2ad110b4d94aa7506a643d7

Edit: I went a step further, compiled Envoy from source, and made a Docker image of it:

Dockerfile

FROM docker.io/ubuntu:22.04

COPY envoy-static /usr/local/bin/envoy
COPY envoy-static /usr/bin/envoy

ENTRYPOINT ["/usr/local/bin/envoy"]

Build Steps

$ git remote -vv
origin	git@github.com:envoyproxy/envoy.git (fetch)
origin	git@github.com:envoyproxy/envoy.git (push)
$ git rev-parse HEAD
7af7608b022c5e3ae2ad110b4d94aa7506a643d7
$ bazel/setup_clang.sh
$ echo "build --config=clang" >> user.bazelrc
$ echo "build --copt=-fno-limit-debug-info" >> user.bazelrc
$ bazel build --jobs=32 -c fastbuild envoy
$ cp bazel-bin/source/exe/envoy-static .
$ docker build --tag ttl.sh/kubeshop/envoy:24h --file ./Dockerfile .
$ docker push ttl.sh/kubeshop/envoy:24h

The image is available at ttl.sh/kubeshop/envoy:24h (docker run --rm -it ttl.sh/kubeshop/envoy:24h). Note: this image is only available for ~24 hours from the time this post was edited (2022-08-12T13:47:31+00:00).

Segmentation Fault

I noticed that Envoy crashes:

[2022-08-12 13:35:16.253][7][critical][assert] [source/common/init/manager_impl.cc:36] assert failure: false. Details: attempted to add shared target SdsApi client_secret to initialized init manager Server
[2022-08-12 13:35:16.253][7][critical][backtrace] [./source/server/backtrace.h:104] Caught Aborted, suspect faulting address 0x7
[2022-08-12 13:35:16.253][7][critical][backtrace] [./source/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):
[2022-08-12 13:35:16.253][7][critical][backtrace] [./source/server/backtrace.h:92] Envoy version: 7af7608b022c5e3ae2ad110b4d94aa7506a643d7/1.24.0-dev/Modified/DEBUG/BoringSSL
[2022-08-12 13:35:16.272][7][critical][backtrace] [./source/server/backtrace.h:96] #0: Envoy::SignalAction::sigHandler() [0x5579a06c8b28]
[2022-08-12 13:35:16.273][7][critical][backtrace] [./source/server/backtrace.h:98] #1: [0x7f086bc6a520]
AsyncClient 0x550cbfc8ec00, stream_id_: 15115583590745907833
&stream_info_: 
  StreamInfoImpl 0x550cbfc8edc8, protocol_: 1, response_code_: null, response_code_details_: via_upstream, attempt_count_: 1, health_check_request_: 0, route_name_:   upstream_info_: 
    UpstreamInfoImpl 0x550cbfc81570, upstream_connection_id_: 0
Http2::ConnectionImpl 0x550cbf8d7dc8, max_headers_kb_: 60, max_headers_count_: 100, per_stream_buffer_limit_: 268435456, allow_metadata_: 0, stream_error_on_invalid_http_messaging_: 0, is_outbound_flood_monitored_control_frame_: 0, dispatching_: 1, raised_goaway_: 0, pending_deferred_reset_streams_.size(): 0
&protocol_constraints_: 
  ProtocolConstraints 0x550cbf8d7e48, outbound_frames_: 0, max_outbound_frames_: 10000, outbound_control_frames_: 0, max_outbound_control_frames_: 1000, consecutive_inbound_frames_with_empty_payload_: 0, max_consecutive_inbound_frames_with_empty_payload_: 1, opened_streams_: 1, inbound_priority_frames_: 0, max_inbound_priority_frames_per_stream_: 100, inbound_window_update_frames_: 4, outbound_data_frames_: 4, max_inbound_window_update_frames_per_data_frame_sent_: 10
Number of active streams: 1, current_stream_id_: 1 Dumping current stream:
stream: 
  ConnectionImpl::StreamImpl 0x550cbf50bd40, stream_id_: 1, unconsumed_bytes_: 0, read_disable_count_: 0, local_end_stream_: 0, local_end_stream_sent_: 0, remote_end_stream_: 0, data_deferred_: 1, received_noninformational_headers_: 1, pending_receive_buffer_high_watermark_called_: 0, pending_send_buffer_high_watermark_called_: 0, reset_due_to_messaging_error_: 0, cookies_:   pending_trailers_to_encode_:   null
  absl::get<ResponseHeaderMapPtr>(headers_or_trailers_):   null
Dumping corresponding downstream request for upstream stream 1:
  UpstreamRequest 0x550cbfdfcd80
[2022-08-12 13:35:16.273][7][critical][backtrace] [./source/server/backtrace.h:104] Caught Segmentation fault, suspect faulting address 0x0
[2022-08-12 13:35:16.273][7][critical][backtrace] [./source/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):
[2022-08-12 13:35:16.273][7][critical][backtrace] [./source/server/backtrace.h:92] Envoy version: 7af7608b022c5e3ae2ad110b4d94aa7506a643d7/1.24.0-dev/Modified/DEBUG/BoringSSL
[2022-08-12 13:35:16.273][7][critical][backtrace] [./source/server/backtrace.h:96] #0: Envoy::SignalAction::sigHandler() [0x5579a06c8b28]
[2022-08-12 13:35:16.273][7][critical][backtrace] [./source/server/backtrace.h:98] #1: [0x7f086bc6a520]
[2022-08-12 13:35:16.292][7][critical][backtrace] [./source/server/backtrace.h:96] #2: Envoy::Http::ResponseDecoderWrapper::dumpState() [0x55799f4d23f8]
[2022-08-12 13:35:16.311][7][critical][backtrace] [./source/server/backtrace.h:96] #3: Envoy::Http::Http2::ClientConnectionImpl::dumpStreams() [0x55799fae8b21]
[2022-08-12 13:35:16.330][7][critical][backtrace] [./source/server/backtrace.h:96] #4: Envoy::Http::Http2::ConnectionImpl::dumpState() [0x55799fae850a]
[2022-08-12 13:35:16.348][7][critical][backtrace] [./source/server/backtrace.h:96] #5: Envoy::Event::DispatcherImpl::onFatalError() [0x5579a031cd35]
[2022-08-12 13:35:16.367][7][critical][backtrace] [./source/server/backtrace.h:96] #6: Envoy::FatalErrorHandler::callFatalErrorHandlers() [0x5579a06cf35a]
[2022-08-12 13:35:16.386][7][critical][backtrace] [./source/server/backtrace.h:96] #7: Envoy::SignalAction::sigHandler() [0x5579a06c8b8a]
[2022-08-12 13:35:16.386][7][critical][backtrace] [./source/server/backtrace.h:98] #8: [0x7f086bc6a520]
Our FatalActions triggered a fatal signal.
Segmentation fault (core dumped)

Can anyone see anything useful in this stack trace? The failing assertion is particularly interesting, but I don't know the Envoy code base well enough to tell whether this is an issue or not:

[2022-08-12 13:35:16.253][7][critical][assert] [source/common/init/manager_impl.cc:36] assert failure: false. Details: attempted to add shared target SdsApi client_secret to initialized init manager Server

mbana avatar Aug 12 '22 14:08 mbana

It's not clear where and when secrets are being used. Could you provide more info on that? cc @adisuissa

kyessenov avatar Aug 12 '22 15:08 kyessenov

It's not clear where and when secrets are being used. Could you provide more info on that? cc @adisuissa

Are you referring to how I create the secrets? If so, here are the parts of the code where that's done:

  • https://github.com/kubeshop/kusk-gateway/blob/mbana-oauth-issue-401-sds/internal/envoy/sds/sds.go#L198
  • https://github.com/kubeshop/kusk-gateway/blob/mbana-oauth-issue-401-sds/internal/envoy/sds/sds.go#L229
  • https://github.com/kubeshop/kusk-gateway/blob/mbana-oauth-issue-401-sds/internal/envoy/config/envoy_configuration.go#L220: Here a snapshot is generated that includes the secrets added in the above steps (a rough sketch of this step follows the list).
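
For context, here is a rough sketch (again, not the actual kusk-gateway code) of what that snapshot step looks like: two generic secrets are built and pushed into the go-control-plane snapshot cache. The node ID, helper names, placeholder values, and the NewSnapshot/SetSnapshot signatures (which assume a reasonably recent go-control-plane release) are all illustrative assumptions.

package config

import (
	"context"
	"time"

	corev3 "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"
	tlsv3 "github.com/envoyproxy/go-control-plane/envoy/extensions/transport_sockets/tls/v3"
	"github.com/envoyproxy/go-control-plane/pkg/cache/types"
	cachev3 "github.com/envoyproxy/go-control-plane/pkg/cache/v3"
	resourcev3 "github.com/envoyproxy/go-control-plane/pkg/resource/v3"
)

// genericSecret wraps an opaque string value as an SDS generic secret,
// matching the generic_secret/inline_string entries seen in the logs above.
func genericSecret(name, value string) *tlsv3.Secret {
	return &tlsv3.Secret{
		Name: name,
		Type: &tlsv3.Secret_GenericSecret{
			GenericSecret: &tlsv3.GenericSecret{
				Secret: &corev3.DataSource{
					Specifier: &corev3.DataSource_InlineString{InlineString: value},
				},
			},
		},
	}
}

// publishSecrets builds a snapshot containing the two secrets and pushes it
// to the cache for the given Envoy node.
func publishSecrets(ctx context.Context, c cachev3.SnapshotCache, nodeID string) error {
	secrets := []types.Resource{
		genericSecret("client_secret", "redacted-client-secret"),
		genericSecret("hmac_secret", "redacted-hmac-secret"),
	}

	snap, err := cachev3.NewSnapshot(
		time.Now().UTC().Format(time.RFC3339), // version string, as in the logs above
		map[resourcev3.Type][]types.Resource{
			resourcev3.SecretType: secrets,
		},
	)
	if err != nil {
		return err
	}
	return c.SetSnapshot(ctx, nodeID, snap)
}

Whichever path serves them (the snapshot cache or the hand-written StreamSecrets server linked above), the resources Envoy should receive are the same two generic secrets shown in the log output earlier.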

Also, if you look at the "broken" and "working" /config_dumps, you will see that we've configured the envoy.filters.http.oauth2 filter, which uses these secrets to populate the token_secret (really a client_secret in OAuth2 terms) and hmac_secret fields:

  1. config_dump-broken.json: https://gist.github.com/mbana/61305292ddb9fd83e260a0125893f6ca
  2. config_dump-working.json: https://gist.github.com/mbana/e9b7b29ed7c1be032aca867304d86d60

To answer your question: if you search the above files for client_secret, you'll find where the secrets are "defined".

Ideally, the envoy.filters.http.oauth2 filter would provide an option to specify the secrets as an inline_string or inline_bytes, but unfortunately it doesn't, so one has to resort to writing a secret discovery service (SDS) server. We can't hold the secrets in a file because the envoy.filters.http.oauth2 filter we use needs to be dynamically defined by a user. A hedged sketch of the SDS references the filter ends up using is shown below.
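
For completeness, this is the kind of SDS reference the filter's credentials end up pointing at instead of an inline value. It's only a sketch: the helper name and the "sds_cluster" cluster name are made up, and the exact shape of the real kusk-gateway config may differ.

package config

import (
	corev3 "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"
	tlsv3 "github.com/envoyproxy/go-control-plane/envoy/extensions/transport_sockets/tls/v3"
)

// sdsSecretRef builds an SdsSecretConfig telling Envoy to fetch the named
// secret over gRPC from the given control-plane cluster.
func sdsSecretRef(name, clusterName string) *tlsv3.SdsSecretConfig {
	return &tlsv3.SdsSecretConfig{
		Name: name,
		SdsConfig: &corev3.ConfigSource{
			ResourceApiVersion: corev3.ApiVersion_V3,
			ConfigSourceSpecifier: &corev3.ConfigSource_ApiConfigSource{
				ApiConfigSource: &corev3.ApiConfigSource{
					ApiType:             corev3.ApiConfigSource_GRPC,
					TransportApiVersion: corev3.ApiVersion_V3,
					GrpcServices: []*corev3.GrpcService{{
						TargetSpecifier: &corev3.GrpcService_EnvoyGrpc_{
							EnvoyGrpc: &corev3.GrpcService_EnvoyGrpc{ClusterName: clusterName},
						},
					}},
				},
			},
		},
	}
}

The filter's token_secret and hmac_secret would then be set to something like sdsSecretRef("client_secret", "sds_cluster") and sdsSecretRef("hmac_secret", "sds_cluster"); it is those two names that show up under dynamic_warming_secrets in the broken config dump.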

Hope that all makes sense; if not, please let me know.

mbana avatar Aug 12 '22 16:08 mbana

https://github.com/envoyproxy/envoy/issues/22625 might be related; it seems secret subscription management in the oauth2 filter is broken.

kyessenov avatar Aug 12 '22 16:08 kyessenov

Hello, @kyessenov, hope you're well.

  1. Is there an ETA for this?
  2. Are there unit tests in the code base that exercise SDS and ensure that it polls/streams for secrets? If so, where should I look for those?

mbana avatar Aug 19 '22 10:08 mbana

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] avatar Sep 18 '22 12:09 github-actions[bot]

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

github-actions[bot] avatar Sep 25 '22 12:09 github-actions[bot]

@kyessenov, hello, please keep this open until a fix arrives for it.

Many thanks.

mbana avatar Sep 27 '22 20:09 mbana