Support Listener Access Logging for TLS troubleshooting

Open guydc opened this issue 1 year ago • 9 comments

Description: Envoy supports several types of Access Loggers. The access log that is most commonly in use is configured in HCM or TCP/UDP/Thrift Proxy. Envoy additionally supports a listener-level access log. The listener access log is particularly useful for:

  • Providing information related to downstream connection establishment issues, e.g. using the %DOWNSTREAM_TRANSPORT_FAILURE_REASON% operator, and following the TLS troubleshooting guide.
  • Providing information when other network-filter level access loggers are not invoked (e.g. when there is no matched filter chain, NR response flag).
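For illustration, a listener-level access log in raw Envoy configuration might look like the sketch below. The listener name, port, and exact format string are assumptions made for the example. The `access_log` field on the listener, the file access logger, and the command operators are standard Envoy configuration.

```yaml
listeners:
  - name: https_listener            # hypothetical listener name
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 8443            # hypothetical port
    # Listener-level access log: emitted per downstream connection,
    # independently of the HCM / TCP proxy access loggers.
    access_log:
      - name: envoy.access_loggers.file
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
          path: /dev/stdout
          log_format:
            text_format_source:
              inline_string: "[%START_TIME%] client=%DOWNSTREAM_REMOTE_ADDRESS% sni=%REQUESTED_SERVER_NAME% flags=%RESPONSE_FLAGS% tls_failure=%DOWNSTREAM_TRANSPORT_FAILURE_REASON%\n"
    # filter_chains omitted for brevity
```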

Currently, Envoy Gateway uses the user-defined access log format for the listener access log, but the listener access log is only emitted when there is no matching filter chain.

https://github.com/envoyproxy/gateway/blob/0abdda7a033f0e37647177a588a904c90b783149/internal/xds/translator/accesslog.go#L216
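As a rough sketch of the behavior described above (not a copy of the generated xDS), restricting a listener access log to connections without a matching filter chain can be expressed with Envoy's response flag filter. The file sink and path are assumptions for the example.

```yaml
access_log:
  - name: envoy.access_loggers.file
    # Only log when the NR response flag is set,
    # i.e. no filter chain or route matched the connection.
    filter:
      response_flag_filter:
        flags: ["NR"]
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
      path: /dev/stdout             # assumed sink
```

With a filter like this, a failed TLS handshake that does select a filter chain produces no listener log entry, which is the gap this issue describes.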

Envoy Gateway should also support listener access logs for TLS troubleshooting. Common issues include:

  • Downstream TLS parameter mismatch: supported versions, ciphers, ...
  • Downstream Client Certificate validation issues: expiration, trust store compatibility, ...

To support the TLS troubleshooting use case, Envoy Gateway may:

  1. Make the listener access log fully configurable for end-users (including format, filters, etc.)
  2. Conditionally emit a listener access log containing the %DOWNSTREAM_TRANSPORT_FAILURE_REASON% when a TLS connection error occurs. This can be done by appending the TLS details to the user-defined/default access log format, or by emitting a different log that is fully controlled by EG and contains relevant information such as Client IP, Available TLS Context (SNI, Cert details, ...)

To filter listener logs based on a TLS failure, a CEL access log filter can be used:

filter:
  extension_filter:
    name: envoy.access_loggers.extension_filters.cel
    typed_config:
      '@type': type.googleapis.com/envoy.extensions.access_loggers.filters.cel.v3.ExpressionFilter
      expression: connection.transport_failure_reason != "-"

guydc avatar Jun 12 '24 17:06 guydc

Notes from today's community meeting:

  • Prefer option (2)
  • Emit the user-defined access log
  • Users that are interested in the transport failure reason should add the relevant operator to their format
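Under this approach, a user who wants the transport failure reason would add the operator to the access log format in their EnvoyProxy resource. A minimal sketch, assuming a text format and a file sink; the resource name, namespace, and format string are illustrative only:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config         # hypothetical name
  namespace: envoy-gateway-system   # hypothetical namespace
spec:
  telemetry:
    accessLog:
      settings:
        - format:
            type: Text
            # The last operator is what exposes TLS handshake failures.
            text: |
              [%START_TIME%] %DOWNSTREAM_REMOTE_ADDRESS% %REQUESTED_SERVER_NAME% %RESPONSE_FLAGS% %DOWNSTREAM_TRANSPORT_FAILURE_REASON%
          sinks:
            - type: File
              file:
                path: /dev/stdout
```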

guydc avatar Jun 13 '24 18:06 guydc

Emit the user-defined access log

@guydc @arkodg does this mean EG will have a separate user-defined access log format for the listener-level log? From the Envoy docs, listener-specific fields seem very limited.

zhaohuabing avatar Jun 17 '24 17:06 zhaohuabing

TBH, is this a common use case? Can it be covered by using EnvoyPatchPolicy (or ExtensionManager) instead of introducing a new API?

zirain avatar Jun 17 '24 22:06 zirain

@zhaohuabing my vote would be to use the same format that's defined in envoyProxy.spec.telemetry for all access log cases, and use the default value even for the listener access log case

arkodg avatar Jun 17 '24 22:06 arkodg

is this a common use case?

I would say that observability for TLS failures, especially when client certificate auth is a feature, is important. Having said that, most proxies support this sort of troubleshooting through error logs, rather than access logs.

Can it be covered by using EnvoyPatchPolicy (or ExtensionManager) instead of introducing a new API?

Yes. But, the current proposal is not to extend the API. Rather, we will emit listener access logs in more scenarios and not just NR.
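For reference, the EnvoyPatchPolicy workaround could look roughly like the sketch below. The policy name, Gateway name, and xDS listener name are assumptions, and the patch assumes the target listener already has an `access_log` array to append to. EnvoyPatchPolicy typically has to be enabled in the Envoy Gateway configuration before it takes effect.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyPatchPolicy
metadata:
  name: listener-tls-access-log     # hypothetical name
  namespace: default
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: eg                        # hypothetical Gateway name
  type: JSONPatch
  jsonPatches:
    - type: "type.googleapis.com/envoy.config.listener.v3.Listener"
      name: default/eg/https        # assumed xDS listener name
      operation:
        op: add
        path: "/access_log/-"       # append to the existing listener access logs
        value:
          name: envoy.access_loggers.file
          filter:
            extension_filter:
              name: envoy.access_loggers.extension_filters.cel
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.access_loggers.filters.cel.v3.ExpressionFilter
                expression: connection.transport_failure_reason != "-"
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
            path: /dev/stdout
```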

guydc avatar Jun 24 '24 18:06 guydc

In light of #3688, another approach here would be to make Listener log matcher configurable. This will:

  • not change current default filters (only log for NR) or format
  • allow users to emit listener access log for TLS troubleshooting or other reasons (e.g. observability for TCP connections in general)
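One possible shape for such a configurable matcher is sketched below. The `type: Listener` selector and the `matches` list of CEL expressions are assumptions about what the API could look like, not something this thread confirms; the names and namespace are likewise hypothetical.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config         # hypothetical name
  namespace: envoy-gateway-system   # hypothetical namespace
spec:
  telemetry:
    accessLog:
      settings:
        - type: Listener            # assumed field: apply only to listener access logs
          matches:                  # assumed field: CEL expressions selecting what to log
            - 'connection.transport_failure_reason != "-"'
          sinks:
            - type: File
              file:
                path: /dev/stdout
```

Users who do not set a matcher would keep the current default of logging only NR cases.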

@arkodg , @zirain - WDYT?

guydc avatar Jun 27 '24 16:06 guydc

sounds good

zirain avatar Jun 28 '24 00:06 zirain

In light of https://github.com/envoyproxy/gateway/pull/3688, another approach here would be to make Listener log matcher configurable. This will:

  • not change current default filters (only log for NR) or format
  • allow users to emit listener access log for TLS troubleshooting or other reasons (e.g. observability for TCP connections in general)

Should connection.transport_failure_reason != "-" and other listener-level errors (if any) be included in the default filters?

zhaohuabing avatar Jun 29 '24 01:06 zhaohuabing

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

github-actions[bot] avatar Aug 22 '24 12:08 github-actions[bot]