[NEW] Add support for OCSP stapling in TLS connections
Currently, Valkey supports TLS via OpenSSL but does not staple OCSP responses when acting as a TLS server.
What is OCSP stapling?
OCSP (Online Certificate Status Protocol) stapling is a TLS extension that allows the server to send a signed, pre-fetched OCSP response from the Certificate Authority (CA) as part of the TLS handshake. This eliminates the need for each client to contact the CA’s OCSP responder directly, improving performance, privacy, and reliability of certificate revocation checking.
Why This Matters
Performance: Avoids extra network round-trips from every client to the CA’s OCSP server, reducing connection latency.
Reliability: Clients in restricted networks or without internet access can still validate certificate revocation status.
Security/Compliance: Many security guidelines (e.g. Mozilla recommendations, PCI DSS) prefer or require revocation checking. OCSP stapling is the industry-preferred mechanism.
When Valkey is compiled with OpenSSL and OCSP stapling is enabled, the server will include the OCSP response in its side of the TLS handshake. We could contact the OCSP responder for every new connection request, but that would greatly increase connection establishment latency. For this reason, Valkey should use the server timer to periodically refresh the OCSP response:
- Fetch OCSP Responses: Parse the OCSP responder URL from the server certificate (AIA extension).
- Fetch the OCSP response from the CA and cache it in memory.
- For the TLS context: Use SSL_CTX_set_tlsext_status_cb() to register a callback that attaches the cached OCSP response to TLS handshakes.
Automated Refresh Logic:
After fetching a response, parse thisUpdate and nextUpdate. Compute a refresh time as (nextUpdate - refresh_margin).
An important point is the refresh_margin. Each responder response carries a validity window, effectively a time to live (TTL = nextUpdate - thisUpdate). To keep a valid OCSP response cached at all times, we need to start the next query before the current response expires. Since we have no control over the validity window, we can provide a configuration that sets the refresh margin as a percentage of that window, measured back from nextUpdate. For example, say we suggest the following configuration:
# Refresh 10% before expiry (scaled to validity window)
tls-ocsp-refresh-margin 10%
Then, whenever we finish caching an OCSP response, we schedule the next query to the OCSP responder at: nextUpdate - ((nextUpdate - thisUpdate) * 0.1).
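To make the scheduling rule concrete, here is a minimal sketch of the computation. Valkey itself is written in C; Python is used here only for brevity. The function name `next_refresh_time` and the parameter `margin_percent` are hypothetical and not part of any existing API.

```python
from datetime import datetime, timedelta, timezone


def next_refresh_time(this_update: datetime, next_update: datetime,
                      margin_percent: float) -> datetime:
    """Return nextUpdate minus margin_percent of the validity window."""
    if not 0 <= margin_percent <= 100:
        raise ValueError("margin_percent must be between 0 and 100")
    if next_update <= this_update:
        raise ValueError("nextUpdate must be later than thisUpdate")
    validity_window = next_update - this_update
    return next_update - validity_window * (margin_percent / 100)


# Example: a response valid for 7 days with a 10% margin.
this_update = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
next_update = this_update + timedelta(days=7)
refresh_at = next_refresh_time(this_update, next_update, 10)
print(refresh_at.isoformat())  # 2025-01-07T07:12:00+00:00
```

With a 7-day window, a 10% margin works out to 16 hours 48 minutes before nextUpdate. Expressing the margin as a percentage rather than a fixed duration keeps the behavior sensible for responders that issue very short or very long validity windows.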
Fallback and Error Handling:
If fetching the OCSP response fails, continue using the last valid response until it expires. Retry the refresh after a short interval (e.g. 60 seconds, which could also be made configurable).
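The fallback behavior could look roughly like the sketch below, again in Python purely for illustration. `OcspCache`, `CachedResponse`, `tick`, and `response_for_handshake` are hypothetical names; `tick` stands in for the periodic server timer, and `response_for_handshake` stands in for what the TLS status callback would consult.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CachedResponse:
    der: bytes          # raw OCSP response bytes to staple
    refresh_at: float   # when to start re-querying (epoch seconds)
    next_update: float  # when the response expires (epoch seconds)


class OcspCache:
    def __init__(self, fetch: Callable[[], CachedResponse],
                 retry_delay: float = 60.0):
        self._fetch = fetch              # assumed to raise on failure
        self._retry_delay = retry_delay
        self._cached: Optional[CachedResponse] = None
        self._next_attempt = 0.0

    def tick(self, now: float) -> None:
        """Called periodically; fetches only when a refresh or retry is due."""
        if now < self._next_attempt:
            return
        try:
            self._cached = self._fetch()
            self._next_attempt = self._cached.refresh_at
        except Exception:
            # Keep the previous response and try again after the retry delay.
            self._next_attempt = now + self._retry_delay

    def response_for_handshake(self, now: float) -> Optional[bytes]:
        """Return the cached response while it is still valid, else None."""
        if self._cached is not None and now < self._cached.next_update:
            return self._cached.der
        return None


# Stub responder: the first fetch succeeds, later fetches fail.
attempts = []


def stub_fetch() -> CachedResponse:
    attempts.append(len(attempts) + 1)
    if len(attempts) == 1:
        return CachedResponse(der=b"resp-1", refresh_at=90.0, next_update=100.0)
    raise OSError("responder unreachable")


cache = OcspCache(stub_fetch, retry_delay=5.0)
cache.tick(now=0.0)    # initial fetch succeeds
cache.tick(now=90.0)   # refresh is due but fails; retry scheduled for t=95
served_during_outage = cache.response_for_handshake(now=95.0)
served_after_expiry = cache.response_for_handshake(now=100.0)
```

In this sketch the server simply stops stapling once the cached response has expired, leaving the client to fall back to its own revocation policy. Whether Valkey should do that or fail the handshake is a design decision this issue leaves open.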
Suggested configurations:
# Enable OCSP stapling (default is no)
tls-ocsp-stapling no
# Refresh 10% before expiry (scaled to validity window)
tls-ocsp-refresh-margin 10%
# Retry delay after a failed OCSP request (in seconds)
tls-ocsp-retry-delay 10
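As a rough illustration of how these directives might be validated, the sketch below parses them into typed values. It is written in Python for brevity and is not Valkey's actual config machinery. The function name `parse_ocsp_config`, the accepted ranges, and the use of the suggested values as defaults are all assumptions.

```python
def parse_ocsp_config(text: str) -> dict:
    """Parse the proposed tls-ocsp-* directives; unknown keys are ignored."""
    settings = {
        "tls-ocsp-stapling": False,        # assumed default: disabled
        "tls-ocsp-refresh-margin": 10.0,   # percent of the validity window
        "tls-ocsp-retry-delay": 10,        # seconds
    }
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        value = value.strip()
        if key == "tls-ocsp-stapling":
            if value not in ("yes", "no"):
                raise ValueError("tls-ocsp-stapling must be yes or no")
            settings[key] = value == "yes"
        elif key == "tls-ocsp-refresh-margin":
            if not value.endswith("%"):
                raise ValueError("tls-ocsp-refresh-margin must end with %")
            percent = float(value[:-1])
            if not 0 < percent < 100:
                raise ValueError("refresh margin must be between 0% and 100%")
            settings[key] = percent
        elif key == "tls-ocsp-retry-delay":
            delay = int(value)
            if delay <= 0:
                raise ValueError("tls-ocsp-retry-delay must be positive")
            settings[key] = delay
    return settings


example = """\
# Enable OCSP stapling
tls-ocsp-stapling yes
tls-ocsp-refresh-margin 25%
"""
settings = parse_ocsp_config(example)
```

Rejecting margins of 0% and 100% is a deliberate choice in this sketch: 0% would refresh only at the moment of expiry, and 100% would refresh continuously.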
So, I think OCSP stapling is the wrong thing to do for databases as a whole (The clients are the right place to be doing this validation). I have no strong objection to adding it though if there is a strong need for it.
I agree that clients should be capable of doing that. However, the reality I observed during my work at AWS is that in some cases users operate their clients in a heavily isolated network partition with limited ability to reach the responder URL.
One of the reasons I created this issue is to gather input from the community about the need. We can wait to see whether there is interest and close this as "not planned" after a while, but I know there is SOME demand for it.