nats-server
TLS handshake errors on NATS containers in Kubernetes
Observed behavior
I have a publisher publishing message to NATS server. The publisher is an Azure function publishing messages at regular intervals.
I see the following errors on the publisher:
[2024-02-23T18:10:33.016Z] nats: encountered error
Traceback (most recent call last):
File "C:\Users\abcd\Code\Veli-IoT-System\message-ingestor\.venv\lib\site-packages\nats\aio\client.py", line 714, in _close
await self._transport.wait_closed()
File "C:\Users\abcd\Code\Veli-IoT-System\message-ingestor\.venv\lib\site-packages\nats\aio\transport.py", line 180, in wait_closed
return await self._io_writer.wait_closed()
File "C:\Users\abcd\AppData\Local\Programs\Python\Python39\lib\asyncio\streams.py", line 359, in wait_closed
await self._protocol._get_close_waiter(self)
File "C:\Users\abcd\AppData\Local\Programs\Python\Python39\lib\asyncio\sslproto.py", line 534, in data_received
ssldata, appdata = self._sslpipe.feed_ssldata(data)
File "C:\Users\abcd\AppData\Local\Programs\Python\Python39\lib\asyncio\sslproto.py", line 206, in feed_ssldata
self._sslobj.unwrap()
File "C:\Users\abcd\AppData\Local\Programs\Python\Python39\lib\ssl.py", line 949, in unwrap
return self._sslobj.shutdown()
ssl.SSLError: [SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2756)
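For reference, this is roughly how a TLS context for nats-py can be built with the standard library; the file paths are placeholders (my assumption, not taken from the issue), and the resulting context is what gets passed to `nats.connect(..., tls=ctx)`:

```python
import ssl
from typing import Optional


def build_nats_tls_context(ca_file: Optional[str] = None,
                           cert_file: Optional[str] = None,
                           key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build an SSLContext to pass to nats.connect(..., tls=ctx).

    All paths are hypothetical placeholders; substitute your own files.
    """
    # Server authentication: verify the server certificate and its hostname.
    ctx = ssl.create_default_context(purpose=ssl.Purpose.SERVER_AUTH)
    if ca_file:
        # Trust the private root CA that signed the server certificate.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file and key_file:
        # The server config sets `verify: true`, so a client cert is required.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

Note that the `APPLICATION_DATA_AFTER_CLOSE_NOTIFY` traceback is raised while the client tears down an already-broken connection, so it is usually a symptom of the server-side handshake failures rather than the root cause.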
I see errors like the following in the nats-server containers:
[7] 2024/02/23 18:14:27.845444 [ERR] 10.240.0.10:29783 - cid:206 - TLS handshake error: read tcp 10.244.6.44:4222->10.240.0.10:29783: i/o timeout
[7] 2024/02/23 18:14:28.144152 [ERR] 10.240.0.10:29783 - cid:206 - TLS handshake timeout
[7] 2024/02/23 18:14:28.806477 [ERR] 10.240.0.12:39668 - cid:207 - TLS handshake error: read tcp 10.244.6.44:4222->10.240.0.12:39668: i/o timeout
[7] 2024/02/23 18:14:31.045227 [ERR] 10.240.0.14:17010 - cid:208 - TLS handshake error: read tcp 10.244.6.44:4222->10.240.0.14:17010: i/o timeout
Expected behavior
There should be no TLS errors.
Server and client version
Server: nats-server 2.10.4-alpine
Client: nats-py==2.6.0
Host environment
Kubernetes
Steps to reproduce
TLS auth
I think you may have to bump the default tls { timeout = X } and authorization { timeout = X } for your deployment. Have you tried tuning those?
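For illustration, both knobs sit at the top level of the server config. The values and paths below are arbitrary examples, not recommendations (the defaults are on the order of a second or two):

```
tls {
  cert_file: "/etc/nats/server.crt"   # placeholder paths
  key_file: "/etc/nats/server.key"
  timeout: 5          # seconds allowed for the TLS handshake
}
authorization {
  timeout: 5          # seconds allowed for the client to authenticate
}
```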
@wallyqs where can I bump this in the https://github.com/nats-io/k8s/blob/main/helm/charts/nats/values.yaml helm chart?
Also, I didn't find a timeout option in https://docs.nats.io/running-a-nats-service/configuration/securing_nats/authorization
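With the current nats chart, extra keys for the tls block can be passed through `config.nats.tls.merge`, which is copied into the generated nats.conf. A sketch (the value is illustrative only):

```yaml
config:
  nats:
    tls:
      enabled: true
      merge:
        timeout: 5   # seconds; ends up inside the tls {} block of nats.conf
```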
@wallyqs with timeout = 50 in the tls block, the timeout errors went away, but I now see TLS handshake errors like the ones below:
[7] 2024/03/13 19:50:46.143685 [ERR] Error trying to connect to route (attempt 1): lookup for host "nats-1.nats-headless": lookup nats-1.nats-headless on 10.0.0.10:53: no such host
[7] 2024/03/13 19:50:46.143905 [ERR] Error trying to connect to route (attempt 1): lookup for host "nats-0.nats-headless": lookup nats-0.nats-headless on 10.0.0.10:53: no such host
[7] 2024/03/13 19:50:46.172442 [ERR] Error trying to connect to route (attempt 1): lookup for host "nats-0.nats-headless": lookup nats-0.nats-headless on 10.0.0.10:53: no such host
[7] 2024/03/13 19:50:46.173120 [ERR] Error trying to connect to route (attempt 1): lookup for host "nats-1.nats-headless": lookup nats-1.nats-headless on 10.0.0.10:53: no such host
[7] 2024/03/12 18:49:37.957961 [ERR] Error trying to connect to route (attempt 1): lookup for host "nats-0.nats-headless": lookup nats-0.nats-headless on 10.0.0.10:53: no such host
[7] 2024/03/12 17:40:23.567594 [ERR] 10.240.0.4:59217 - cid:44 - TLS handshake error: read tcp 10.244.2.14:4222->10.240.0.4:59217: read: connection reset by peer
Points to note:
- 10.0.0.10:53 is the DNS address, and when I run the command
kubectl exec -it nats-0 -- nslookup nats-1.nats-headless.nats-playground-3.svc.cluster.local
it gives this output:
Server: 10.0.0.10
Address: 10.0.0.10:53
Name: nats-1.nats-headless.nats-playground-3.svc.cluster.local
Address: 10.244.3.198
So the fully qualified name resolves to 10.244.3.198 when queried this way, yet the route lookups from inside the pod still fail.
- I have no clients connected to the NATS server, i.e. to any of the NATS pods in Kubernetes.
- 10.244.2.14:4222 is my NATS server pod's IP, but I don't know whose IP 10.240.0.4:59217 is. Any idea with whom the NATS server is making this TCP connection?
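On the DNS point: the route URLs use the short name nats-1.nats-headless, which only resolves if the pod's /etc/resolv.conf search list expands it. Inside a pod in the namespace (assumed here to be nats-playground-3), that file should look roughly like this; if the "no such host" errors appear only around startup, it is often just CoreDNS not yet serving the headless-service records:

```
search nats-playground-3.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.0.0.10
options ndots:5
```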
I have the following nats.conf:
{
  "cluster": {
    "name": "nats",
    "no_advertise": true,
    "port": 6222,
    "routes": [
      "nats://nats-0.nats-headless:6222",
      "nats://nats-1.nats-headless:6222",
      "nats://nats-2.nats-headless:6222"
    ]
  },
  "http_port": 8222,
  "jetstream": {
    "max_file_store": 10Gi,
    "max_memory_store": 0,
    "store_dir": "/data"
  },
  "lame_duck_duration": "30s",
  "lame_duck_grace_period": "10s",
  "pid_file": "/var/run/nats/nats.pid",
  "port": 4222,
  "server_name": $SERVER_NAME,
  "tls": {
    "ca_file": "/mnt/nats-certificate/rootCA-playground.crt",
    "cert_file": "/mnt/nats-certificate/nats-playground-server.crt",
    "key_file": "/mnt/nats-certificate/nats-playground-server.key",
    "timeout": 50,
    "verify": true
  }
}
Here are my helm chart values:
config:
  cluster:
    enabled: true
    replicas: 3
    port: 6222
  jetstream:
    enabled: true
    fileStore:
      pvc:
        size: 10Gi
  nats:
    tls:
      enabled: true
      merge: {
        verify: true,
        cert_file: '/mnt/nats-certificate/nats-playground-server.crt',
        key_file: '/mnt/nats-certificate/nats-playground-server.key',
        ca_file: '/mnt/nats-certificate/rootCA-playground.crt',
        timeout: 50
      }
podTemplate:
  topologySpreadConstraints:
    kubernetes.io/hostname:
      maxSkew: 1
      whenUnsatisfiable: DoNotSchedule
  patch:
    - op: add
      path: /spec/volumes/-
      value:
        name: nats-certificate-volume
        csi:
          driver: secrets-store.csi.k8s.io
          readOnly: true
          volumeAttributes:
            secretProviderClass: 'nats-certificate'
service:
  merge:
    spec:
      type: LoadBalancer
promExporter:
  enabled: true
  podMonitor:
    enabled: true
container:
  image:
    repository: nats
    tag: 2.10.11-alpine
  patch:
    - op: add
      path: /volumeMounts/-
      value:
        name: nats-certificate-volume
        mountPath: /mnt/nats-certificate
reloader:
  patch:
    - op: add
      path: /volumeMounts/-
      value:
        name: nats-certificate-volume
        mountPath: /mnt/nats-certificate
@neilalexander @wallyqs please check above. This is critical for me.