nats: stream not found
I'm getting this. k3s/kine/nats have been running for about 20 minutes.
{"level":"warn","ts":"2022-08-15T18:06:31.382Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b161c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}
docker-compose.yaml
version: '3.6'
services:
  nats:
    # image: nats
    image: nats:2.8.2
    command: >-
      -js
      -DV
  kine:
    image: rancher/kine:v0.9.3-amd64
    restart: on-failure
    depends_on:
      - nats
    ports:
      - 2379:2379
    command: >-
      --endpoint "nats://nats"
      --debug
run k3s
docker run \
-it --rm \
--tmpfs /run \
--tmpfs /var/run \
--net=host \
--privileged rancher/k3s \
server \
--cluster-init \
--disable-agent \
--token=SECRET \
--datastore-endpoint=http://127.0.0.1:2379
log
[...]
{"level":"warn","ts":"2022-08-15T18:07:15.201Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b161c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}
E0815 18:07:27.259776 39 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0815 18:07:27.752178 39 garbagecollector.go:747] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
{"level":"warn","ts":"2022-08-15T18:07:49.495Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b161c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}
E0815 18:07:57.279814 39 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0815 18:07:57.763788 39 garbagecollector.go:747] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
{"level":"warn","ts":"2022-08-15T18:08:23.986Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b161c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}
E0815 18:08:27.298772 39 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0815 18:08:27.775153 39 garbagecollector.go:747] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
Do I need to create a nats stream manually?
No, I don't believe you should have to create the stream. Do you perhaps need to specify the port in addition to the container hostname in the nats:// URI? What are the logs from the kine container? Do you get the same error when testing with nats:2.7.4?
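For example, a kine endpoint with the port spelled out would look like this (4222 is NATS's default client port; just a sketch of what I mean):

command: >-
  --endpoint "nats://nats:4222"
  --debug

You can also check which JetStream streams exist on the server with nats stream ls from the nats CLI (installed separately), which should tell you whether kine created anything.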
That all looks fine - it seems that kine is functioning properly. It's possible that the "stream not found" error is an improperly wrapped return code from nats? I'd have to defer to the nats team, as they contributed the code.
The "unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request" error is most frequently seen when the metrics-server pod has not yet started. You might investigate why metrics-server isn't working, or start k3s with --disable=metrics-server.
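For example, appended to the server arguments of your docker run command (other flags elided):

server \
  --token=SECRET \
  --datastore-endpoint=http://127.0.0.1:2379 \
  --disable=metrics-server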
Weird, I still get this, with --disable=metrics-server.
W0815 19:33:12.047943 38 handler_proxy.go:102] no RequestInfo found in the context
E0815 19:33:12.047976 38 controller.go:113] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: Error, could not get list of group versions for APIService
I0815 19:33:12.047986 38 controller.go:126] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
W0815 19:33:12.048964 38 handler_proxy.go:102] no RequestInfo found in the context
E0815 19:33:12.049015 38 controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0815 19:33:12.049037 38 controller.go:129] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E0815 19:33:13.437910 38 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0815 19:33:13.850875 38 garbagecollector.go:747] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
It looks like it's still there. Make sure you're wiping the nats datastore; otherwise you'll have stuff left over from the previous cluster instance.
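Since the nats service in your compose file doesn't mount a volume, removing and recreating the container should be enough to wipe the JetStream data, e.g. (docker-compose v1 syntax):

docker-compose rm -sf nats
docker-compose up -d nats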
done.
It looks like I only have {"level":"warn","ts":"2022-08-15T19:46:25.050Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000871a40/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"} now.
Oh, I see you're starting k3s with both --cluster-init and --datastore-endpoint. The cluster-init flag is used to enable the embedded etcd, and should not be used alongside the datastore-endpoint option. I'm honestly surprised this worked at all.
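i.e. something like this, with --cluster-init dropped and everything else as you had it:

docker run \
  -it --rm \
  --tmpfs /run \
  --tmpfs /var/run \
  --net=host \
  --privileged rancher/k3s \
  server \
  --disable-agent \
  --token=SECRET \
  --datastore-endpoint=http://127.0.0.1:2379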
If I remove --cluster-init, I still get "stream not found", but the k3s deployments are ready, even the metrics-server!
❯ kubectl get deploy -A
NAMESPACE     NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   coredns                  1/1     1            1           3m44s
kube-system   local-path-provisioner   1/1     1            1           3m44s
kube-system   metrics-server           1/1     1            1           83s
kube-system   traefik                  1/1     1            1           3m5s
Well you're also starting k3s with --disable-agent, which would prevent you from running any pods unless you are also adding an agent node. Did you remove that flag at the same time you removed --cluster-init, or did you add an agent node?
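If you did mean to keep --disable-agent, joining a separate agent would look roughly like this (the server address is a placeholder, 6443 is the default supervisor port, and you'd likely want the same --tmpfs and networking flags as the server container):

docker run -d --privileged rancher/k3s \
  agent \
  --server https://<server-ip>:6443 \
  --token=SECRET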
Yeah, it seems I removed it. I didn't intend to, but that must explain why the pods are running now.
So just to be clear, it seems to work, except there's still a "nats: stream not found" warning.
I can't reproduce that, either using nats directly as the k3s datastore endpoint, or using a standalone kine instance. Do you perhaps have some stuff left over from a previous configuration?
I don't think so.
Note that I also tested running k3s+kine+nats on fly.io, so I would guess there are no leftovers there, but I still get the message.
Maybe I should try to reproduce it with a docker-compose file with all 3 services, with the image tags pinned. Something like this, maybe (untested sketch; the k3s tag is my guess at a recent release, and I've added the explicit NATS port):
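version: '3.6'
services:
  nats:
    image: nats:2.8.2
    command: -js -DV
  kine:
    image: rancher/kine:v0.9.3-amd64
    restart: on-failure
    depends_on:
      - nats
    command: --endpoint "nats://nats:4222" --debug
  k3s:
    image: rancher/k3s:v1.24.4-k3s1
    privileged: true
    tmpfs:
      - /run
      - /var/run
    depends_on:
      - kine
    command: server --token=SECRET --datastore-endpoint=http://kine:2379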
I am able to see the same error message using the latest k0s (v1.24.4+k0s) on a clean install.
Steps to reproduce:
- Start a NATS docker container with docker run -d -p4222:4222 --name=nats nats:2.8.2 -js -DV
- Install the k0s utility with curl -sSLf https://get.k0s.sh | sudo sh, as stated on https://docs.k0sproject.io/v0.12.1/install/
- Create the k0s configuration with mkdir -p /etc/k0s and k0s config create > /etc/k0s/k0s.yaml
- Set storage.type to kine and set storage.kine.datasource to nats://<serverip>:4222 (see the sketch after this list)
- Install a single-node cluster with k0s install controller --single -c /etc/k0s/k0s.yaml
- Start k0s with k0s start
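The relevant part of /etc/k0s/k0s.yaml then looks roughly like this (field names as I understand the k0s config spec; double-check against the docs):

apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
spec:
  storage:
    type: kine
    kine:
      dataSource: nats://<serverip>:4222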
The cluster comes up successfully for me, but the logs keep repeating:
level=info msg="{\"level\":\"warn\",\"ts\":\"2022-09-21T21:33:44.148Z\",\"logger\":\"etcd-client\",\"caller\":\"v3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"etcd-endpoints://0xc00002fc00//run/k0s/kine/kine.sock:2379\",\"attempt\":0,\"error\":\"rpc error: code = Unknown desc = nats: stream not found\"}" component=kube-apiserver
every 30-ish seconds, sometimes a bit more, sometimes a bit less.
I am testing on a Hetzner Cloud Machine, Rocky Linux 8.6 using Docker-CE to run NATS.
I'm not sure how to help with k0s. Can you reproduce it on k3s or a standalone instance of kine?
K0s embeds kine, so I assumed that wouldn't matter 😅 I tested it with k3s, kine and NATS again though:
- Start a NATS docker container with docker run -d -p4222:4222 --name=nats nats:2.8.2 -js -DV
- Start a kine docker container with docker run -d -p2379:2379 --name=kine rancher/kine:v0.9.4-amd64 --endpoint "nats://<serverip>:4222" --debug
- Start a k3s docker container with docker run -it --rm --tmpfs /run --tmpfs /var/run --net=host --privileged rancher/k3s server --token=SECRET --datastore-endpoint=http://<serverip>:2379
Wait for the cluster to settle, and the same error as with k0s will start appearing in the logs.
{"level":"warn","ts":"2022-09-22T10:33:23.705Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000a64700/<ip>:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}
The test was conducted on the same host as the k0s test (after a full reset).
Complete Bash History
[root@nats-kube-test ~]# history
1 clear
2 dnf update
3 dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
4 dnf install -y docker-ce
5 systemctl enable --now docker
6 docker run -d -p4222:4222 --name=nats nats:2.8.2 -js -DV
7 docker run -d -p2379:2379 --name=kine rancher/kine:v0.9.4-amd64 --endpoint "nats://<serverip>:4222" --debug
8 docker run -it --rm --tmpfs /run --tmpfs /var/run --net=host --privileged rancher/k3s server --token=SECRET --datastore-endpoint=http://<serverip>:2379
9 history
@caleblloyd @boxboatmatt @jnmoyne can you assist on this? The error appears to be coming from the nats server. I suspect it is being configured incorrectly somehow?
I wonder if this is from the DbSize call - I noticed that call was incorrect and bundled the fix into the version upgrade:
https://github.com/k3s-io/kine/pull/137/files#diff-fefbd7dc68770cb2275bb6c283ace6fe949dc7bf28aeb56f9481a98598bd491aL896
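For reference, a minimal sketch of the failure mode I have in mind, assuming the backend asks JetStream for stream info by name (function and variable names here are illustrative, not kine's actual code):

package main

import (
	"fmt"

	"github.com/nats-io/nats.go"
)

// dbSize is an illustrative stand-in for a DbSize-style call: it asks
// JetStream for the stream's stored bytes. If the stream was never
// created (or is looked up under the wrong name), StreamInfo returns
// nats.ErrStreamNotFound.
func dbSize(js nats.JetStreamContext, stream string) (int64, error) {
	info, err := js.StreamInfo(stream)
	if err != nil {
		// Returning err unwrapped is what surfaces on the etcd client
		// side as: rpc error: code = Unknown desc = nats: stream not found
		return 0, err
	}
	return int64(info.State.Bytes), nil
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		panic(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		panic(err)
	}

	// Asking for a stream that doesn't exist reproduces the message.
	if _, err := dbSize(js, "no-such-stream"); err != nil {
		fmt.Println(err) // prints: nats: stream not found
	}
}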
I believe that DbSize currently does return that error.
Interesting. In that case, it should be fine to ignore the error for the time being?