
nats: stream not found

bbigras opened this issue 3 years ago • 19 comments

I'm getting the error below. k3s/kine/nats have been running for about 20 minutes.

{"level":"warn","ts":"2022-08-15T18:06:31.382Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b161c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}

docker-compose.yaml

version: '3.6'
services:
  nats:
    # image: nats
    image: nats:2.8.2
    command: >-
      -js
      -DV
  kine:
    image: rancher/kine:v0.9.3-amd64
    restart: on-failure
    depends_on:
      - nats
    ports:
      - 2379:2379
    command: >-
      --endpoint "nats://nats"
      --debug

run k3s

docker run \
  -it --rm \
  --tmpfs /run \
  --tmpfs /var/run \
  --net=host \
  --privileged rancher/k3s \
  server \
  --cluster-init \
  --disable-agent \
  --token=SECRET \
  --datastore-endpoint=http://127.0.0.1:2379

log

[...]
{"level":"warn","ts":"2022-08-15T18:07:15.201Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b161c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}
E0815 18:07:27.259776      39 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0815 18:07:27.752178      39 garbagecollector.go:747] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
{"level":"warn","ts":"2022-08-15T18:07:49.495Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b161c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}
E0815 18:07:57.279814      39 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0815 18:07:57.763788      39 garbagecollector.go:747] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
{"level":"warn","ts":"2022-08-15T18:08:23.986Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b161c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}
E0815 18:08:27.298772      39 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0815 18:08:27.775153      39 garbagecollector.go:747] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]

Do I need to create a nats stream manually?

bbigras avatar Aug 15 '22 18:08 bbigras

No, I don't believe you should have to create the stream. Do you perhaps need to specify the port in addition to the container hostname in the nats:// URI? What are the logs from the kine container? Do you get the same error when testing with nats:2.7.4?
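
If you want to double-check what kine has (or hasn't) created, the nats CLI can list the streams. A sketch, assuming the compose network is named kine_default (the real name depends on your compose project directory):

docker run --rm -it --network=kine_default natsio/nats-box \
  nats --server nats://nats:4222 stream ls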

brandond avatar Aug 15 '22 18:08 brandond

Using nats://nats:4222 seems to do the same.

Same with nats:2.7.4.

kine.log

bbigras avatar Aug 15 '22 19:08 bbigras

That all looks fine - it seems that kine is functioning properly. It's possible that the "stream not found" error is an improperly wrapped return code from nats? I'd have to defer to the nats team, as they contributed the code.

The unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request error is most frequently seen when the metrics-server pod has not yet started. You might investigate why metrics-server isn't working, or start k3s with --disable=metrics-server.
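
i.e. the same docker run from above with the flag appended:

docker run \
  -it --rm \
  --tmpfs /run \
  --tmpfs /var/run \
  --net=host \
  --privileged rancher/k3s \
  server \
  --cluster-init \
  --disable-agent \
  --token=SECRET \
  --datastore-endpoint=http://127.0.0.1:2379 \
  --disable=metrics-server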

brandond avatar Aug 15 '22 19:08 brandond

Weird, I still get this, with --disable=metrics-server.

W0815 19:33:12.047943      38 handler_proxy.go:102] no RequestInfo found in the context
E0815 19:33:12.047976      38 controller.go:113] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: Error, could not get list of group versions for APIService
I0815 19:33:12.047986      38 controller.go:126] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
W0815 19:33:12.048964      38 handler_proxy.go:102] no RequestInfo found in the context
E0815 19:33:12.049015      38 controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0815 19:33:12.049037      38 controller.go:129] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E0815 19:33:13.437910      38 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0815 19:33:13.850875      38 garbagecollector.go:747] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]

bbigras avatar Aug 15 '22 19:08 bbigras

It looks like it's still there. Make sure you're wiping the nats datastore, otherwise you'll have stuff left over from the previous cluster instance.
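
With the compose setup above, recreating the containers should be enough; as far as I know the stock nats image has no volume and keeps JetStream data inside the container, so a down/up cycle wipes it:

docker-compose down
docker-compose up -d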

brandond avatar Aug 15 '22 19:08 brandond

Done.

It looks like I only have {"level":"warn","ts":"2022-08-15T19:46:25.050Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000871a40/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"} now.

bbigras avatar Aug 15 '22 19:08 bbigras

Oh, I see you're starting k3s with both --cluster-init and --datastore-endpoint. The cluster-init flag is used to enable the embedded etcd, and should not be used alongside the datastore-endpoint option. I'm honestly surprised this worked at all.
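
That is, drop --cluster-init and let the external datastore stand on its own:

docker run \
  -it --rm \
  --tmpfs /run \
  --tmpfs /var/run \
  --net=host \
  --privileged rancher/k3s \
  server \
  --disable-agent \
  --token=SECRET \
  --datastore-endpoint=http://127.0.0.1:2379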

brandond avatar Aug 15 '22 19:08 brandond

If I remove --cluster-init, I still get stream not found, but the k3s deploys are ready, even the metrics-server!

❯ kubectl get deploy -A
NAMESPACE     NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   coredns                  1/1     1            1           3m44s
kube-system   local-path-provisioner   1/1     1            1           3m44s
kube-system   metrics-server           1/1     1            1           83s
kube-system   traefik                  1/1     1            1           3m5s

bbigras avatar Aug 15 '22 19:08 bbigras

Well, you're also starting k3s with --disable-agent, which would prevent you from running any pods unless you are also adding an agent node. Did you remove that flag at the same time you removed --cluster-init, or did you add an agent node?
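
For reference, joining a separate agent would look something like this (a sketch; <server-ip> is a placeholder and 6443 is the default k3s API port):

docker run -it --rm --tmpfs /run --tmpfs /var/run --net=host \
  --privileged rancher/k3s \
  agent \
  --server https://<server-ip>:6443 \
  --token=SECRET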

brandond avatar Aug 15 '22 19:08 brandond

Yeah, it seems I removed it. I didn't intend to, but that must be why the pods are running now.

bbigras avatar Aug 15 '22 20:08 bbigras

So just to be clear, it seems to work except there's still a nats: stream not found warning.

bbigras avatar Aug 16 '22 19:08 bbigras

I can't reproduce that, either using nats directly as the k3s datastore endpoint, or using a standalone kine instance. Do you perhaps have some stuff left over from a previous configuration?

brandond avatar Aug 16 '22 19:08 brandond

I don't think so.

Note that I also tested running k3s+kine+nats on fly.io, so I would guess there are no leftovers there, but I still get the message.

Maybe I should try to reproduce it with a docker-compose file with all 3 services, with the image tags pinned.
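
Something like this, maybe (an untested sketch; the k3s image tag is a guess on my part):

version: '3.6'
services:
  nats:
    image: nats:2.8.2
    command: -js -DV
  kine:
    image: rancher/kine:v0.9.3-amd64
    restart: on-failure
    depends_on:
      - nats
    command: --endpoint "nats://nats:4222"
  k3s:
    image: rancher/k3s:v1.24.3-k3s1
    privileged: true
    tmpfs:
      - /run
      - /var/run
    depends_on:
      - kine
    command: server --token=SECRET --datastore-endpoint=http://kine:2379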

bbigras avatar Aug 16 '22 19:08 bbigras

I am able to see the same error message using the latest k0s (v1.24.4+k0s) on a clean install.

Steps to reproduce:

  1. Start the NATS docker container with docker run -d -p4222:4222 --name=nats nats:2.8.2 -js -DV
  2. Install the k0s utility with curl -sSLf https://get.k0s.sh | sudo sh, as stated on https://docs.k0sproject.io/v0.12.1/install/
  3. Create the k0s configuration with mkdir -p /etc/k0s and k0s config create > /etc/k0s/k0s.yaml
  4. Set storage.type to kine and storage.kine.datasource to nats://<serverip>:4222 (see the snippet after this list)
  5. Install a single node cluster with k0s install controller --single -c /etc/k0s/k0s.yaml
  6. Start k0s with k0s start
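
For step 4, the storage section of /etc/k0s/k0s.yaml ends up looking roughly like this (a sketch; key casing as I understand the k0s config format):

spec:
  storage:
    type: kine
    kine:
      dataSource: nats://<serverip>:4222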

The cluster comes up successfully for me, but the logs keep repeating: level=info msg="{\"level\":\"warn\",\"ts\":\"2022-09-21T21:33:44.148Z\",\"logger\":\"etcd-client\",\"caller\":\"v3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"etcd-endpoints://0xc00002fc00//run/k0s/kine/kine.sock:2379\",\"attempt\":0,\"error\":\"rpc error: code = Unknown desc = nats: stream not found\"}" component=kube-apiserver every 30-ish seconds, sometimes a bit more, sometimes a bit less.

I am testing on a Hetzner Cloud Machine, Rocky Linux 8.6 using Docker-CE to run NATS.

nilathedragon avatar Sep 21 '22 21:09 nilathedragon

I'm not sure how to help with k0s. Can you reproduce on k3s or a standalone instance of kine?

brandond avatar Sep 21 '22 22:09 brandond

K0s embeds kine, so I assumed that wouldn't matter 😅 I tested it with k3s, kine and NATS again though:

  1. Start the NATS docker container with docker run -d -p4222:4222 --name=nats nats:2.8.2 -js -DV
  2. Start the kine docker container with docker run -d -p2379:2379 --name=kine rancher/kine:v0.9.4-amd64 --endpoint "nats://<serverip>:4222" --debug
  3. Start the k3s docker container with docker run -it --rm --tmpfs /run --tmpfs /var/run --net=host --privileged rancher/k3s server --token=SECRET --datastore-endpoint=http://<serverip>:2379

Wait for the cluster to settle, and the same error as with k0s will start appearing in the logs. {"level":"warn","ts":"2022-09-22T10:33:23.705Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000a64700/<ip>:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}

The test was conducted on the same host as k0s (after a full reset).

Complete Bash History
[root@nats-kube-test ~]# history
  1  clear
  2  dnf update
  3  dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
  4  dnf install -y docker-ce
  5  systemctl enable --now docker
  6  docker run -d -p4222:4222 --name=nats nats:2.8.2 -js -DV
  7  docker run -d -p2379:2379 --name=kine rancher/kine:v0.9.4-amd64 --endpoint "nats://<serverip>:4222" --debug
  8  docker run -it --rm --tmpfs /run --tmpfs /var/run --net=host --privileged rancher/k3s server --token=SECRET --datastore-endpoint=http://<serverip>:2379
  9  history

nilathedragon avatar Sep 22 '22 10:09 nilathedragon

@caleblloyd @boxboatmatt @jnmoyne can you assist on this? The error appears to be coming from the nats server. I suspect it is being configured incorrectly somehow?

brandond avatar Sep 22 '22 16:09 brandond

I wonder if this is from the DbSize call - I noticed that call was incorrect and bundled the fix into the version upgrade:

https://github.com/k3s-io/kine/pull/137/files#diff-fefbd7dc68770cb2275bb6c283ace6fe949dc7bf28aeb56f9481a98598bd491aL896

I believe that DbSize currently does return that error.
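
For illustration, the failure mode would be shaped something like this (a hypothetical sketch, not the actual kine code; Backend, js, and streamName are placeholders):

package natssketch // hypothetical

import (
	"context"

	"github.com/nats-io/nats.go"
)

// Backend is a stand-in for kine's nats driver state.
type Backend struct {
	js         nats.JetStreamContext
	streamName string
}

// DbSize asks JetStream for stream info to report the datastore size.
// If streamName was never created (or the wrong name is queried),
// StreamInfo returns nats.ErrStreamNotFound, which the etcd client
// surfaces as "rpc error: ... nats: stream not found".
func (b *Backend) DbSize(ctx context.Context) (int64, error) {
	info, err := b.js.StreamInfo(b.streamName)
	if err != nil {
		return 0, err
	}
	return int64(info.State.Bytes), nil
}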

caleblloyd avatar Sep 22 '22 16:09 caleblloyd

Interesting. In that case, it should be fine to ignore the error for the time being?

brandond avatar Sep 23 '22 02:09 brandond