[BUG] event router stops responding after a couple of days

Open jm66 opened this issue 2 years ago • 50 comments

Describe the bug

vmware-event-router stops working after two days. No errors registered in logs.

To Reproduce N/A

Expected behavior vmware-event-router to continuously run without interruption.

Screenshots N/A

Version (please complete the following information):

  • VEBA Form Factor: Appliance
  • VEBA Version: v0.7.5

Additional context

This behaviour only occurs in our prod instance, whose event volume is greater than that of our testing instance.

jm66 avatar May 15 '23 19:05 jm66

Howdy 🖐   jm66 ! Thank you for your interest in this project. We value your feedback and will respond soon.

github-actions[bot] avatar May 15 '23 19:05 github-actions[bot]

Hi @jm66 - When you say the vmware-event-router stops working, could you elaborate on what you're seeing? Is the pod running or has it gone into a crash / restart state? Does this happen immediately after deployment or after a period of time? How large is your vCenter inventory? We'll need a bit more information to understand the issue.

Also, if you're on the VEBA Slack channel, we can chat further there to diagnose the issue.

lamw avatar May 15 '23 19:05 lamw

Our env is around 3-4K VMs. The pod looks healthy, but no events appear in the pod logs or at the /events resource. I tried joining months ago but had no luck. Would you mind sending an invite for the Slack channel to Jm.Lopez (at) utoronto.ca?

jm66 avatar May 15 '23 22:05 jm66

OK, let's take a look at the setup to see what's going on. I've added your email to the invite list; you should get an email to complete the signup, and then you can join https://vmwarecode.slack.com/archives/CQLT9B5AA

lamw avatar May 15 '23 23:05 lamw

Just joined. Thanks. Any particular channel?

jm66 avatar May 15 '23 23:05 jm66

Yes, the one I linked above (that should take you to our VEBA channel)

lamw avatar May 16 '23 00:05 lamw

I wonder if it could be related to this? https://github.com/vmware-samples/vcenter-event-broker-appliance/issues/809

embano1 avatar May 16 '23 05:05 embano1

I wonder if it could be related to this? https://github.com/vmware-samples/vcenter-event-broker-appliance/issues/809

Most likely 👍🏻

rguske avatar May 16 '23 06:05 rguske

@embano1 That's a good point! I was actually thinking it could be that but forgot I had a blog post about it.

@jm66 Could you check https://williamlam.com/2022/07/heads-up-potential-missing-vcenter-server-events-due-to-sequence-id-overflow.html and see if this is what you're observing?

lamw avatar May 16 '23 13:05 lamw

Hey @lamw I checked the post and the chainId is positive:

chainId int 310662297

Is it possible we have experienced the issue in the past and vCenter self heals?

jm66 avatar May 16 '23 16:05 jm66

vCenter doesn't "heal", but this int32 wraps over time, so the underlying event collector used by the router might be stuck. Could you please try restarting the router/VEBA? Enabling DEBUG logging can also reveal some insights. All of this assumes the router is actually running correctly.
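To illustrate the wrap-around mentioned above: a signed 32-bit counter that passes 2147483647 continues at -2147483648. A minimal sketch in shell arithmetic follows; the `wrap_int32` helper is hypothetical and only for illustration, it is not part of the router or of vCenter, and it assumes a shell with 64-bit arithmetic.

```shell
# Hypothetical helper: map an integer onto the signed 32-bit range,
# the way an int32 counter overflows.
wrap_int32() {
  n=$1
  # Shift into [0, 2^32), then shift back into [-2^31, 2^31).
  echo $(( ( ( (n + 2147483648) % 4294967296 ) + 4294967296 ) % 4294967296 - 2147483648 ))
}

wrap_int32 310662297     # chainId reported in this thread, still in range
wrap_int32 2147483647    # largest int32 value
wrap_int32 2147483648    # one past the maximum wraps to the minimum
```

A positive chainId such as the one reported here therefore only shows that the counter is not currently in the negative half of its range.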

embano1 avatar May 16 '23 18:05 embano1

@embano1 yeah, actually, to work around this we cron'd /usr/bin/kubectl rollout restart deployment vmware-event-router-vcenter -n vmware-system to run daily. So far the router hasn't got stuck.
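For reference, a daily restart like this could be expressed as a crontab line. The sketch below only prints such a line; the 03:00 schedule and the `router_restart_cron_line` helper are assumptions, since the thread only says "daily".

```shell
# Restart command as quoted in this comment.
RESTART_CMD="/usr/bin/kubectl rollout restart deployment vmware-event-router-vcenter -n vmware-system"

# Hypothetical helper: print a crontab line that runs the restart once a day.
# The time of day (03:00) is an assumption.
router_restart_cron_line() {
  printf '0 3 * * * %s\n' "$RESTART_CMD"
}

router_restart_cron_line
# To install it for the current user (not executed here):
#   (crontab -l 2>/dev/null; router_restart_cron_line) | crontab -
```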

Got it. Enabled debug and disabled the cron job.

jm66 avatar May 16 '23 20:05 jm66

@embano1 the event-router is stuck now:

kubectl get pods -n vmware-system
NAME                                          READY   STATUS    RESTARTS        AGE
cadvisor-rhdw8                                1/1     Running   9 (7d1h ago)    31d
fluent-bit-rzbd4                              1/1     Running   15 (7d1h ago)   31d
tinywww-669b487769-pr76s                      1/1     Running   6 (7d1h ago)    31d
veba-rabbit-server-0                          1/1     Running   9 (7d1h ago)    31d
veba-ui-846bb59f69-n7ln7                      1/1     Running   6 (7d1h ago)    31d
vmware-event-router-vcenter-ffb58667c-pwj9z   1/1     Running   0               2d
vmware-event-router-webhook-755654855-dw7mc   1/1     Running   0               2d

Last event UserLogoutSessionEvent with key 310792526 was processed at 2023-05-18T17:15:14.667Z.
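One way to pull the timestamp of the last matching entry out of the router log is a small filter like the one below. This is a sketch: `last_log_ts` is a hypothetical helper, and it assumes the timestamp is the first whitespace-separated field, as in the log lines quoted in this thread.

```shell
# Hypothetical helper: print the timestamp (first field) of the last line
# on stdin that contains the given text.
last_log_ts() {
  awk -v pat="$1" 'index($0, pat) { ts = $1 } END { if (ts != "") print ts }'
}

# Example with two lines in the router's log format.
printf '%s\n' \
  '2023-05-18T17:15:14.667Z INFO [KNATIVE] knative/knative.go:194 successfully sent event {"eventID": "1f98ca23-2fe3-4376-8606-3c1c965b9a23"}' \
  '2023-05-18T17:15:14.667Z INFO [KNATIVE] knative/knative.go:182 sending event {"eventID": "2971841d-73cd-48b0-a68e-bd7065b59dc9"}' |
  last_log_ts 'sending event'

# On the appliance (not executed here):
#   kubectl -n vmware-system logs deploy/vmware-event-router-vcenter | last_log_ts 'sending event'
```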

jm66 avatar May 18 '23 20:05 jm66

Is that from the DEBUG log? Anything else in there?

embano1 avatar May 18 '23 20:05 embano1

Nothing relevant to my eyes, just the last event, but I could share the logs if you'd like.

jm66 avatar May 18 '23 20:05 jm66

2023-05-18T17:15:14.667Z   DEBUG [KNATIVE]   knative/knative.go:184  got response   {"eventID": "1f98ca23-2fe3-4376-8606-3c1c965b9a23", "response": "202: "}
2023-05-18T17:15:14.667Z   INFO  [KNATIVE]   knative/knative.go:194  successfully sent event {"eventID": "1f98ca23-2fe3-4376-8606-3c1c965b9a23"}
2023-05-18T17:15:14.667Z   INFO  [VCENTER]   vcenter/vcenter.go:343  invoking processor   {"eventID": "2971841d-73cd-48b0-a68e-bd7065b59dc9"}
2023-05-18T17:15:14.667Z   DEBUG [KNATIVE]   knative/knative.go:170  processing event  {"eventID": "2971841d-73cd-48b0-a68e-bd7065b59dc9", "event": "Context Attributes,\n  specversion: 1.0\n  type: com.vmware.event.router/event\n  source: https://vcdomain/sdk\n  subject: UserLogoutSessionEvent\n  id: 2971841d-73cd-48b0-a68e-bd7065b59dc9\n  time: 2023-05-18T17:15:13.103999Z\n  datacontenttype: application/json\nExtensions,\n  vsphereapiversion: 7.0.3.0\nData,\n  {\n    \"Key\": 310792526,\n    \"ChainId\": 310792526,\n    \"CreatedTime\": \"2023-05-18T17:15:13.103999Z\",\n    \"UserName\": \"VSKEY5\\\\vsswebservices\",\n    \"Datacenter\": null,\n    \"ComputeResource\": null,\n    \"Host\": null,\n    \"Vm\": null,\n    \"Ds\": null,\n    \"Net\": null,\n    \"Dvs\": null,\n    \"FullFormattedMessage\": \"User ****** logged out (login time: Thursday, 18 May, 2023 05:15:01 PM, number of API invocations: 4, user agent: Go-http-client/1.1)\",\n    \"ChangeTag\": \"\",\n    \"IpAddress\": \"*****\",\n    \"UserAgent\": \"Go-http-client/1.1\",\n    \"CallCount\": 4,\n    \"SessionId\": \"5219944b-3cd6-66e9-4ff8-4e662962c304\",\n    \"LoginTime\": \"2023-05-18T17:15:01.773485Z\"\n  }\n"}
2023-05-18T17:15:14.667Z   INFO  [KNATIVE]   knative/knative.go:182  sending event  {"eventID": "2971841d-73cd-48b0-a68e-bd7065b59dc9", "subject": "UserLogoutSessionEvent"}

jm66 avatar May 19 '23 15:05 jm66

Hey @embano1 , got this after submitting kubectl rollout restart deployment vmware-event-router-vcenter -n vmware-system:

root@veba [ ~ ]# 2023-05-19T15:47:25.339Z	INFO	[MAIN]	router/main.go:212	initiating shutdown	{"commit": "73f72694", "version": "v0.7.5"}
2023-05-19T15:47:25.339Z	INFO	[VCENTER]	vcenter/vcenter.go:370	attempting graceful shutdown
2023-05-19T15:47:25.339Z	DEBUG	[KNATIVE]	knative/knative.go:184	got response	{"eventID": "2971841d-73cd-48b0-a68e-bd7065b59dc9", "response": "Post \"http://default-broker-ingress.vmware-functions.svc.cluster.local/\": context canceled"}
2023-05-19T15:47:25.339Z	ERROR	[VCENTER]	vcenter/vcenter.go:347	could not process event	{"event": "Context Attributes,\n  specversion: 1.0\n  type: com.vmware.event.router/event\n  source: https://****/sdk\n  subject: UserLogoutSessionEvent\n  id: 2971841d-73cd-48b0-a68e-bd7065b59dc9\n  time: 2023-05-18T17:15:13.103999Z\n  datacontenttype: application/json\nExtensions,\n  vsphereapiversion: 7.0.3.0\nData,\n  {\n    \"Key\": 310792526,\n    \"ChainId\": 310792526,\n    \"CreatedTime\": \"2023-05-18T17:15:13.103999Z\",\n    \"UserName\": \"********\",\n    \"Datacenter\": null,\n    \"ComputeResource\": null,\n    \"Host\": null,\n    \"Vm\": null,\n    \"Ds\": null,\n    \"Net\": null,\n    \"Dvs\": null,\n    \"FullFormattedMessage\": \"User ******* logged out (login time: Thursday, 18 May, 2023 05:15:01 PM, number of API invocations: 4, user agent: Go-http-client/1.1)\",\n    \"ChangeTag\": \"\",\n    \"IpAddress\": \"******\",\n    \"UserAgent\": \"Go-http-client/1.1\",\n    \"CallCount\": 4,\n    \"SessionId\": \"5219944b-3cd6-66e9-4ff8-4e662962c304\",\n    \"LoginTime\": \"2023-05-18T17:15:01.773485Z\"\n  }\n", "error": "knative: send event 2971841d-73cd-48b0-a68e-bd7065b59dc9: Post \"http://default-broker-ingress.vmware-functions.svc.cluster.local/\": context canceled"}
github.com/vmware-samples/vcenter-event-broker-appliance/vmware-event-router/internal/provider/vcenter.(*EventStream).processEvents
	github.com/vmware-samples/vcenter-event-broker-appliance/vmware-event-router/internal/provider/vcenter/vcenter.go:347
github.com/vmware-samples/vcenter-event-broker-appliance/vmware-event-router/internal/provider/vcenter.(*EventStream).stream
	github.com/vmware-samples/vcenter-event-broker-appliance/vmware-event-router/internal/provider/vcenter/vcenter.go:318
github.com/vmware-samples/vcenter-event-broker-appliance/vmware-event-router/internal/provider/vcenter.(*EventStream).Stream
	github.com/vmware-samples/vcenter-event-broker-appliance/vmware-event-router/internal/provider/vcenter/vcenter.go:231
main.main.func3
	github.com/vmware-samples/vcenter-event-broker-appliance/vmware-event-router/cmd/router/main.go:206
golang.org/x/sync/errgroup.(*Group).Go.func1
	golang.org/x/[email protected]/errgroup/errgroup.go:75

jm66 avatar May 19 '23 15:05 jm66

That error is fine: during a rollout the current instance is terminated and the code reacts gracefully to it (context canceled). The event in flight at that moment is not fully processed and, depending on whether checkpointing is enabled, the code resumes from the last checkpoint or from "now" when it starts.

Even before the shutdown it was processing an event so I was wondering whether the code/event stream works as expected then?

embano1 avatar May 20 '23 06:05 embano1

Hey, unfortunately we are hitting the same problem. Helm deployment: Chart: event-router-v0.7.6
App version: v0.7.5

In our test environment everything has been working for months.

With INFO logging enabled, it simply stops logging at some point:

2023-06-13T06:13:05.216Z INFO [KNATIVE] knative/knative.go:194 successfully sent event {"eventID": "4ab869ed-b2e0-4f90-9fad-00a5c1d83f82"}
2023-06-13T06:13:05.941Z INFO [VCENTER] vcenter/vcenter.go:343 invoking processor {"eventID": "06d60432-b9c9-4a3e-9f04-51bc8a875d10"}
2023-06-13T06:13:05.941Z INFO [KNATIVE] knative/knative.go:182 sending event {"eventID": "06d60432-b9c9-4a3e-9f04-51bc8a875d10", "subject": "com.vmware.vc.HA.NotAllHostAddrsPingable"}
2023-06-13T06:13:05.942Z INFO [KNATIVE] knative/knative.go:194 successfully sent event {"eventID": "06d60432-b9c9-4a3e-9f04-51bc8a875d10"}

In DEBUG mode, the only entries we still see are these:

2023-06-13T06:35:50.917Z DEBUG [VCENTER] vcenter/vcenter.go:313 no new events, backing off {"delaySeconds": 5}
2023-06-13T06:35:55.973Z DEBUG [VCENTER] vcenter/vcenter.go:313 no new events, backing off {"delaySeconds": 5}
2023-06-13T06:36:01.026Z DEBUG [VCENTER] vcenter/vcenter.go:313 no new events, backing off {"delaySeconds": 5}
2023-06-13T06:36:06.072Z DEBUG [VCENTER] vcenter/vcenter.go:313 no new events, backing off {"delaySeconds": 5}

A quick pod deletion fixes the problem, and everything is back to normal for at least a few minutes, sometimes hours...

2023-06-13T06:41:13.196Z INFO [KNATIVE] knative/knative.go:182 sending event {"eventID": "80e3ab5d-9dd3-47ef-8ad2-efe54ec86872", "subject": "com.vmware.vcIntegrity.ClusterMembershipChange"}
2023-06-13T06:41:13.197Z DEBUG [KNATIVE] knative/knative.go:184 got response {"eventID": "80e3ab5d-9dd3-47ef-8ad2-efe54ec86872", "response": "202: "}
2023-06-13T06:41:13.197Z INFO [KNATIVE] knative/knative.go:194 successfully sent event {"eventID": "80e3ab5d-9dd3-47ef-8ad2-efe54ec86872"}
2023-06-13T06:41:14.051Z DEBUG [VCENTER] vcenter/vcenter.go:313 no new events, backing off {"delaySeconds": 1}
2023-06-13T06:41:15.108Z DEBUG [VCENTER] vcenter/vcenter.go:313 no new events, backing off {"delaySeconds": 2}
2023-06-13T06:41:17.160Z DEBUG [VCENTER] vcenter/vcenter.go:313 no new events, backing off {"delaySeconds": 4}
2023-06-13T06:41:21.237Z DEBUG [VCENTER] vcenter/vcenter.go:313 no new events, backing off {"delaySeconds": 5}
2023-06-13T06:41:26.298Z DEBUG [VCENTER] vcenter/vcenter.go:313 no new events, backing off {"delaySeconds": 5}
2023-06-13T06:41:31.414Z INFO [VCENTER] vcenter/vcenter.go:343 invoking processor {"eventID": "24358daa-9903-4929-bceb-96d617560f1c"}
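The delaySeconds values in the last excerpt (1, 2, 4, 5, 5) look like a doubling backoff capped at 5 seconds. A sketch of that pattern is below; it is inferred from the log output only, not taken from the router source, and both `next_delay` and the default cap of 5 are assumptions.

```shell
# Hypothetical helper: double the current delay, but never exceed the cap.
next_delay() {
  cur=$1
  cap=${2:-5}
  next=$(( cur * 2 ))
  if [ "$next" -gt "$cap" ]; then
    next=$cap
  fi
  echo "$next"
}

# Print the first five delays, starting at 1 second.
d=1
for i in 1 2 3 4 5; do
  printf '%s ' "$d"
  d=$(next_delay "$d")
done
echo
```

Read this way, a steady stream of delaySeconds 5 entries means the polling loop itself is still alive and simply keeps receiving no events.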

laugrean avatar Jun 13 '23 06:06 laugrean

@embano1 would you also recommend folks here give Tanzu Sources for Knative a try to see if that helps?

lamw avatar Jun 16 '23 20:06 lamw

@embano1 would you also recommend folks here give Tanzu Sources for Knative a try to see if that helps?

You mean regarding the VC DB overflow issue? Sources won't help here because it uses the same EventHistoryCollector mechanism as the router, and this is a server-side issue.

embano1 avatar Jun 17 '23 05:06 embano1

You mean regarding the VC DB overflow issue?

No, I meant for @jm66's issue https://github.com/vmware-samples/vcenter-event-broker-appliance/issues/1054#issuecomment-1554782948. Are we saying this is related to the VC DB overflow?

There's also @laugrean's report, and it isn't clear to me whether that is the VC DB overflow either, unless you're referring to this issue?

lamw avatar Jun 17 '23 13:06 lamw

No, I meant for @jm66's issue #1054 (comment). Are we saying this is related to the VC DB overflow?

See my response above: https://github.com/vmware-samples/vcenter-event-broker-appliance/issues/1054#issuecomment-1555717307

The initial issue description though seems to be a deadlock in the router.

There's also @laugrean's report, and it isn't clear to me whether that is the VC DB overflow either, unless you're referring to this issue?

IIRC, this is also a deadlock in the router.

embano1 avatar Jun 17 '23 17:06 embano1

@embano1 OK, since both of the reported issues seem to be a deadlock in the router ... my initial suggestion of testing Tanzu Sources on VEBA would at least show whether that helps with the issue? If so, I'll put something quick/dirty together that'll go ahead and undeploy the router and set up Sources ...

lamw avatar Jun 17 '23 19:06 lamw

@embano1 OK, since both of the reported issues seem to be a deadlock in the router ... my initial suggestion of testing Tanzu Sources on VEBA would at least show whether that helps with the issue? If so, I'll put something quick/dirty together that'll go ahead and undeploy the router and set up Sources ...

Yup, we should definitely cross-check with Sources. There's lots of code both share, hoping it's related to event processing/invocation where code differs.

embano1 avatar Jun 17 '23 19:06 embano1

@jm66 @laugrean To summarize the next steps, we would like to try un-deploying the event-router from within VEBA and deploy the Tanzu Sources for Knative which includes vSphere as a source (similar code which was ported from VEBA to Tanzu Sources) to see if this issue is resolved.

The instructions below assume the VEBA appliance has outbound access to pull down some additional packages. A change is also required in your function.yaml deployment, because the schema for subscribing to an event is slightly different; I detail this further below, after the setup.

Step 0 - SSH to VEBA appliance

Step 1 - Undeploy Event Router

kubectl -n vmware-system delete -f /root/config/event-router/vmware-event-router-k8s-vcenter.yaml

Step 2 - Install Tanzu Sources for Knative

kubectl apply -f https://github.com/vmware-tanzu/sources-for-knative/releases/latest/download/release.yaml

Step 3 - Install Knative CLI & vSphere Sources

curl -L https://github.com/knative/client/releases/download/knative-v1.10.0/kn-linux-amd64 -o /usr/local/bin/kn
chmod +x /usr/local/bin/kn
curl -L https://github.com/vmware-tanzu/sources-for-knative/releases/download/v0.37.0/kn-vsphere_0.37.0_Linux_x86_64.tar.gz -o /root/kn-vsphere_0.37.0_Linux_x86_64.tar.gz
tar -zxvf /root/kn-vsphere_0.37.0_Linux_x86_64.tar.gz -C /root
mv /root/kn-vsphere_0.37.0_Linux_x86_64/kn-vsphere /usr/local/bin/kn-vsphere
chmod +x /usr/local/bin/kn-vsphere
rm -rf /root/kn-vsphere_0.37.0_Linux_x86_64*

Step 4 - Export VC Creds

export VCENTER_USERNAME="FILL_ME"
export VCENTER_PASSWORD="FILL_ME"
export VCENTER_HOSTNAME="FILL_ME"

Step 5 - Create vSphere Secret

kn vsphere auth create \
    --namespace vmware-functions \
    --username "${VCENTER_USERNAME}" \
    --password "${VCENTER_PASSWORD}" \
    --name vcenter-creds \
    --verify-url "https://${VCENTER_HOSTNAME}" \
    --verify-insecure

Step 6 - Create vSphere Source

kn vsphere source create \
    --namespace vmware-functions \
    --name vcsa-source \
    --vc-address "https://${VCENTER_HOSTNAME}" \
    --skip-tls-verify \
    --secret-ref vcenter-creds \
    --sink-uri http://default-broker-ingress.vmware-functions.svc.cluster.local \
    --encoding json

If everything was set up successfully, you should see the following pods running:

# kubectl get pods -A
NAMESPACE            NAME                                                  READY   STATUS      RESTARTS      AGE
cert-manager         cert-manager-99bb69456-dhxcv                          1/1     Running     2 (23m ago)   89d
cert-manager         cert-manager-cainjector-ffb4747bb-8x7ck               1/1     Running     2 (23m ago)   89d
cert-manager         cert-manager-webhook-545bd5d7d8-5fxtr                 1/1     Running     2 (23m ago)   89d
contour-external     contour-685f87dc74-ccdcn                              1/1     Running     2 (23m ago)   89d
contour-external     contour-685f87dc74-q24fx                              1/1     Running     2 (23m ago)   89d
contour-external     contour-certgen-v1.22.0-42c87                         0/1     Completed   0             89d
contour-external     envoy-jqzft                                           2/2     Running     4 (23m ago)   89d
contour-internal     contour-c4478d89b-cvdk5                               1/1     Running     2 (23m ago)   89d
contour-internal     contour-c4478d89b-ws4tw                               1/1     Running     2 (23m ago)   89d
contour-internal     contour-certgen-v1.22.0-nn2r2                         0/1     Completed   0             89d
contour-internal     envoy-pkb7d                                           2/2     Running     4 (23m ago)   89d
knative-eventing     eventing-controller-fdc4dd6bb-6zttl                   1/1     Running     2 (23m ago)   89d
knative-eventing     eventing-webhook-676dfb6c4f-hmnwz                     1/1     Running     2 (23m ago)   89d
knative-eventing     rabbitmq-broker-controller-54c85d4f98-b5vrv           1/1     Running     2 (23m ago)   89d
knative-eventing     rabbitmq-broker-webhook-877b8d7df-bqd4t               1/1     Running     2 (23m ago)   89d
knative-serving      activator-7cbbfbc85-4zg95                             1/1     Running     2 (23m ago)   89d
knative-serving      autoscaler-8f986cff-jgngg                             1/1     Running     2 (23m ago)   89d
knative-serving      controller-58dfb45d74-bz88p                           1/1     Running     2 (23m ago)   89d
knative-serving      domain-mapping-5d8db49bf6-8z96x                       1/1     Running     2 (23m ago)   89d
knative-serving      domainmapping-webhook-584476fd67-cdm5s                1/1     Running     2 (23m ago)   89d
knative-serving      net-contour-controller-6768758c67-wb5jn               1/1     Running     2 (23m ago)   89d
knative-serving      webhook-6d5c55fd8c-zl5zg                              1/1     Running     2 (23m ago)   89d
kube-system          antrea-agent-9xthc                                    2/2     Running     7 (23m ago)   89d
kube-system          antrea-controller-6db8bb65cf-k8ltr                    1/1     Running     3 (22m ago)   89d
kube-system          coredns-565d847f94-dvrdf                              1/1     Running     2 (23m ago)   89d
kube-system          coredns-565d847f94-zmt46                              1/1     Running     2 (23m ago)   89d
kube-system          etcd-veba.primp-industries.local                      1/1     Running     2 (23m ago)   89d
kube-system          kube-apiserver-veba.primp-industries.local            1/1     Running     2 (23m ago)   89d
kube-system          kube-controller-manager-veba.primp-industries.local   1/1     Running     2 (23m ago)   89d
kube-system          kube-proxy-n22lb                                      1/1     Running     2 (23m ago)   89d
kube-system          kube-scheduler-veba.primp-industries.local            1/1     Running     2 (23m ago)   89d
local-path-storage   local-path-provisioner-5646477f4b-pql69               1/1     Running     2 (23m ago)   89d
rabbitmq-system      messaging-topology-operator-74c896bb55-5h66m          1/1     Running     2 (23m ago)   89d
rabbitmq-system      rabbitmq-cluster-operator-586b7547f8-7vr9x            1/1     Running     2 (23m ago)   89d
vmware-functions     default-broker-ingress-bbf774754-xjzpz                1/1     Running     2 (23m ago)   89d
vmware-functions     sockeye-79b7fc7c55-tmjqk                              1/1     Running     2 (23m ago)   89d
vmware-functions     sockeye-trigger-dispatcher-5b96bdd8d-mjth6            1/1     Running     2 (23m ago)   89d
vmware-functions     vcsa-source-adapter-7d474dbbfb-xd7ms                  1/1     Running     0             7m2s
vmware-sources       horizon-source-controller-754d465b5b-ptbmw            1/1     Running     0             17m
vmware-sources       horizon-source-webhook-6d7cbd8dd-mtz2t                1/1     Running     0             17m
vmware-sources       vsphere-source-webhook-55c56cf48-x7fpw                1/1     Running     0             17m
vmware-system        cadvisor-jt8rk                                        1/1     Running     2 (23m ago)   89d
vmware-system        tinywww-5b795ddd75-pjds5                              1/1     Running     2 (23m ago)   89d
vmware-system        veba-rabbit-server-0                                  1/1     Running     4 (21m ago)   89d
vmware-system        veba-ui-846bb59f69-rk994                              1/1     Running     2 (23m ago)   89d

Specifically, we want to make sure the vSphere Sources adapter is running, as shown in the example below:

kubectl -n vmware-functions get deploy/vcsa-source-adapter
NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
vcsa-source-adapter   1/1     1            1           11m

We can check the vSphere Sources logs and make sure that the last line or so states it was able to log in by retrieving the VC time:

kubectl -n vmware-functions logs deploy/vcsa-source-adapter
{"level":"warn","ts":"2023-06-18T14:09:14.176Z","logger":"vsphere-source-adapter","caller":"v2/config.go:197","msg":"Tracing configuration is invalid, using the no-op default{error 26 0  empty json tracing config}","commit":"8fda92a-dirty"}
{"level":"warn","ts":"2023-06-18T14:09:14.177Z","logger":"vsphere-source-adapter","caller":"v2/config.go:190","msg":"Sink timeout configuration is invalid, default to -1 (no timeout)","commit":"8fda92a-dirty"}
{"level":"info","ts":"2023-06-18T14:09:14.261Z","logger":"vsphere-source-adapter","caller":"kvstore/kvstore_cm.go:54","msg":"Initializing configMapKVStore...","commit":"8fda92a-dirty"}
{"level":"info","ts":"2023-06-18T14:09:14.271Z","logger":"vsphere-source-adapter","caller":"vsphere/adapter.go:92","msg":"configuring checkpointing","commit":"8fda92a-dirty","ReplayWindow":"5m0s","Period":"10s"}
{"level":"warn","ts":"2023-06-18T14:09:14.271Z","logger":"vsphere-source-adapter","caller":"vsphere/adapter.go:131","msg":"could not retrieve checkpoint configuration","commit":"8fda92a-dirty","error":"key checkpoint does not exist"}
{"level":"info","ts":"2023-06-18T14:09:14.274Z","logger":"vsphere-source-adapter","caller":"vsphere/adapter.go:311","msg":"no valid checkpoint found","commit":"8fda92a-dirty"}
{"level":"info","ts":"2023-06-18T14:09:14.274Z","logger":"vsphere-source-adapter","caller":"vsphere/adapter.go:312","msg":"setting begin of event stream","commit":"8fda92a-dirty","beginTimestamp":"2023-06-18 14:09:14.260972 +0000 UTC"}
{"level":"info","ts":"2023-06-18T14:14:14.261Z","logger":"vsphere-source-adapter","caller":"vsphere/client.go:115","msg":"Executing SOAP keep-alive handler","commit":"8fda92a-dirty","rpc":"keepalive"}
{"level":"info","ts":"2023-06-18T14:14:14.265Z","logger":"vsphere-source-adapter","caller":"vsphere/client.go:121","msg":"vCenter current time: 2023-06-18 14:14:14.31874 +0000 UTC","commit":"8fda92a-dirty","rpc":"keepalive"}
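That check can be scripted by searching the adapter log for the keep-alive message. The following is a sketch: `vc_keepalive_seen` is a hypothetical helper, and it assumes the message text stays "vCenter current time" as in the output above.

```shell
# Hypothetical helper: succeed if the adapter log on stdin contains the
# keep-alive message that reports the vCenter time.
vc_keepalive_seen() {
  grep -q 'vCenter current time'
}

# Self-contained example using a shortened version of the log line above.
echo '{"level":"info","msg":"vCenter current time: 2023-06-18 14:14:14.31874 +0000 UTC","rpc":"keepalive"}' |
  vc_keepalive_seen && echo "login OK"

# On the appliance (not executed here):
#   kubectl -n vmware-functions logs deploy/vcsa-source-adapter | vc_keepalive_seen && echo "login OK"
```

Note that in the sample output above the first keep-alive entry appears five minutes after the adapter started, so the check may need to wait that long.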

At this point, you should also see events flowing into Sockeye when you open a browser to https://VEBA_FQDN/events

Lastly, to deploy or re-deploy your functions, you need to edit the function.yaml manifest and update the type field to something like: com.vmware.vsphere.<EVENTID>.v0 and remove the subject field.

....snip ....
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: veba-ps-slack-trigger
  labels:
    app: veba-ui
spec:
  broker: default
  filter:
    attributes:
      type: com.vmware.vsphere.DrsVmPoweredOnEvent.v0
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: kn-ps-slack
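As a quick illustration of the naming pattern, the type value can be derived from the vSphere event ID like this. `sources_event_type` is a hypothetical helper that only formats the string following the com.vmware.vsphere.<EVENTID>.v0 pattern described above; it does not edit any manifest.

```shell
# Hypothetical helper: build the Tanzu Sources event type for a vSphere event ID.
sources_event_type() {
  printf 'com.vmware.vsphere.%s.v0\n' "$1"
}

sources_event_type DrsVmPoweredOnEvent      # the event used in the trigger above
sources_event_type UserLogoutSessionEvent   # an event seen earlier in this thread
```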

Let me know if you have any questions and hopefully this yields better results ...

JFYI - Tanzu Sources does not log all events at its default INFO log level; you'll have to enable DEBUG if you wish to see vSphere events in the Tanzu Sources logs. For now, though, let's see if this setup resolves the issue.

lamw avatar Jun 18 '23 14:06 lamw

Hey,

In my scenario VEBA is running as a Helm chart deployment on Red Hat OpenShift. I have multiple vCenters connected to the same broker within the same namespace.

The installation comes directly from your github repo: https://github.com/vmware-samples/vcenter-event-broker-appliance.git

Should I uninstall the helm chart in step 1?

Is the release.yaml mentioned in step 2 specific to Tanzu, or does it work on any Kubernetes? Currently my Helm deployment and all triggers + functions are installed in the same namespace. If I use it, is it sufficient to delete the namespace creation part and replace all namespaces with the one I want it installed in?

laugrean avatar Jun 19 '23 06:06 laugrean

Should I uninstall the helm chart in step 1?

Without touching your prod environment, you could also deploy a separate VEBA instance/Kubernetes environment and just deploy the sources to a broker without additional triggers to see if the sources continue to run when you see the router stopping in the other appliance.

But you can also run sources in parallel to your existing setup by installing as described above w/out having to uninstall your router. Depending on your configured triggers, this can lead to duplicate events though. So just be careful.

embano1 avatar Jun 19 '23 06:06 embano1

Did it on my test environment. First problem: we always need Openshift Cluster Admin permissions. But here we go:

{"level":"warn","ts":"2023-06-19T07:31:42.597Z","logger":"vsphere-source-adapter","caller":"v2/config.go:197","msg":"Tracing configuration is invalid, using the no-op default{error 26 0 empty json tracing config}","commit":"8fda92a-dirty"}
{"level":"warn","ts":"2023-06-19T07:31:42.597Z","logger":"vsphere-source-adapter","caller":"v2/config.go:190","msg":"Sink timeout configuration is invalid, default to -1 (no timeout)","commit":"8fda92a-dirty"}
{"level":"info","ts":"2023-06-19T07:31:42.701Z","logger":"vsphere-source-adapter","caller":"kvstore/kvstore_cm.go:54","msg":"Initializing configMapKVStore...","commit":"8fda92a-dirty"}
{"level":"info","ts":"2023-06-19T07:31:42.715Z","logger":"vsphere-source-adapter","caller":"vsphere/adapter.go:92","msg":"configuring checkpointing","commit":"8fda92a-dirty","ReplayWindow":"5m0s","Period":"10s"}
{"level":"warn","ts":"2023-06-19T07:31:42.715Z","logger":"vsphere-source-adapter","caller":"vsphere/adapter.go:131","msg":"could not retrieve checkpoint configuration","commit":"8fda92a-dirty","error":"key checkpoint does not exist"}
{"level":"info","ts":"2023-06-19T07:31:42.725Z","logger":"vsphere-source-adapter","caller":"vsphere/adapter.go:311","msg":"no valid checkpoint found","commit":"8fda92a-dirty"}
{"level":"info","ts":"2023-06-19T07:31:42.725Z","logger":"vsphere-source-adapter","caller":"vsphere/adapter.go:312","msg":"setting begin of event stream","commit":"8fda92a-dirty","beginTimestamp":"2023-06-19 07:31:42.72101 +0000 UTC"}

It's connected, but I cannot see any events within sockeye.

laugrean avatar Jun 19 '23 07:06 laugrean

To keep things simple for now, try enabling DEBUG log level to see if it's receiving/doing something: https://github.com/vmware-tanzu/sources-for-knative#source-adapter-log-level

embano1 avatar Jun 19 '23 09:06 embano1