John Todd

Results: 83 comments by John Todd

Yes, the metric is "vector_component_errors_total". Our expire_metrics_secs is set to 22265 (yes, very long, because we have metric aggregations that run for >6 hours in some cases).

Still seeing this problem on dnstap sockets, but we have not yet added the pooling on our production platform. We have vector instances that are ingesting dnstap (2 sockets on...

Still not using multi-socket pooling here, but I did find some additional information on the problem. When vector had gone into a "locked" state and was no longer able to...

More updates now that I re-read some of my older notes: 1) "systemctl restart vector" does in fact cure the problem. 2) This is pretty clearly load-related, but I can't...

Looking more closely at behavioral data today during a lock-up interval, I find this error within 5 seconds of the dnstap sockets going into "lock-up" (vector stopped processing our "high-volume"...

Update: I think I may be describing a different problem in the last few days than the one that I originally opened here. The quantity of errors is not high...

> There might be a couple of different issues here, I will try to go through them one by one. The easiest ones to spot are the error messages coming...

We have been running the patch above for 24 hours on large locations, and have seen significant (perfect) improvement versus prior versions which were locking up on the socket after...

I am 100% sure that the queries I am generating/receiving are coming back with "; EDE: 22 (No Reachable Authority): (delegation publicbt.com)" as the result (see "dig" results.) Those exact...

(chiming in here since this is for some of the work that we need to have done, and esensar and I have talked about this concept offline) Cache deletions: would...