opencensus-go
opencensus-go copied to clipboard
stats/view: add Flush to immediately report all collected data points regardless of buffering
Problem:
When exiting a program running opencensus monitoring, some of the data are not passed to exporter. This happens because as one can see in following code, reportUsage() is called only by timer, so when program exists, all data passed to opencensus after last timer firing event are simply lost.
func (w *worker) start() {
for {
select {
case cmd := <-w.c:
cmd.handleCommand(w)
case <-w.timer.C:
w.reportUsage(time.Now())
case <-w.quit:
w.timer.Stop()
close(w.c)
w.done <- true
return
}
}
}
Solution proposal:
- Make
func (w *worker) stop()
to be (indirectly) callable by client of package. - Make
func (w *worker) start()
to callw.reportUsage()
before returning when<-w.quit
is triggered.
Thank you for filing this issue @lychung83!
So currently, (*worker).stop()
is NOT called by any function nor is case <-w.quit
triggered by any exit condition. The app just exits if exiting.
I think the idiomatic fix here would to add a global function to view called say End()
which will invoke (*worker)(defaultWorker).stop
and that should involve passing w.reportUsage
as you've suggested. However, that's going to cause a send on closed channel panic with the current stateful design of the worker.
/cc @rakyll @ramonza
I think we need a global Flush function that reports all the collected points.
@rakyll perhaps I am bikeshedding or not, on the name: I thought of Flush
too, but that name implies that continued usage after its invocation is allowed, yet we want a function that is terminally invoked, hence the recommendation of End
:)
Flush function I was suggesting is not stopping the worker, it only flushes the unprocessed data points. Exporting a function that stops the worker might need more thinking because we never planned to give control of the worker to the user entirely.
Also, in practice, should you care about what's collected after you flushed due to the termination signal?
Flush function I was suggesting is not stopping the worker, it only flushes the unprocessed data points.
Roger that, that would work. And in deed, users can invoke defer view.Flush()
before their main exits so it kills 2 birds with 1 stone.
Also, in practice, should you care about what's collected after you flushed due to the termination signal?
@lychung83 is this an actual problem that you've encountered or were you hypothesizing based off reading the code? I too don't think it should matter, and moreoever we haven't even provided a stop method nor does the worker know when we are terminating except by the Go runtime cleaning up.
To me, Flush() (not stopping the worker) seems enough. My use cases are:
- used at the end of the program to report any remaining data points in the worker.
- used for the test program to make sure all data points put by any measure.M() call to the exporter inspected by test w/o calling time.Sleep() for certain amount of time that's enough to fire worker.c
Okay, am retitling this issue then as Flush is request that is different from the original request of when a program is exiting.
This will solve https://github.com/census-ecosystem/opencensus-go-exporter-stackdriver/issues/43.
I don't think we should have a separate function to flush each Exporter and one to flush the internal stats buffer. We should have a single global Flush/Shutdown/Close function that only returns when everything has been flushed. It's also important that things get flushed in the right order: flushing the internal stats before flushing the buffer in the exporter will not work as intended. Users should not have to find this out themselves.
For metrics, concept of Reader was added recently. With reader one can call ReadAndExport to export all data on program termination. It would work same as flush.
Hi @odeke-em, I noticed one of your commits mentioned in this issue addresses this problem directly, so I'm just wondering if you'll eventually open it as a PR or if there's anything blocking it. Thank you! 🎉
Hey Marwan, Thank you for the ping! I did open up a PR but it got blocked because there was no consensus/movement to add it via OpenCensus Specs and I believe these libraries are now on auto pilot mode and no longer adding features so unfortunately I don’t think it’ll be done. My PR was sent for over a year.
On Sat, Nov 2, 2019 at 1:27 PM Marwan Sulaiman [email protected] wrote:
Hi @odeke-em https://github.com/odeke-em, I noticed one of the commits mentioned in this issue addresses this issue directly, just wondering if you'll eventually open it as a PR or if there's anything blocking it. Thank you! 🎉
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/census-instrumentation/opencensus-go/issues/862?email_source=notifications&email_token=ABFL3V72S2566O257GETAJLQRXPEDA5CNFSM4FOTOTXKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEC5EJYI#issuecomment-549078241, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABFL3V47TL2GLTABDLWOVDDQRXPEDANCNFSM4FOTOTXA .
@odeke-em Thank you for the response. This seems to me more of a bug than a feature since OpenCensus swallows metrics when a process exists.
In the same spirit, does that mean all other features and bug requests are frozen?
I recently opened this issue which, if my assumption is correct, would be quite noteworthy to address: https://github.com/census-instrumentation/opencensus-go/issues/1181
Thanks again :)