cloud-profiler-nodejs icon indicating copy to clipboard operation
cloud-profiler-nodejs copied to clipboard

Program with profiler enabled may not exit for up to a hour unless CTRL-C'ed

Open nolanmar511 opened this issue 8 years ago • 8 comments

The request in createProfile() can hang for up to an hour. So, when a user's code finishes running, it could be up to an hour before profiler exits (and node.js can stop running).

nolanmar511 avatar Oct 26 '17 22:10 nolanmar511

Copied from aalexand's comment on PR #16:

There was a question on how CreateProfile call (which is hanging) can be cancelled, as I think unless it is cancelled the Node.js program being profiled won't exit? Looking at the code of google-cloud-node/common, some thoughts:

  • makeAuthenticatedRequestFactory in util.js does seem to support cancellation via abort() on the active request - see here.
  • Unfortunately in the service.js file the return value seems to be dropped as there isn't a return statement. I don't know why.
  • It is also dropped down the chain here and here.
  • Also, all methods on a service object are wrapped with util.promisifyAll, here. The wrapper only returns a promise if no callbacks are passed though - if there are callbacks passed, it returns the original value.

So, I would naively expect that supporting cancellation of CreateProfile call is as easy as:

  • Change the code of service.js and service-object.js to return the object with the abort() call for non-streaming requests.
  • Change the profiler agent call to pass callbacks for the CreateProfile's request() call to promisify the call manually to gain the control over abort() call and call that on program exit.
  • Not sure how to intercept into the program exit. There is process.on('exit', ...) but not sure if there is a chicken-and-egg issue with it.

nolanmar511 avatar Nov 06 '17 19:11 nolanmar511

Hey folks, long time no see 👋 I'm now on the other side of the screen and my new company is trying to use Cloud Profiler to better understand our memory consumption.

Unfortunately now we have a P99 HTTP-client latency of 1H in all our tracing, which throws off our app monitoring. Which makes this a pretty serious show-stopper.

Is it possible to address at all? It's a shame it's been open for 4+ years at P2 :(

Temikus avatar Aug 03 '21 01:08 Temikus

/CC @JustinBeckwith - hey Justin, sorry to bother you but is it possible to prioritise?

Temikus avatar Aug 03 '21 01:08 Temikus

Alright, another try. @fhinkel ?

Temikus avatar Aug 04 '21 23:08 Temikus

Heh, hey @Temikus :) This repository is entirely owned by @nolanmar511 and the profiler team. Will ping internally.

JustinBeckwith avatar Aug 04 '21 23:08 JustinBeckwith

Legend! Thanks ❤️

Temikus avatar Aug 04 '21 23:08 Temikus

Sorry folks, but we’ll be switching away from GCP profiler due to our metrics being skewed by this bug and Datadog offering a much better profiler experience.

You can deprioritise this back if you prefer.

Temikus avatar Aug 20 '21 03:08 Temikus

Coming here to echo what @Temikus has said. Kind of unacceptable this hasn't been solved yet because it messes with all p99 data.

CC @nolanmar511

ckdarby avatar Nov 23 '21 23:11 ckdarby