
Doesn't follow forks in BitBake

Open rossburton opened this issue 4 years ago • 5 comments

BitBake's primary process spawns off a number of worker processes which do the actual work, but these don't appear to be tracked by memray at all.

These workers are started with subprocess.Popen():

https://git.openembedded.org/bitbake/tree/lib/bb/runqueue.py#n1253

Is memray unable to follow these forks?

rossburton avatar Apr 21 '22 11:04 rossburton

Can you confirm that you are using the --follow-fork option of memray run? Check out the docs:

https://bloomberg.github.io/memray/run.html#tracking-across-forks

pablogsal avatar Apr 21 '22 11:04 pablogsal

Notice that subprocess.Popen will fork + execv, and the execv resets the entire memory layout of the forked program, so tracking will be disabled. memray can follow across fork calls, but there is nothing we can do across execv calls, as these literally replace the running program with a new one.
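To make the difference concrete, here is a minimal sketch (Unix only, not memray code). The TRACKING_ENABLED flag is a made-up stand-in for any state that lives inside the running interpreter, such as an installed tracker. It survives a plain fork but is gone once a fresh interpreter is started through subprocess.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for in-process profiler state.
TRACKING_ENABLED = True


def state_seen_after_fork() -> bool:
    """Fork and report whether the child still sees the parent's state."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: same memory image as the parent, so the flag is visible.
        os.close(read_fd)
        os.write(write_fd, b"1" if TRACKING_ENABLED else b"0")
        os._exit(0)
    os.close(write_fd)
    data = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data == b"1"


def state_seen_after_exec() -> bool:
    """Start a new interpreter (fork + exec) and check for the flag there."""
    code = "import __main__; print(hasattr(__main__, 'TRACKING_ENABLED'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"
```

The forked child reports the flag, while the exec'd interpreter starts from scratch and knows nothing about it, which is the situation the BitBake workers are in.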

pablogsal avatar Apr 21 '22 11:04 pablogsal

I was using --follow-fork, but yeah, if it can't follow across execv then that's a problem.

rossburton avatar Apr 21 '22 12:04 rossburton

Yeah, at a very fundamental level, once an execv happens the new process image can be anything; it's not even necessarily a Python interpreter. We're not trying to be a general purpose memory profiler, but a Python specific one, and a lot of our setup is geared around the knowledge that what is running is a Python interpreter.

It does look as though bitbake-worker is a Python script, though, and it does seem to be run using the same interpreter as bitbake itself, meaning that Memray is installed and could be used on it, in theory, if you could just convince it to run.

I don't see a very good way to accomplish that, though. The easiest way might be to edit the bitbake-worker script itself to use the Tracking API (once I sit down and write some documentation for the API - sorry!), or to write a wrapper script around bitbake-worker that does memray run /path/to/the/real/bitbake-worker - that's ugly and I'd certainly prefer not to need hacks like that, but it should work if you do need transparency into what bitbake-worker is doing.
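A rough sketch of what such a wrapper could look like, assuming the original script has been moved aside and this file installed under its old name. The REAL_WORKER path, the capture file name, and the output_dir parameter are placeholders made up for illustration, and nothing here has been tested against BitBake itself.

```python
import os
import sys

# Placeholder: wherever the original worker script was moved to.
REAL_WORKER = "/path/to/the/real/bitbake-worker"


def build_command(real_worker, worker_args, output_dir="."):
    """Build the argv that runs the real worker under `memray run`."""
    # execv keeps the PID, so this name matches the worker that gets traced
    # and each worker writes its own capture file.
    output = os.path.join(output_dir, f"memray-worker.{os.getpid()}.bin")
    return [
        sys.executable, "-m", "memray", "run",
        "-o", output,
        real_worker, *worker_args,
    ]


def main():
    cmd = build_command(REAL_WORKER, sys.argv[1:])
    # Replace this wrapper process with the memray-wrapped worker.
    os.execv(cmd[0], cmd)
```

Calling main() at the bottom of the wrapper file makes it a drop-in replacement: the arguments BitBake passes are forwarded unchanged to the real worker.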

godlygeek avatar Apr 21 '22 20:04 godlygeek

FWIW, you can see the API usage in https://github.com/bloomberg/memray/blob/main/tests/integration/test_api.py#L12 and do things using that, if you want to get to that before we get the docs written down and published! :)

pradyunsg avatar Apr 21 '22 20:04 pradyunsg

Revisiting this a few months on: the Tracker API that @pradyunsg references is now properly documented and supported. There's no sane way for us to keep tracking across exec calls, and no one else has requested that as a feature since this request came in.

I'd suggest that the easiest way to get visibility into what the workers are doing is to modify the bitbake-worker script to install a memray tracker with

import os

import memray

with memray.Tracker(f"output.bin.{os.getpid()}"):
    ...

That's a bit of manual work, but it should get you the information you need, without any invasive changes to the way that we do tracking.
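If the body of the worker is already a single function, the same idea can be packaged as a small helper. This is only a sketch: capture_path, run_tracked, and worker_main are hypothetical names, and the only memray call used is the memray.Tracker context manager shown above.

```python
import os


def capture_path(directory=".", prefix="output.bin"):
    """Return a per-process capture file name, e.g. ./output.bin.12345."""
    return os.path.join(directory, f"{prefix}.{os.getpid()}")


def run_tracked(worker_main, directory="."):
    """Run worker_main() with a memray tracker active for its whole duration."""
    # Imported lazily so that memray is only required for tracked runs.
    import memray

    with memray.Tracker(capture_path(directory)):
        return worker_main()
```

Each worker then leaves behind its own capture file, which can be inspected afterwards with the usual reporters, for example memray flamegraph output.bin.<pid>.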

godlygeek avatar Aug 16 '22 18:08 godlygeek