pprofile
New method to get profiler data as python object
Hey!
I needed a way to access the profiler data programmatically; to do this I've implemented the `get_stats` method, and the other changes are there to reduce code duplication.
The next steps I'd like to work on would be:
- Rework `annotate` to use `get_stats` (a rough sketch follows this list)
- Add any missing `callgrind` data to the object returned by `get_stats`, and rework the `callgrind` method to also use `get_stats`
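As a rough illustration of that first item: assuming `get_stats` returns per-file line records shaped like the JSON shown later in this demo (with `line_no`, `line`, `duration` and `hits` keys), `annotate` could be rebuilt on top of it along these lines. The names and record shape here are illustrative, not the final API:

```python
# Illustrative sketch only: assumes get_stats() returns a mapping of
# file name -> list of line records carrying 'line_no', 'line',
# 'duration' and 'hits' keys, similar to the JSON shown below.
def annotate(profiler, out):
    stats = profiler.get_stats()
    for file_name, records in stats.items():
        out.write('%s\n' % file_name)
        for record in records:
            out.write('%6.1f/%-6d | %04d | %s\n' % (
                record['duration'],
                record['hits'],
                record['line_no'],
                record['line'],
            ))
```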
To see how this new method is used, please check out the `pydata` branch. In IPython, you can basically `ls`, `cat`, and `grep` the files that the profiler picked up. There's a lot of room for awesome features, but I've only been working on what I need most. Here's a quick demo using a sample file called `sample.py`:
```python
import time
import os
import pdb

def _sleep(t):
    time.sleep(t / 1000000.0)
    return t

def aaa(a):
    return a

def bbb(b):
    return _sleep(b)

def ccc(c):
    if not os.path.exists('/proc/%d' % c): c ^= c
    return c

print('test')

def foo(a, b, c):
    r = 0

    for i in range(a):
        for j in range(b):
            for k in range(c):
                r += sum([
                    aaa(a), bbb(b),
                    ccc(c)
                ])

    return r

def bar(i=3):
    return foo(i, 20, 100)
```
And now, the profiler:
```
In [31]: import sample
In [33]: fn = (sample.bar, 8)
In [34]: s = pprofiler.Profiler.profile('test@alpha', *fn)  # profile sample.bar(8)
In [35]: p = pprofiler.Profiler(s)

# Utility function for color-printing Python objects as JSON
In [39]: from pprofile.pprofiler import jprint
In [40]: jprint(p.overhead())
{
    "actual": 18.111714124679565,
    "overhead": 0.0001728534698486328,
    "total": 18.111886978149414
}
```
```
# List files, ordered by cumulative time; optionally takes a number of
# seconds that files must match or exceed (reduces output to the
# worst-case files)
In [41]: p.ls()
0 | 17.3 | /tmp/sample.py
1 |  0.8 | /opt/lib/python2.7/genericpath.py
2 |  0.0 | /tmp/pprofile/pprofiler.py

# Catenate a file (by its path, or by index according to p.ls), showing
# time/hit scores and line numbers
In [42]: p.cat(0)
0.0/0      | 0001 | import time
0.0/0      | 0002 | import os
0.0/0      | 0003 | import pdb
0.0/0      | 0004 |
0.1/16000  | 0005 | def _sleep(t):
14.2/16000 | 0006 |     time.sleep(t / 1000000.0)
0.2/16000  | 0007 |     return t
0.0/0      | 0008 |
0.1/16000  | 0009 | def aaa(a):
0.1/16000  | 0010 |     return a
0.0/0      | 0011 |
0.1/16000  | 0012 | def bbb(b):
0.4/16000  | 0013 |     return _sleep(b)
0.0/0      | 0014 |
0.1/16000  | 0015 | def ccc(c):
0.5/16000  | 0016 |     if not os.path.exists('/proc/%d' % c): c ^= c
0.2/16000  | 0017 |     return c
0.0/0      | 0018 |
0.0/0      | 0019 | print('test')
0.0/0      | 0020 |
0.0/1      | 0021 | def foo(a, b, c):
0.0/1      | 0022 |     r = 0
0.0/0      | 0023 |
0.0/9      | 0024 |     for i in range(a):
0.0/168    | 0025 |         for j in range(b):
0.2/16160  | 0026 |             for k in range(c):
0.1/16000  | 0027 |                 r += sum([
0.6/16000  | 0028 |                     aaa(a), bbb(b),
0.4/16000  | 0029 |                     ccc(c)
0.0/0      | 0030 |                 ])
0.0/0      | 0031 |
0.0/1      | 0032 |     return r
0.0/0      | 0033 |
0.0/1      | 0034 | def bar(i=3):
0.0/1      | 0035 |     return foo(i, 20, 100)

# Grep for a string in all files known to the profiler
In [43]: p.grep('def ccc')
# file[0]: /tmp/sample.py
0.1/16000  | 0015 | def ccc(c):
```
```
# Get a list of the worst lines of code (exceeding 0.5s here); optionally
# limit the search by providing a filter expression (not demonstrated here)
In [54]: jprint(p.get_filtered_profile(0.5))
[
    {
        "address": "/tmp/sample.py:6",
        "datum": {
            "duration": 17.271899461746216,
            "file_name": "/tmp/sample.py",
            "line": {
                "block_id": [
                    "/tmp/sample.py",
                    "_sleep",
                    5
                ],
                "duration": 14.248581409454346,
                "hits": 16000,
                "line": "    time.sleep(t / 1000000.0)",
                "line_no": 6,
                "line_profile": []
            }
        }
    },
    {
        "address": "/tmp/sample.py:28",
        "datum": {
            "duration": 17.271899461746216,
            "file_name": "/tmp/sample.py",
            "line": {
                "block_id": [
                    "/tmp/sample.py",
                    "foo",
                    21
                ],
                "duration": 0.6394095420837402,
                "hits": 16000,
                "line": "                    aaa(a), bbb(b),",
                "line_no": 28,
                "line_profile": [
                    {
                        "callee_file": "/tmp/sample.py",
                        "callee_line": 12,
                        "callee_name": "bbb",
                        "duration": 15.001068115234375,
                        "hits": 16000
                    },
                    {
                        "callee_file": "/tmp/sample.py",
                        "callee_line": 9,
                        "callee_name": "aaa",
                        "duration": 0.20499634742736816,
                        "hits": 16000
                    }
                ]
            }
        }
    }
]
```
I commented on minor, main-feature-independent issues in the code.
On the main feature, I am a bit worried about the impact on forward compatibility: I wonder how dicts of lists of dicts (etc.) work as an API.
For example, `line_profile['block_id']` has at least 3 formats:
- module, None, None
- module, None, ''
- module, func, lineno

The 3rd item can take 3 types depending on some internal details. Caller code will look like a 3-branch "if", and if it does not, it is likely to fail when profiling other source code. Is the caller supposed to treat this as a black box? How do we document this? How do we prevent naïve introspection of this value?
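To make that concrete, a naïve caller that wants a printable label would end up with something like this (hypothetical code, one branch per format):

```python
def block_label(block_id):
    # Hypothetical caller code: one branch per block_id format.
    module, func, third = block_id
    if func is None and third is None:
        return module                             # format 1: (module, None, None)
    if func is None:
        return '%s:%s' % (module, third)          # format 2: third item is a string
    return '%s:%s:%d' % (module, func, third)     # format 3: third item is a line number
```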
From a quick look at the `pprofiler.py` file in your branch, maybe this is what should get exposed instead: it would subclass the profiling classes and access the needed internal details (`_FileTiming` instances) directly. Maybe it could be named `ipprofile.py` to express the interactive-usage intention (which softens the constraints on forward compatibility).
On `block_id`: this wasn't a successful venture and I'm going to remove it. I was trying to provide the user with an abstract view of the code, so that they can run queries like "what are the slowest blocks in the code?" rather than simply "what are the slowest lines?". I will revert this change and push, but I would be really interested to hear your thoughts: can we do this in a deterministic way? Rather than treating the source code as a pile of lines, can we dually profile it as a collection of objects, methods, and functions, and construct a tree? That would be super-cool!
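For what it's worth, one deterministic way to get that block view might be static analysis: parse each profiled file with the standard `ast` module, map every line to its innermost enclosing function or class, then aggregate the per-line timings per block. A minimal sketch of the mapping part (not part of the branch):

```python
import ast

def build_block_index(source, file_name='<sample>'):
    """Map each line number to its innermost enclosing def/class path."""
    tree = ast.parse(source, file_name)
    index = {}

    def visit(node, path):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            path = path + [node.name]
        for child in ast.iter_child_nodes(node):
            visit(child, path)
        if hasattr(node, 'lineno'):
            index.setdefault(node.lineno, '.'.join(path) or '<module>')

    visit(tree, [])
    return index

# Example: with sample.py from above,
#   build_block_index(open('/tmp/sample.py').read())[6] == '_sleep'
```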
Other than the `block_id` thing, however, I did try to make the return object consistent; it should be consistent enough that we could convert the output to JSON and police it against a JSON schema. In fact, that would be a great unit test to add. I can write a JSON schema (http://json-schema.org/documentation.html) for it if you agree?
So my goal here was to make `get_stats()` the source of the profile data, then rewrite functions such as `annotate()` and `callgrind()` to call this function for their input data.
Does that sound like a bad idea?
It would be nice for the user not to have to subclass this, but simply be able to call `get_stats()` and do what they want with that data, the same way as `annotate()` and `callgrind()` would. In fact, those could conceptually be split out the same way a user would be expected to use `get_stats()`.
I'm open to changing the actual format of the object returned by `get_stats()`, moving towards a standard structure per entity (either a dict/list format, or an actual object, though that may slow things down). By entity I mean a line, module, function, etc.
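For instance, one possible standard shape, where every entity carries the same keys regardless of kind (purely illustrative, values taken from the demo above):

```python
# Purely illustrative: every entity (line, function, module, ...) would
# carry the same keys, with nesting expressing containment or calls.
entity = {
    'kind': 'line',                  # 'line' | 'function' | 'module' | ...
    'name': '/tmp/sample.py:6',
    'duration': 14.248581409454346,
    'hits': 16000,
    'children': [],                  # e.g. callee entities
}
```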
What are your thoughts?
> I was trying to provide the user with an abstract view of the code

I would tend to do this using extra run-time introspection, like what I did in `zpprofile.py` when encountering specific functions (object database and relational database queries), just generalised to all functions: from the current callable's details, tell whether it is bound to an instance, maybe of what class, etc. Then it opens the possibility to not only track hot classes but also hot instances.
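A minimal example of that kind of run-time introspection with the standard `inspect` module (covering just the bound-method case; the `describe_callable` name is hypothetical):

```python
import inspect

def describe_callable(func):
    """Tell whether func is bound to an instance, and to which class."""
    if inspect.ismethod(func):
        instance = func.__self__
        return {
            'bound': True,
            'class': type(instance).__name__,
            'instance': repr(instance),  # enables tracking hot instances
        }
    return {'bound': False, 'class': None, 'instance': None}
```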
> So my goal here was to make `get_stats()` the source of the profile data, then rewrite functions such as `annotate()` and `callgrind()` to call this function for their input data.
On the avoiding-code-rot level, it is a good idea.
But as I said, I think it would be better to expose the interactive-usage-friendly layer directly rather than an extra level of internal state. It is easy to defend breaking the former API through future evolutions (and I prefer not to presume what evolutions will do to the internal data structures), because the caller is a human on a command line: they can quickly see something changed and adapt. But it is hard to defend breaking the API in the latter case, because Python structures are just too tempting to use in another program (integrating pprofile), and once the caller is a program, nothing auto-corrects anymore.
So I would rather:
- not expose data in a new Python structure (if it should be done at all, and that's still a big "if" to me, maybe `_FileTiming` would be a better format to expose, as it allows more room for backward-compatibility code than having to later fake being a dict/list by overloading `__getitem__`); as a consequence, keep the callgrind and annotate output generators as-is
- add classes in a new module similar to `zpprofile.py`, which would contain interactive-friendly commands and are free to access `_FileTiming` instances and all pprofile internals themselves: if the internal data structure changes, that code can follow easily, as it is in the same repository (easy to find) and released at the same time; a rough sketch of such a module follows