
Code objects in Python

Open junnplus opened this issue 7 years ago • 3 comments

print('Hello, World')

def hello():
    print('world')
>>> co = compile(open('hello.py').read(), 'hello.py', 'exec')
>>> co
<code object <module> at 0x10b4d4270, file "hello.py", line 2>
>>> type(co)
<class 'code'>

Inside the Python interpreter, Python code exists in the form of code objects.

<class 'code'> is the PyCodeObject type, defined in Include/code.h

/* Bytecode object */
typedef struct {
    PyObject_HEAD
    int co_argcount;		/* #arguments, except *args */
    int co_kwonlyargcount;	/* #keyword only arguments */
    int co_nlocals;		/* #local variables */
    int co_stacksize;		/* #entries needed for evaluation stack */
    int co_flags;		/* CO_..., see below */
    int co_firstlineno;   /* first source line number */
    PyObject *co_code;		/* instruction opcodes */
    PyObject *co_consts;	/* list (constants used) */
    PyObject *co_names;		/* list of strings (names used) */
    PyObject *co_varnames;	/* tuple of strings (local variable names) */
    PyObject *co_freevars;	/* tuple of strings (free variable names) */
    PyObject *co_cellvars;      /* tuple of strings (cell variable names) */
    /* The rest aren't used in either hash or comparisons, except for co_name,
       used in both. This is done to preserve the name and line number
       for tracebacks and debuggers; otherwise, constant de-duplication
       would collapse identical functions/lambdas defined on different lines.
    */
    Py_ssize_t *co_cell2arg;    /* Maps cell vars which are arguments. */
    PyObject *co_filename;	/* unicode (where it was loaded from) */
    PyObject *co_name;		/* unicode (name, for reference) */
    PyObject *co_lnotab;	/* string (encoding addr<->lineno mapping) See
				   Objects/lnotab_notes.txt for details. */
    void *co_zombieframe;     /* for optimization only (see frameobject.c) */
    PyObject *co_weakreflist;   /* to support weakrefs to code objects */
    /* Scratch space for extra data relating to the code object.
       Type is a void* to keep the format private in codeobject.c to force
       people to go through the proper APIs. */
    void *co_extra;
} PyCodeObject;

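Explanation of some of the fields of PyCodeObject:

| Field | Content | Type |
|---|---|---|
| co_argcount | Number of arguments of the Code Block (not counting *args) | int |
| co_kwonlyargcount | Number of keyword-only arguments of the Code Block | int |
| co_nlocals | Number of local variables in the Code Block | int |
| co_stacksize | Evaluation stack size the Code Block needs | int |
| co_flags | CO_... flags | int |
| co_firstlineno | First line number of the Code Block in the corresponding .py file | int |
| co_code | Bytecode compiled from the Code Block | PyBytesObject |
| co_consts | Constants used in the Code Block | PyTupleObject |
| co_names | Names (symbols) used in the Code Block | PyTupleObject |
| co_varnames | Local variable names in the Code Block | PyTupleObject |
| co_freevars | Free variable names in the Code Block | PyTupleObject |
| co_cellvars | Names of local variables referenced by nested functions | PyTupleObject |
| co_cell2arg | Maps cell variables that are arguments | Py_ssize_t * |
| co_filename | Name of the .py file the Code Block comes from | PyUnicodeObject |
| co_name | Name of the Code Block, usually a function, class, or module name | PyUnicodeObject |
| co_lnotab | Mapping between bytecode instructions and source line numbers in the .py file | PyBytesObject |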
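Most of these fields can be read directly from Python as attributes of a code object. The sketch below uses a made-up function, greet, which is not part of the original post, just to show how a few of the fields line up with a concrete definition:

```python
# `greet` is a hypothetical function used only for illustration.
def greet(name, *, punctuation="!"):
    message = "Hello, " + name + punctuation
    return message

code = greet.__code__  # the code object behind the function

print(code.co_name)            # greet
print(code.co_argcount)        # 1  (name)
print(code.co_kwonlyargcount)  # 1  (punctuation)
print(code.co_nlocals)         # 3  (name, punctuation, message)
print(code.co_varnames)        # ('name', 'punctuation', 'message')
print(type(code.co_code).__name__, type(code.co_consts).__name__)  # bytes tuple
```

Arguments count as local variables, which is why co_nlocals is 3 here even though only message is assigned in the body.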

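About the term Code Block:

When the Python compiler compiles Python source code, it creates one PyCodeObject for each Code Block in that code. So how is it decided how much code counts as one Code Block? In fact, Python has a simple and clear rule: whenever we enter a new namespace, or in other words a new scope, we have entered a new Code Block. Quoted from 《Python源码剖析》 by 陈儒

So the hello.py at the top of this post corresponds to two PyCodeObject objects: one for the module-level (global) Code Block and one for the Code Block of def hello. The variable co is the module-level PyCodeObject; the other PyCodeObject is stored in its co.co_consts field: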

>>> co.co_consts
('Hello, World', <code object hello at 0x10b4d4030, file "hello.py", line 4>, 'hello', None)
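Since nested code objects are stored as constants of their parent, every Code Block of a module can be found by walking co_consts recursively. A minimal sketch, assuming the same two-statement source as hello.py passed as a string; the helper name iter_code_objects is made up for this example:

```python
import types

def iter_code_objects(code):
    """Yield `code` and every code object nested in its co_consts, depth first."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)

# Same content as hello.py, inlined so the example is self-contained.
source = "print('Hello, World')\n\ndef hello():\n    print('world')\n"
module_code = compile(source, "hello.py", "exec")

names = [c.co_name for c in iter_code_objects(module_code)]
print(names)  # ['<module>', 'hello']
```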

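pyc files

While a program is running, the compilation result lives in memory as a PyCodeObject; for imported modules it is also saved to a pyc file. The next time the same program runs, Python rebuilds the in-memory PyCodeObject directly from the compilation result recorded in the pyc file instead of compiling the source file again (provided the source has not changed).

What sits solemnly inside a pyc file is really a PyCodeObject. To the Python compiler, the PyCodeObject is the true result of compilation, and the pyc file is merely the on-disk representation of that object; the two are simply different forms in which the result of compiling a source file exists. Quoted from 《Python源码剖析》 by 陈儒

Since Python 3.2, pyc files live in the __pycache__ directory; see PEP 3147 -- PYC Repository Directories. Also, pyc files are produced only by the import mechanism (or by compiling explicitly, for example with py_compile or compileall); a script that is run directly does not get one.

You can obtain the PyCodeObject by reading the pyc file: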

>>> import marshal
>>> with open('__pycache__/hello.cpython-36.pyc', 'rb') as f:
...     f.seek(12)  # skip the 12-byte header (magic, mtime, source size) of Python 3.3-3.6; 3.7+ uses 16 bytes
...     co = marshal.load(f)
...     print(co)
...
<code object <module> at 0x101e0e300, file "hello.py", line 1>
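The hard-coded offset of 12 matches the cpython-36 file above, but the header grew to 16 bytes in Python 3.7 (PEP 552). The following is a sketch of a version-aware variant; the helper load_code_from_pyc and the temporary file layout are assumptions made for this example, and the pyc is generated explicitly with py_compile rather than by an import:

```python
import importlib.util
import marshal
import os
import py_compile
import sys
import tempfile
import types

def load_code_from_pyc(pyc_path):
    """Return the code object stored in a pyc written by the running interpreter."""
    # Header is 12 bytes on Python 3.3-3.6 and 16 bytes since 3.7 (PEP 552).
    header_size = 16 if sys.version_info >= (3, 7) else 12
    with open(pyc_path, "rb") as f:
        header = f.read(header_size)
        if header[:4] != importlib.util.MAGIC_NUMBER:
            raise ValueError("pyc was written by a different Python version")
        return marshal.load(f)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hello.py")
    with open(src, "w") as f:
        f.write("print('Hello, World')\n\ndef hello():\n    print('world')\n")
    # py_compile.compile returns the path of the pyc file it wrote.
    pyc_path = py_compile.compile(src, cfile=os.path.join(tmp, "hello.pyc"))
    pyc_code = load_code_from_pyc(pyc_path)

print(pyc_code.co_name)  # <module>
```

Checking the magic number first matters because marshal data is not guaranteed to be readable across Python versions.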

Bytecode

>>> import dis
>>> dis.dis(co)
  2           0 LOAD_NAME                0 (print)
              2 LOAD_CONST               0 ('Hello, World')
              4 CALL_FUNCTION            1
              6 POP_TOP

  4           8 LOAD_CONST               1 (<code object hello at 0x109895030, file "hello.py", line 4>)
             10 LOAD_CONST               2 ('hello')
             12 MAKE_FUNCTION            0
             14 STORE_NAME               1 (hello)
             16 LOAD_CONST               3 (None)
             18 RETURN_VALUE
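The same listing can be walked programmatically with dis.get_instructions, which yields one Instruction per opcode. This is only a sketch: the exact opcode names and offsets depend on the Python version (for example, newer releases no longer emit CALL_FUNCTION), so the output will not match the Python 3.6 listing above line for line.

```python
import dis

# A one-line module, inlined so the example is self-contained.
source = "print('Hello, World')\n"
code = compile(source, "<example>", "exec")

# Each Instruction carries the byte offset, opcode name, and a readable argument.
for instr in dis.get_instructions(code):
    print(instr.offset, instr.opname, instr.argrepr)

opnames = [instr.opname for instr in dis.get_instructions(code)]
```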

Further reading

junnplus, Jan 24 '18 08:01

Caught an expert slacking off at work

yetingsky, Jan 24 '18 09:01

@yetingsky |(- _-)|

junnplus, Jan 24 '18 10:01

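Python's dis module is essentially a disassembler: it turns the bytecode produced by compiling Python code into human-readable opcodes, and the interpreter is a loop that keeps decoding and executing those opcodes.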


In fact, you can also write your own Python interpreter in Python or Golang to execute this bytecode (the opcodes); see the corresponding chapter of 500 Lines or Less.

csrgxtu, Jul 10 '21 06:07