Code Objects in Python
print('Hello, World')
def hello():
    print('world')
>>> co = compile(open('hello.py').read(), 'hello.py', 'exec')
>>> co
<code object <module> at 0x10b4d4270, file "hello.py", line 2>
>>> type(co)
<class 'code'>
Inside the Python interpreter, Python code exists in the form of code objects.
<class 'code'> is the PyCodeObject struct, defined in Include/code.h:
/* Bytecode object */
typedef struct {
    PyObject_HEAD
    int co_argcount;            /* #arguments, except *args */
    int co_kwonlyargcount;      /* #keyword only arguments */
    int co_nlocals;             /* #local variables */
    int co_stacksize;           /* #entries needed for evaluation stack */
    int co_flags;               /* CO_..., see below */
    int co_firstlineno;         /* first source line number */
    PyObject *co_code;          /* instruction opcodes */
    PyObject *co_consts;        /* list (constants used) */
    PyObject *co_names;         /* list of strings (names used) */
    PyObject *co_varnames;      /* tuple of strings (local variable names) */
    PyObject *co_freevars;      /* tuple of strings (free variable names) */
    PyObject *co_cellvars;      /* tuple of strings (cell variable names) */
    /* The rest aren't used in either hash or comparisons, except for co_name,
       used in both. This is done to preserve the name and line number
       for tracebacks and debuggers; otherwise, constant de-duplication
       would collapse identical functions/lambdas defined on different lines.
    */
    Py_ssize_t *co_cell2arg;    /* Maps cell vars which are arguments. */
    PyObject *co_filename;      /* unicode (where it was loaded from) */
    PyObject *co_name;          /* unicode (name, for reference) */
    PyObject *co_lnotab;        /* string (encoding addr<->lineno mapping) See
                                   Objects/lnotab_notes.txt for details. */
    void *co_zombieframe;       /* for optimization only (see frameobject.c) */
    PyObject *co_weakreflist;   /* to support weakrefs to code objects */
    /* Scratch space for extra data relating to the code object.
       Type is a void* to keep the format private in codeobject.c to force
       people to go through the proper APIs. */
    void *co_extra;
} PyCodeObject;
What some of the PyCodeObject fields mean:
| Field | Content | Type |
|---|---|---|
| co_argcount | Number of positional arguments of the Code Block (excluding *args) | int |
| co_kwonlyargcount | Number of keyword-only arguments of the Code Block | int |
| co_nlocals | Number of local variables in the Code Block | int |
| co_stacksize | Evaluation stack size the Code Block needs | int |
| co_flags | CO_* flag bits | int |
| co_firstlineno | First line number of the Code Block in its .py file | int |
| co_code | Bytecode compiled from the Code Block | PyBytesObject |
| co_consts | Constants used in the Code Block | PyTupleObject |
| co_names | Names (symbols) used in the Code Block | PyTupleObject |
| co_varnames | Local variable names of the Code Block | PyTupleObject |
| co_freevars | Free variable names of the Code Block | PyTupleObject |
| co_cellvars | Names of local variables that nested functions reference | PyTupleObject |
| co_cell2arg | Maps cell variables that are arguments | Py_ssize_t * |
| co_filename | Name of the .py file the Code Block comes from | PyUnicodeObject |
| co_name | Name of the Code Block, usually a function, class, or module name | PyUnicodeObject |
| co_lnotab | Mapping between bytecode offsets and source line numbers in the .py file | PyBytesObject |
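Most of these fields are readable from Python as attributes of a code object. Here is a minimal sketch that prints a few of them; `greet` is a made-up function used only for illustration, and the exact contents of `co_consts` can differ between Python versions.

```python
def greet(name, *, punctuation="!"):
    # One positional argument, one keyword-only argument, one extra local.
    greeting = "Hello, " + name
    return greeting + punctuation


code = greet.__code__  # the code object behind the function

fields = {
    "co_name": code.co_name,
    "co_argcount": code.co_argcount,
    "co_kwonlyargcount": code.co_kwonlyargcount,
    "co_nlocals": code.co_nlocals,
    "co_varnames": code.co_varnames,
    "co_consts": code.co_consts,
    "co_filename": code.co_filename,
    "co_firstlineno": code.co_firstlineno,
}
for field, value in fields.items():
    print(f"{field:20} {value!r}")
```

Note that `co_varnames` lists the arguments first (positional, then keyword-only) and the remaining locals after them.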
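On what counts as a Code Block:
When the Python compiler compiles Python source, it creates one PyCodeObject for each Code Block in the code. So how much code makes up one Code Block? Python has a simple and clear rule: whenever we enter a new namespace, in other words a new scope, we have entered a new Code Block. Source: 《Python源码剖析》 by 陈儒
So the hello.py source at the top of this post corresponds to two PyCodeObject objects: one for the module-level Code Block and one for the Code Block of def hello. The variable co is the module-level PyCodeObject; the other one is stored in its co.co_consts field: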
>>> co.co_consts
('Hello, World', <code object hello at 0x10b4d4030, file "hello.py", line 4>, 'hello', None)
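Since nested code objects live in `co_consts`, every Code Block of a module can be found by walking that tuple recursively. A small sketch of the idea follows; `iter_code_objects` is a helper name made up for this example, and the source is inlined as a string so the snippet does not depend on a hello.py file on disk.

```python
import types

SOURCE = """
print('Hello, World')

def hello():
    print('world')
"""


def iter_code_objects(code):
    """Yield `code` and every code object nested in its co_consts, depth first."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


module_code = compile(SOURCE, "hello.py", "exec")
names = [c.co_name for c in iter_code_objects(module_code)]
print(names)  # ['<module>', 'hello']
```

One scope, one code object: the module and the function each get their own.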
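pyc files
While a program runs, the compilation result lives in memory as PyCodeObject objects; Python also saves it to a pyc file so that it survives after the interpreter exits. The next time the same program runs, Python rebuilds the in-memory PyCodeObject directly from what the pyc file recorded, instead of compiling the source file again.
What actually sits inside a pyc file is a PyCodeObject. To the Python compiler, the PyCodeObject is the real compilation result, and the pyc file is just that object's representation on disk; the two are simply different forms of the same result of compiling the source file. Source: 《Python源码剖析》 by 陈儒
Since Python 3.2, pyc files are stored in the __pycache__ directory; see PEP 3147 -- PYC Repository Directories.
Also, pyc files are only written automatically by the import machinery; a script that is run directly does not get one.
A PyCodeObject can be recovered by reading a pyc file (the 12 bytes skipped below are the pyc header in Python 3.6):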
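To see where the import machinery would place the pyc for a given source file, the standard library exposes `importlib.util.cache_from_source`. A short sketch, assuming a source file named hello.py (the file does not need to exist for the path to be computed):

```python
import importlib.util
import sys

# Path the import system would use for the cached bytecode of hello.py.
pyc_path = importlib.util.cache_from_source("hello.py")
print(pyc_path)

# The interpreter-specific tag embedded in the file name.
print(sys.implementation.cache_tag)
```

The tag in the file name is what lets several Python versions share one __pycache__ directory without overwriting each other's files.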
>>> import marshal
>>> with open('__pycache__/hello.cpython-36.pyc', 'rb') as f:
...     f.seek(12)
...     co = marshal.load(f)
...     print(co)
...
<code object <module> at 0x101e0e300, file "hello.py", line 1>
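The session above depends on an existing pyc and on the 12-byte header of Python 3.6. Here is a self-contained sketch of the same round trip that writes a pyc explicitly with `py_compile` and picks the header size by version: 16 bytes on Python 3.7 and later (PEP 552), 12 bytes on 3.3 to 3.6. `load_pyc` and the temporary file names are made up for this example.

```python
import marshal
import os
import py_compile
import sys
import tempfile
import types


def load_pyc(path):
    """Return the code object stored in a .pyc file."""
    # Assumption: 16-byte header on Python 3.7+ (PEP 552), 12 bytes on 3.3-3.6.
    header_size = 16 if sys.version_info >= (3, 7) else 12
    with open(path, "rb") as f:
        f.seek(header_size)  # skip the header
        return marshal.load(f)  # the rest is a marshalled code object


with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "hello.py")
    with open(source_path, "w") as f:
        f.write("print('Hello, World')\n\ndef hello():\n    print('world')\n")
    # Compile explicitly, without going through import.
    pyc_path = py_compile.compile(source_path, cfile=os.path.join(tmp, "hello.pyc"))
    loaded = load_pyc(pyc_path)

print(type(loaded), loaded.co_name)
```

The marshal format is specific to each Python version, so a pyc should be loaded by the same version that wrote it.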
Bytecode
>>> import dis
>>> dis.dis(co)
  2           0 LOAD_NAME                0 (print)
              2 LOAD_CONST               0 ('Hello, World')
              4 CALL_FUNCTION            1
              6 POP_TOP

  4           8 LOAD_CONST               1 (<code object hello at 0x109895030, file "hello.py", line 4>)
             10 LOAD_CONST               2 ('hello')
             12 MAKE_FUNCTION            0
             14 STORE_NAME               1 (hello)
             16 LOAD_CONST               3 (None)
             18 RETURN_VALUE
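The listing printed by `dis.dis` is also available as structured data through `dis.get_instructions`, which is handy for inspecting bytecode programmatically. A minimal sketch; the exact opcodes depend on the Python version, so the output will not match the 3.6 listing above line for line.

```python
import dis

SOURCE = "print('Hello, World')\n"
code = compile(SOURCE, "hello.py", "exec")

# co_code holds the raw bytecode as a bytes object.
print(code.co_code[:8])

# Each Instruction carries its offset, opcode name, and a readable argument.
opnames = []
for instr in dis.get_instructions(code):
    opnames.append(instr.opname)
    print(instr.offset, instr.opname, instr.argrepr)
```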
Further reading
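Python's dis module is essentially a disassembler: it turns the bytecode produced by compilation into a human-readable listing of opcodes (operation codes). The interpreter is a loop that keeps decoding and executing these opcodes.

In fact, you can write a Python interpreter yourself, in Python or Go, to execute this bytecode; see the chapter "A Python Interpreter Written in Python" in 《500 Lines or Less》.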
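To give a feel for what such an interpreter does, here is a toy sketch of a fetch-decode-execute loop over a value stack. It borrows a few opcode names from the dis output, but the `(opname, arg)` tuple encoding, the `run` function, and the `BINARY_ADD` handling are simplifications made up for this example; it does not read real CPython bytecode.

```python
def run(instructions, names):
    """Execute a list of (opname, arg) pairs on a value stack."""
    stack = []
    pc = 0  # index of the next instruction
    while pc < len(instructions):
        opname, arg = instructions[pc]
        pc += 1
        if opname == "LOAD_CONST":
            stack.append(arg)
        elif opname == "LOAD_NAME":
            stack.append(names[arg])
        elif opname == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "CALL_FUNCTION":
            # arg is the number of positional arguments on the stack.
            args = [stack.pop() for _ in range(arg)][::-1]
            func = stack.pop()
            stack.append(func(*args))
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opname}")
    raise RuntimeError("program ended without RETURN_VALUE")


# Toy program equivalent to: len('Hello, ' + 'World')
program = [
    ("LOAD_NAME", "len"),
    ("LOAD_CONST", "Hello, "),
    ("LOAD_CONST", "World"),
    ("BINARY_ADD", None),
    ("CALL_FUNCTION", 1),
    ("RETURN_VALUE", None),
]
result = run(program, {"len": len})
print(result)  # 12
```

A real interpreter adds frames, jumps, exceptions, and many more opcodes, but the core remains this loop over a stack.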