Complex Assignment Statements in Python
>>> co = compile('a, b, c = 1, 2, 3', '', 'single')
>>> co.co_names
('a', 'b', 'c')
>>> co.co_consts
(1, 2, 3, None, (1, 2, 3))
As co_consts shows, the 1, 2, 3 on the right-hand side is compiled as a single tuple constant.
>>> import dis
>>> dis.dis(co)
1 0 LOAD_CONST 4 ((1, 2, 3))
2 UNPACK_SEQUENCE 3
4 STORE_NAME 0 (a)
6 STORE_NAME 1 (b)
8 STORE_NAME 2 (c)
10 LOAD_CONST 3 (None)
12 RETURN_VALUE
The LOAD_CONST instruction pushes the tuple (1, 2, 3) onto the runtime stack.
The oparg of UNPACK_SEQUENCE is 3, meaning the sequence is unpacked into 3 targets.
TARGET(UNPACK_SEQUENCE) {
PyObject *seq = POP(), *item, **items;
if (PyTuple_CheckExact(seq) &&
PyTuple_GET_SIZE(seq) == oparg) {
items = ((PyTupleObject *)seq)->ob_item;
while (oparg--) {
item = items[oparg];
Py_INCREF(item);
PUSH(item);
}
} else if (PyList_CheckExact(seq) &&
PyList_GET_SIZE(seq) == oparg) {
items = ((PyListObject *)seq)->ob_item;
while (oparg--) {
item = items[oparg];
Py_INCREF(item);
PUSH(item);
}
} else if (unpack_iterable(seq, oparg, -1,
stack_pointer + oparg)) {
STACKADJ(oparg);
} else {
/* unpack_iterable() raised an exception */
Py_DECREF(seq);
goto error;
}
Py_DECREF(seq);
DISPATCH();
}
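UNPACK_SEQUENCE first pops the object at the top of the stack. If that object is exactly a tuple or a list and its length equals oparg (that is, both sides of the assignment have the same number of items), its elements are pushed from right to left, so that 1 ends up on top of the stack.
For any other iterable, or when the two sides differ in length, the unpacking and pushing is delegated to unpack_iterable. Its implementation is covered further down.
The three STORE_NAME instructions that follow each pop one value and bind it to a name, adding the three name-to-value mappings to the local namespace.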
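The push order described above can be modelled in a few lines of Python. This is only a sketch of the fast path: the helper names unpack_sequence and store_name are made up for illustration, a plain list stands in for the runtime stack, and the fallback to unpack_iterable is deliberately left out.

```python
def unpack_sequence(stack, oparg):
    """Model of the UNPACK_SEQUENCE fast path (exact tuple or list only)."""
    seq = stack.pop()
    if type(seq) not in (tuple, list) or len(seq) != oparg:
        # Simplification: the real opcode calls unpack_iterable() here.
        raise NotImplementedError("fallback path is not modelled")
    i = oparg
    while i:
        i -= 1
        stack.append(seq[i])  # mirrors: item = items[oparg]; PUSH(item)


def store_name(stack, namespace, name):
    """Model of STORE_NAME: pop the top of the stack and bind it to a name."""
    namespace[name] = stack.pop()


stack = [(1, 2, 3)]          # state after LOAD_CONST
names = {}

unpack_sequence(stack, 3)
stack_after_unpack = list(stack)   # bottom ... top
top_after_unpack = stack[-1]

for target in ("a", "b", "c"):
    store_name(stack, names, target)

print(stack_after_unpack, top_after_unpack, names)
```

Because the elements are pushed right to left, the first STORE_NAME pops the leftmost value, which is why a receives 1.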
Now consider an assignment whose two sides have different numbers of items:
>>> co = compile('a, b = 1, 2, 3', '', 'single')
>>> co.co_names
('a', 'b')
>>> co.co_consts
(1, 2, 3, None, (1, 2, 3))
The only difference in the bytecode is that the oparg of UNPACK_SEQUENCE is now 2, for an assignment to two targets:
>>> dis.dis(co)
1 0 LOAD_CONST 4 ((1, 2, 3))
2 UNPACK_SEQUENCE 2
4 STORE_NAME 0 (a)
6 STORE_NAME 1 (b)
8 LOAD_CONST 3 (None)
10 RETURN_VALUE
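Because the tuple's length (3) does not equal oparg (2), the fast path is skipped, so we need to look at the implementation of unpack_iterable: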
/* Iterate v argcnt times and store the results on the stack (via decreasing
sp). Return 1 for success, 0 if error.
If argcntafter == -1, do a simple unpack. If it is >= 0, do an unpack
with a variable target.
*/
static int
unpack_iterable(PyObject *v, int argcnt, int argcntafter, PyObject **sp)
{
int i = 0, j = 0;
Py_ssize_t ll = 0;
PyObject *it; /* iter(v) */
PyObject *w;
PyObject *l = NULL; /* variable list */
assert(v != NULL);
it = PyObject_GetIter(v);
if (it == NULL) {
...
return 0;
}
// [1]
for (; i < argcnt; i++) {
w = PyIter_Next(it);
if (w == NULL) {
...
goto Error;
}
*--sp = w;
}
// [2]
if (argcntafter == -1) {
/* We better have exhausted the iterator now. */
w = PyIter_Next(it);
if (w == NULL) {
if (PyErr_Occurred())
goto Error;
Py_DECREF(it);
return 1;
}
Py_DECREF(w);
PyErr_Format(PyExc_ValueError,
"too many values to unpack (expected %d)",
argcnt);
goto Error;
}
...
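From the UNPACK_SEQUENCE instruction we know the arguments passed to unpack_iterable:
v is the object that was popped from the top of the stack.
argcnt equals oparg.
argcntafter is -1.
sp equals stack_pointer + oparg, i.e. the stack pointer moved forward by oparg slots, which leaves room for oparg values.
[1] unpack_iterable iterates over the object argcnt times and stores each item on the stack by decrementing sp.
[2] When argcntafter is -1 and the iterator still yields another item, the right-hand side has more items than the left-hand side, and a ValueError is raised: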
ValueError: too many values to unpack (expected 2)
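A quick way to confirm that this error comes from executing the bytecode and not from the compiler is to compile the statement first and run it afterwards. The helper try_unpack below is a hypothetical name used only for this sketch; the exact wording of the message may carry extra detail on newer Python versions, so the example only prints it.

```python
def try_unpack(source):
    """Compile and run one statement; return (namespace, error message or None)."""
    namespace = {}
    code = compile(source, "<example>", "single")  # compiling succeeds
    try:
        exec(code, namespace)                      # unpacking fails here
    except ValueError as exc:
        return namespace, str(exc)
    return namespace, None


namespace, message = try_unpack("a, b = 1, 2, 3")
print(message)
print("a" in namespace, "b" in namespace)
```

Neither name is bound, since the exception is raised before any STORE_NAME instruction runs.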
So a statement such as a, b = 1, 2, 3 compiles, but fails at runtime. Python 3 supports a newer form of unpacking:
>>> co1 = compile('a, *b = 1, 2, 3', '', 'single')
>>> co2 = compile('*a, b = 1, 2, 3', '', 'single')
The starred target collects the extra values. The bytecode is different as well:
>>> dis.dis(co1)
1 0 LOAD_CONST 4 ((1, 2, 3))
2 UNPACK_EX 1
4 STORE_NAME 0 (a)
6 STORE_NAME 1 (b)
8 LOAD_CONST 3 (None)
10 RETURN_VALUE
>>> dis.dis(co2)
1 0 LOAD_CONST 4 ((1, 2, 3))
2 EXTENDED_ARG 1
4 UNPACK_EX 256
6 STORE_NAME 0 (a)
8 STORE_NAME 1 (b)
10 LOAD_CONST 3 (None)
12 RETURN_VALUE
UNPACK_EX replaces the earlier UNPACK_SEQUENCE instruction, and the second statement has an additional EXTENDED_ARG instruction:
EXTENDED_ARG(ext) Prefixes any opcode which has an argument too big to fit into the default two bytes. ext holds two additional bytes which, taken together with the subsequent opcode’s argument, comprise a four-byte argument, ext being the two most-significant bytes.
UNPACK_EX(counts) Implements assignment with a starred target: Unpacks an iterable in TOS into individual values, where the total number of values can be smaller than the number of items in the iterable: one of the new values will be a list of all leftover items.
The low byte of counts is the number of values before the list value, the high byte of counts the number of values after it. The resulting values are put onto the stack right-to-left.
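A note on EXTENDED_ARG: before Python 3.6, an instruction occupied either one or three bytes. The opcode took one byte and the argument, if present, took two. Two bytes can only represent 65536 distinct argument values, so EXTENDED_ARG was introduced: its argument supplies the high 16 bits of the next instruction's argument, giving a four-byte argument in total.
Since Python 3.6, every instruction occupies two bytes whether or not it takes an argument: one byte for the opcode and one for the argument. Each EXTENDED_ARG therefore contributes only one extra byte, and up to three of them may be chained in front of an instruction to build an argument of up to four bytes. For the purposes of this post the instruction can be ignored, because the dis module already folds the EXTENDED_ARG argument into the displayed argument of the following instruction as its high 8 bits. That is why the argument of UNPACK_EX in co2 (256) is larger than a single byte can hold.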
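The folding of EXTENDED_ARG into the following argument can be sketched in Python. The function decode_wordcode is a made-up helper, and the byte string is hand-built for illustration instead of being read from co2.co_code, because the real layout of co_code varies between Python versions.

```python
import dis

EXTENDED_ARG = dis.opmap["EXTENDED_ARG"]
UNPACK_EX = dis.opmap["UNPACK_EX"]


def decode_wordcode(code):
    """Yield (opcode, full_argument) pairs from 2-byte instruction units,
    folding each EXTENDED_ARG prefix into the instruction that follows."""
    ext = 0
    for offset in range(0, len(code), 2):
        opcode, arg = code[offset], code[offset + 1]
        full = (ext << 8) | arg        # same idea as: oparg |= oldoparg << 8
        if opcode == EXTENDED_ARG:
            ext = full                 # carry the bits to the next instruction
        else:
            yield opcode, full
            ext = 0


# Hand-built stand-in for the two instructions shown for co2:
#   EXTENDED_ARG 1
#   UNPACK_EX    (raw argument byte 0)
raw = bytes([EXTENDED_ARG, 1, UNPACK_EX, 0])
decoded = list(decode_wordcode(raw))
print(decoded)
```

The single decoded instruction carries the argument 256, which matches what dis displays for UNPACK_EX in co2.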
TARGET(EXTENDED_ARG) {
int oldoparg = oparg;
NEXTOPARG();
oparg |= oldoparg << 8;
goto dispatch_opcode;
}
TARGET(UNPACK_EX) {
int totalargs = 1 + (oparg & 0xFF) + (oparg >> 8);
PyObject *seq = POP();
if (unpack_iterable(seq, oparg & 0xFF, oparg >> 8,
stack_pointer + totalargs)) {
stack_pointer += totalargs;
} else {
Py_DECREF(seq);
goto error;
}
Py_DECREF(seq);
DISPATCH();
}
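UNPACK_EX uses the bit operations oparg & 0xFF and oparg >> 8 to obtain the number of targets before and after the starred target, then calls unpack_iterable to push the elements. The remainder of unpack_iterable handles this starred case: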
l = PySequence_List(it);
if (l == NULL)
goto Error;
*--sp = l;
i++;
ll = PyList_GET_SIZE(l);
if (ll < argcntafter) {
PyErr_Format(PyExc_ValueError,
"not enough values to unpack (expected at least %d, got %zd)",
argcnt + argcntafter, argcnt + ll);
goto Error;
}
/* Pop the "after-variable" args off the list. */
for (j = argcntafter; j > 0; j--, i++) {
*--sp = PyList_GET_ITEM(l, ll - j);
}
/* Resize the list. */
Py_SIZE(l) = ll - argcntafter;
Py_DECREF(it);
return 1;
Error:
for (; i > 0; i--, sp++)
Py_DECREF(*sp);
Py_XDECREF(it);
return 0;
}
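As we saw earlier, unpack_iterable stores values by decrementing sp (a local pointer, not the real stack pointer). It first stores argcnt (oparg & 0xFF) elements.
The remaining elements are then collected into a list object, which is stored next. After that, the last argcntafter (oparg >> 8) elements of the list, starting at offset ll - argcntafter, are stored one by one.
Finally the list is shrunk to ll - argcntafter elements, so it no longer contains the values that went to the targets after the starred one.
On success unpack_iterable returns 1, and back in UNPACK_EX the real stack pointer is advanced by totalargs.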
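To tie the pieces together, here is a rough Python model of both paths through unpack_iterable, driven by the same oparg split that UNPACK_EX performs. It is a sketch under simplifying assumptions and not the CPython implementation: the names are reused for readability, the function returns a list in pop order (first element = top of the stack) instead of writing through a pointer, list slicing replaces the in-place resize, and the error messages are abbreviated.

```python
_MISSING = object()


def unpack_iterable(v, argcnt, argcntafter):
    """Return the unpacked values in the order STORE_NAME would pop them."""
    it = iter(v)
    slots = []

    # [1] take argcnt items for the targets before the star (or all targets)
    for _ in range(argcnt):
        item = next(it, _MISSING)
        if item is _MISSING:
            raise ValueError("not enough values to unpack")
        slots.append(item)

    # [2] simple unpack: the iterator must now be exhausted
    if argcntafter == -1:
        if next(it, _MISSING) is not _MISSING:
            raise ValueError(f"too many values to unpack (expected {argcnt})")
        return slots

    # starred unpack: collect the rest, then split off the trailing targets
    rest = list(it)
    if len(rest) < argcntafter:
        raise ValueError("not enough values to unpack")
    cut = len(rest) - argcntafter
    slots.append(rest[:cut])     # the list bound to the starred target
    slots.extend(rest[cut:])     # values for the targets after the star
    return slots


def unpack_ex(v, oparg):
    """Split oparg the way UNPACK_EX does: low byte before, high byte after."""
    return unpack_iterable(v, oparg & 0xFF, oparg >> 8)


first = unpack_ex((1, 2, 3), 1)      # a, *b = 1, 2, 3
second = unpack_ex((1, 2, 3), 256)   # *a, b = 1, 2, 3
print(first, second)
```

With oparg 1 the model yields 1 followed by the list [2, 3], and with oparg 256 it yields the list [1, 2] followed by 3, matching the values that a and b receive in the two starred assignments.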