
Loops in Python

junnplus opened this issue 7 years ago • 1 comment

0x00 for loops

A simple for loop:

for i in (1, 2):
    pass

Its bytecode (CPython 3.6):

  1           0 SETUP_LOOP              12 (to 14)
              2 LOAD_CONST               3 ((1, 2))
              4 GET_ITER
        >>    6 FOR_ITER                 4 (to 12)
              8 STORE_NAME               0 (i)

  2          10 JUMP_ABSOLUTE            6
        >>   12 POP_BLOCK
        >>   14 LOAD_CONST               2 (None)
             16 RETURN_VALUE
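A listing like this one can be reproduced with the standard dis module, which disassembles the compiled module-level code. The sketch below does that. The exact opcodes depend on the interpreter version: SETUP_LOOP, for instance, no longer exists from Python 3.8 on. On a modern interpreter the output will therefore differ from the listing above, although GET_ITER and FOR_ITER are still there.

```python
import dis

SOURCE = "for i in (1, 2):\n    pass\n"

# Compile the snippet as module-level code ("exec"), as in the example above.
code = compile(SOURCE, "<for-demo>", "exec")

# Print a human-readable listing; its contents depend on the Python version.
dis.dis(code)

# The opcode names can also be collected programmatically.
opnames = [instr.opname for instr in dis.get_instructions(code)]
print("GET_ITER" in opnames, "FOR_ITER" in opnames)  # True True
```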

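A for loop starts with the SETUP_LOOP instruction, which shares its implementation in Python/ceval.c with SETUP_EXCEPT and SETUP_FINALLY: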

TARGET(SETUP_LOOP)
TARGET(SETUP_EXCEPT)
TARGET(SETUP_FINALLY) {
    /* NOTE: If you add any new block-setup opcodes that
       are not try/except/finally handlers, you may need
       to update the PyGen_NeedsFinalizing() function.
       */

    PyFrame_BlockSetup(f, opcode, INSTR_OFFSET() + oparg,
                       STACK_LEVEL());
    DISPATCH();
}

// Objects/frameobject.c
void
PyFrame_BlockSetup(PyFrameObject *f, int type, int handler, int level)
{
    PyTryBlock *b;
    if (f->f_iblock >= CO_MAXBLOCKS)
        Py_FatalError("XXX block stack overflow");
    b = &f->f_blockstack[f->f_iblock++];
    b->b_type = type;
    b->b_level = level;
    b->b_handler = handler;
}

#define INSTR_OFFSET()  \
    (sizeof(_Py_CODEUNIT) * (int)(next_instr - first_instr))
#define STACK_LEVEL()     ((int)(stack_pointer - f->f_valuestack))

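SETUP_LOOP calls PyFrame_BlockSetup to push a PyTryBlock onto the frame's block stack, f->f_blockstack. When this opcode executes, next_instr already points past SETUP_LOOP. INSTR_OFFSET() is therefore 2, and the recorded handler is 2 + 12 = 14, which is the "(to 14)" shown in the listing.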
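This offset arithmetic can be checked against the listing with a few lines of Python. The sketch assumes the CPython 3.6 wordcode layout (2-byte code units) and ignores EXTENDED_ARG. The helpers instr_offset and relative_target are made-up names for illustration. Opcodes such as JUMP_ABSOLUTE and POP_JUMP_IF_FALSE take an absolute offset as their argument, so they need no arithmetic.

```python
SIZEOF_CODEUNIT = 2  # sizeof(_Py_CODEUNIT) in CPython 3.6: opcode byte + oparg byte


def instr_offset(current_offset):
    """INSTR_OFFSET() while the instruction at current_offset executes.

    next_instr has already been advanced past the current code unit,
    so the macro yields the offset of the following instruction.
    """
    return current_offset + SIZEOF_CODEUNIT


def relative_target(current_offset, oparg):
    """Target of a relative jump (SETUP_LOOP's handler, FOR_ITER's JUMPBY)."""
    return instr_offset(current_offset) + oparg


# Offsets and opargs taken from the for-loop listing above.
print(relative_target(0, 12))  # 14: SETUP_LOOP 12 (to 14)
print(relative_target(6, 4))   # 12: FOR_ITER 4 (to 12)
```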

A PyTryBlock records what is needed to leave the block later. It is defined in Include/frameobject.h:

typedef struct {
    int b_type;                 /* what kind of block this is */
    int b_handler;              /* where to jump to find handler */
    int b_level;                /* value stack level to pop to */
} PyTryBlock;

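PyFrame_BlockSetup therefore just fills in a PyTryBlock:

  • b_type records the opcode that created the block (SETUP_LOOP here);
  • b_level records the current depth of the "runtime stack" (the value stack);
  • b_handler records the offset of the first instruction after the loop (14 here).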
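A small pure-Python model of the block stack may make this bookkeeping easier to follow. It is a sketch, not CPython code. The names TryBlock and Frame are made up for illustration, and CO_MAXBLOCKS mirrors the nesting limit of 20 that CPython defines in Include/code.h. Where CPython calls Py_FatalError, the model raises an ordinary exception instead.

```python
from dataclasses import dataclass

CO_MAXBLOCKS = 20  # same nesting limit as CPython's Include/code.h


@dataclass
class TryBlock:
    """Mirror of the C struct PyTryBlock (same field order)."""
    b_type: str     # opcode that created the block, e.g. "SETUP_LOOP"
    b_handler: int  # byte offset to continue at when the block is left
    b_level: int    # value-stack depth to restore


class Frame:
    """Hypothetical stand-in for the parts of PyFrameObject used here."""

    def __init__(self):
        self.f_blockstack = []  # len(self.f_blockstack) plays the role of f_iblock
        self.valuestack = []    # the "runtime stack"

    def block_setup(self, type_, handler, level):
        """Model of PyFrame_BlockSetup: push one block record."""
        if len(self.f_blockstack) >= CO_MAXBLOCKS:
            raise RuntimeError("block stack overflow")  # CPython: Py_FatalError
        self.f_blockstack.append(TryBlock(type_, handler, level))


frame = Frame()
# SETUP_LOOP at offset 0 runs before anything is pushed, so the level is 0.
frame.block_setup("SETUP_LOOP", 14, len(frame.valuestack))
print(frame.f_blockstack)
# [TryBlock(b_type='SETUP_LOOP', b_handler=14, b_level=0)]
```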

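SETUP_LOOP is used by both for and while loops. PyTryBlock is also used by the SETUP_EXCEPT and SETUP_FINALLY instructions, which is why all three share the handler above.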

LOAD_CONST first pushes the object being looped over (the tuple (1, 2)) onto the stack. GET_ITER comes next:

TARGET(GET_ITER) {
    /* before: [obj]; after [getiter(obj)] */
    PyObject *iterable = TOP();
    PyObject *iter = PyObject_GetIter(iterable);
    Py_DECREF(iterable);
    SET_TOP(iter);
    if (iter == NULL)
        goto error;
    PREDICT(FOR_ITER);
    PREDICT(CALL_FUNCTION);
    DISPATCH();
}

GET_ITER gets an iterator for the object on top of the stack by calling PyObject_GetIter, which is defined in Objects/abstract.c:

PyObject *
PyObject_GetIter(PyObject *o)
{
    PyTypeObject *t = o->ob_type;
    getiterfunc f;

    f = t->tp_iter;
    if (f == NULL) {
        if (PySequence_Check(o))
            return PySeqIter_New(o);
        return type_error("'%.200s' object is not iterable", o);
    }
    else {
        PyObject *res = (*f)(o);
        if (res != NULL && !PyIter_Check(res)) {
            PyErr_Format(PyExc_TypeError,
                         "iter() returned non-iterator "
                         "of type '%.100s'",
                         res->ob_type->tp_name);
            Py_DECREF(res);
            res = NULL;
        }
        return res;
    }
}

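PyObject_GetIter delegates to the object's type. If the type has a tp_iter slot, PyObject_GetIter calls it. Otherwise, if the object supports the sequence protocol, it wraps the object in a generic sequence iterator created by PySeqIter_New. An object with neither is reported as not iterable, and a tp_iter that returns something other than an iterator raises TypeError.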
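All three branches of PyObject_GetIter can be observed from Python through the built-in iter(). Squares and BadIterable are hypothetical classes written for this sketch. A class that defines only __getitem__ has no tp_iter but passes PySequence_Check, so it takes the PySeqIter_New path. That generic sequence iterator stops when __getitem__ raises IndexError.

```python
class Squares:
    """Defines only __getitem__: no tp_iter, but it looks like a sequence."""

    def __getitem__(self, index):
        if index >= 4:
            raise IndexError(index)  # ends the generic sequence iterator
        return index * index


class BadIterable:
    """__iter__ returns an int, which is not an iterator."""

    def __iter__(self):
        return 42


# No tp_iter, but a sequence: falls back to PySeqIter_New.
seq_iter = iter(Squares())
print(type(seq_iter).__name__)  # iterator
print(list(seq_iter))           # [0, 1, 4, 9]

# tp_iter exists but returns a non-iterator: TypeError.
try:
    iter(BadIterable())
except TypeError as exc:
    print(exc)                  # iter() returned non-iterator of type 'int'

# Neither tp_iter nor the sequence protocol: "... is not iterable".
try:
    iter(object())
except TypeError as exc:
    print(exc)                  # 'object' object is not iterable
```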

Iterator objects

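Take the list type as an example. PyList_Type's tp_iter slot is list_iter, defined in Objects/listobject.c. (The loop above iterates over a tuple, whose tp_iter, tuple_iter in Objects/tupleobject.c, works the same way.)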

PyTypeObject PyList_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "list",
    sizeof(PyListObject),
    ...
    list_iter,                                  /* tp_iter */
    ...
};

static PyObject *
list_iter(PyObject *seq)
{
    listiterobject *it;

    if (!PyList_Check(seq)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    it = PyObject_GC_New(listiterobject, &PyListIter_Type);
    if (it == NULL)
        return NULL;
    it->it_index = 0;
    Py_INCREF(seq);
    it->it_seq = (PyListObject *)seq;
    _PyObject_GC_TRACK(it);
    return (PyObject *)it;
}

So a list's iterator is a listiterobject whose type is PyListIter_Type:

typedef struct {
    PyObject_HEAD
    Py_ssize_t it_index;
    PyListObject *it_seq; /* Set to NULL when iterator is exhausted */
} listiterobject;

PyTypeObject PyListIter_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "list_iterator",                            /* tp_name */
    ...
    PyObject_SelfIter,                          /* tp_iter */
    (iternextfunc)listiter_next,                /* tp_iternext */
    ...
};

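A listiterobject is only a thin wrapper around a PyListObject. It keeps a reference to the list (it_seq) and the index of the next element to return (it_index). Advancing that index is all the iterator needs to do to walk through the list. Its tp_iter is PyObject_SelfIter, so calling iter() on the iterator returns the iterator itself.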
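The wrapper is visible from Python. The snippet below is a small illustration. Note that __reduce__ is a CPython implementation detail (it supports pickling) and is used here only because, while the iterator is not exhausted, its third element is the current it_index. The snippet also shows that the iterator reads through to the live list rather than a copy.

```python
items = ["a", "b", "c"]
it = iter(items)

print(type(it).__name__)   # list_iterator (tp_name of PyListIter_Type)
print(iter(it) is it)      # True: tp_iter is PyObject_SelfIter
print(next(it))            # a

# CPython detail: the third element of __reduce__() is it_index.
print(it.__reduce__()[2])  # 1

# The iterator holds a reference to the list, not a copy, so a change at a
# position it has not reached yet is visible to it.
items[1] = "B"
print(next(it))            # B
print(next(it))            # c
```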

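After obtaining the iterator, GET_ITER replaces the iterable on top of the stack with it. It then predicts whether the next instruction is FOR_ITER or CALL_FUNCTION.

Here the prediction hits, because FOR_ITER directly follows GET_ITER. Control therefore jumps straight into FOR_ITER and the loop's iteration begins. (In builds that use computed gotos, PREDICT is compiled out and ordinary dispatch is used instead.)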

PREDICTED(FOR_ITER);
TARGET(FOR_ITER) {
    /* before: [iter]; after: [iter, iter()] *or* [] */
    PyObject *iter = TOP();
    PyObject *next = (*iter->ob_type->tp_iternext)(iter);
    if (next != NULL) {
        PUSH(next);
        PREDICT(STORE_FAST);
        PREDICT(UNPACK_SEQUENCE);
        DISPATCH();
    }
    ...
}

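FOR_ITER reads the iterator on top of the stack without popping it. It then calls the tp_iternext slot of the iterator's type to produce the next value.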

For a listiterobject, the tp_iternext slot of PyListIter_Type is listiter_next, defined in Objects/listobject.c:

static PyObject *
listiter_next(listiterobject *it)
{
    PyListObject *seq;
    PyObject *item;

    assert(it != NULL);
    seq = it->it_seq;
    if (seq == NULL)
        return NULL;
    assert(PyList_Check(seq));

    if (it->it_index < PyList_GET_SIZE(seq)) {
        item = PyList_GET_ITEM(seq, it->it_index);
        ++it->it_index;
        Py_INCREF(item);
        return item;
    }

    it->it_seq = NULL;
    Py_DECREF(seq);
    return NULL;
}

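listiter_next returns the next element of the list the iterator is bound to. Once the end of the list is reached, it drops its reference to the list (it_seq = NULL) and returns NULL without setting an exception. That NULL signals the end of the iteration.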
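listiter_next's behaviour can be mirrored in pure Python. ListIterModel is a hypothetical class for illustration, not part of CPython. Python code cannot "return NULL with no exception set", so the model raises StopIteration instead, which is what the built-in next() turns such a NULL into. The comparison at the end shows one consequence of setting it_seq to NULL. Once exhausted, the iterator stays exhausted even if the list grows afterwards.

```python
class ListIterModel:
    """Pure-Python model of listiterobject / listiter_next."""

    def __init__(self, seq):
        self.it_seq = seq    # set to None once the iterator is exhausted
        self.it_index = 0

    def __iter__(self):      # like PyObject_SelfIter
        return self

    def __next__(self):      # like listiter_next
        seq = self.it_seq
        if seq is None:      # already exhausted: stay exhausted
            raise StopIteration
        if self.it_index < len(seq):
            item = seq[self.it_index]
            self.it_index += 1
            return item
        self.it_seq = None   # drop the reference to the list
        raise StopIteration  # Python-level stand-in for returning NULL


# The real list iterator and the model behave the same way.
for make_iter in (iter, ListIterModel):
    data = [1, 2]
    it = make_iter(data)
    first = list(it)
    data.append(3)           # too late: the iterator is already exhausted
    print(make_iter.__name__, first, next(it, "done"))
# iter [1, 2] done
# ListIterModel [1, 2] done
```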

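FOR_ITER checks the result. If it got a value, it pushes it onto the "runtime stack" and predicts STORE_FAST or UNPACK_SEQUENCE. Neither prediction matches here, because a module-level loop variable is stored with STORE_NAME. Normal dispatch therefore continues with STORE_NAME, which binds the name i to the value in the frame's local namespace (at module level, the module's globals).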
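At the Python level, the GET_ITER / FOR_ITER / STORE_NAME / JUMP_ABSOLUTE sequence behaves roughly like the while loop below. run_for_loop is a hypothetical helper written for this sketch, not a CPython function. The try block covers only the next() call, so StopIteration ends the loop quietly, matching FOR_ITER's PyErr_ExceptionMatches(PyExc_StopIteration) check. Any other exception propagates, and a StopIteration raised by the loop body is not mistaken for exhaustion.

```python
def run_for_loop(iterable, body):
    """Rough equivalent of: for item in iterable: body(item)"""
    iterator = iter(iterable)       # GET_ITER
    while True:                     # FOR_ITER is the loop head
        try:
            item = next(iterator)   # calls the type's tp_iternext
        except StopIteration:       # iterator ended normally
            break                   # FOR_ITER's JUMPBY past the loop
        body(item)                  # STORE_NAME + body, then JUMP_ABSOLUTE


seen = []
run_for_loop((1, 2), seen.append)
print(seen)  # [1, 2]


def broken():
    yield 1
    raise ValueError("boom")


try:
    run_for_loop(broken(), seen.append)
except ValueError as exc:
    print("propagated:", exc)  # propagated: boom
print(seen)  # [1, 2, 1]
```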

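The loop body is only pass, which compiles to no instructions. JUMP_ABSOLUTE therefore follows immediately and jumps back for the next iteration.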

PREDICTED(JUMP_ABSOLUTE);
TARGET(JUMP_ABSOLUTE) {
    JUMPTO(oparg);
#if FAST_LOOPS
    /* Enabling this path speeds-up all while and for-loops by bypassing
       the per-loop checks for signals.  By default, this should be turned-off
       because it prevents detection of a control-break in tight loops like
       "while 1: pass".  Compile with this option turned-on when you need
       the speed-up and do not need break checking inside tight loops (ones
       that contain only instructions ending with FAST_DISPATCH).
    */
    FAST_DISPATCH();
#else
    DISPATCH();
#endif
}

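JUMP_ABSOLUTE uses the JUMPTO macro to jump back to FOR_ITER at offset 6. When tp_iternext eventually returns NULL, FOR_ITER runs the rest of its code: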

TARGET(FOR_ITER) {
    ...
    if (PyErr_Occurred()) {
        if (!PyErr_ExceptionMatches(PyExc_StopIteration))
            goto error;
        else if (tstate->c_tracefunc != NULL)
            call_exc_trace(tstate->c_tracefunc, tstate->c_traceobj, tstate, f);
        PyErr_Clear();
    }
    /* iterator ended normally */
    STACKADJ(-1);
    Py_DECREF(iter);
    JUMPBY(oparg);
    PREDICT(POP_BLOCK);
    DISPATCH();
}

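When the iteration ends, FOR_ITER first handles any pending exception. It clears a StopIteration, and any other exception goes to the error handler. It then pops the iterator off the stack and uses the JUMPBY macro to jump forward by oparg bytes (from offset 8 to 12), landing on the POP_BLOCK at the end of the loop: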

PREDICTED(POP_BLOCK);
TARGET(POP_BLOCK) {
    PyTryBlock *b = PyFrame_BlockPop(f);
    UNWIND_BLOCK(b);
    DISPATCH();
}

// Objects/frameobject.c
PyTryBlock *
PyFrame_BlockPop(PyFrameObject *f)
{
    PyTryBlock *b;
    if (f->f_iblock <= 0)
        Py_FatalError("XXX block stack underflow");
    b = &f->f_blockstack[--f->f_iblock];
    return b;
}

#define UNWIND_BLOCK(b) \
    while (STACK_LEVEL() > (b)->b_level) { \
        PyObject *v = POP(); \
        Py_XDECREF(v); \
    }

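POP_BLOCK pops the PyTryBlock by decrementing f_iblock, the index of the top of f_blockstack. The UNWIND_BLOCK macro then pops values until the "runtime stack" is back at the depth recorded by SETUP_LOOP (b_level). That completes the for loop.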
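POP_BLOCK can be modelled in the same spirit. In this hypothetical pure-Python sketch, pop_block and TryBlock are illustrative names. It pops the innermost block and discards every value above b_level. In the normal exit traced above, FOR_ITER has already popped the iterator, so the unwind loop has nothing to do. The usage below starts with extra values on the stack only to make the unwinding visible.

```python
from dataclasses import dataclass


@dataclass
class TryBlock:
    b_type: str     # opcode that created the block
    b_handler: int  # offset to continue at after the block
    b_level: int    # value-stack depth to restore


def pop_block(blockstack, valuestack):
    """Model of POP_BLOCK: PyFrame_BlockPop followed by UNWIND_BLOCK."""
    if not blockstack:
        raise RuntimeError("block stack underflow")  # CPython: Py_FatalError
    b = blockstack.pop()                # like --f->f_iblock
    while len(valuestack) > b.b_level:  # like STACK_LEVEL() > b->b_level
        valuestack.pop()                # like POP() + Py_XDECREF
    return b


# One value was on the stack when the block was set up (b_level 1);
# two more were left above it and are discarded when the block is popped.
blocks = [TryBlock("SETUP_LOOP", 14, 1)]
stack = ["outer", "tmp1", "tmp2"]
popped = pop_block(blocks, stack)
print(popped.b_type, stack, blocks)  # SETUP_LOOP ['outer'] []
```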

0x01 while loops

while 1 < 2:
    pass

The bytecode of a while loop is even simpler than that of a for loop:

  1           0 SETUP_LOOP              12 (to 14)
        >>    2 LOAD_CONST               0 (1)
              4 LOAD_CONST               1 (2)
              6 COMPARE_OP               0 (<)
              8 POP_JUMP_IF_FALSE       12

  2          10 JUMP_ABSOLUTE            2
        >>   12 POP_BLOCK
        >>   14 LOAD_CONST               2 (None)
             16 RETURN_VALUE

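The instructions before POP_JUMP_IF_FALSE evaluate the condition. POP_JUMP_IF_FALSE pops the result and decides where execution continues.

If the condition is true, execution falls through to the next instruction. A loop body, if there were one, would be compiled before JUMP_ABSOLUTE, which jumps back to the start of the loop (offset 2) to evaluate the condition again.

If the condition is false, execution jumps to the POP_BLOCK at the end of the loop, which restores the state from before the while loop started.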
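To see these jumps actually drive a loop, here is a toy stack machine. It is a hypothetical sketch, not CPython's interpreter. The program is shaped like the listing above but uses a counter so that it terminates, roughly `n = 0; while n < 3: n = n + 1`. Jump targets are list indices rather than byte offsets. SETUP_LOOP and POP_BLOCK are left out, and RETURN_NAMES is an invented opcode that ends the run.

```python
import operator

COMPARE = {"<": operator.lt}  # comparisons supported by this toy COMPARE_OP

# Jump targets are list indices, not byte offsets.
PROGRAM = [
    ("LOAD_CONST", 0),          # 0
    ("STORE_NAME", "n"),        # 1
    ("LOAD_NAME", "n"),         # 2  loop start: evaluate the condition
    ("LOAD_CONST", 3),          # 3
    ("COMPARE_OP", "<"),        # 4
    ("POP_JUMP_IF_FALSE", 11),  # 5  false -> leave the loop
    ("LOAD_NAME", "n"),         # 6  loop body: n = n + 1
    ("LOAD_CONST", 1),          # 7
    ("BINARY_ADD", None),       # 8
    ("STORE_NAME", "n"),        # 9
    ("JUMP_ABSOLUTE", 2),       # 10 back to the condition
    ("RETURN_NAMES", None),     # 11 toy opcode: return the namespace
]


def run(program):
    names, stack, pc = {}, [], 0
    while True:
        op, arg = program[pc]
        pc += 1                          # like next_instr++ before dispatch
        if op == "LOAD_CONST":
            stack.append(arg)
        elif op == "LOAD_NAME":
            stack.append(names[arg])
        elif op == "STORE_NAME":
            names[arg] = stack.pop()
        elif op == "COMPARE_OP":
            right = stack.pop()
            left = stack.pop()
            stack.append(COMPARE[arg](left, right))
        elif op == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "POP_JUMP_IF_FALSE":
            if not stack.pop():          # pop the condition; jump only if false
                pc = arg
        elif op == "JUMP_ABSOLUTE":
            pc = arg                     # absolute target, like JUMPTO
        elif op == "RETURN_NAMES":
            return names
        else:
            raise ValueError(f"unknown opcode {op}")


print(run(PROGRAM))  # {'n': 3}
```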

junnplus, Apr 06 '18 16:04

Came in with a hot head, left with a clear one.

Simpidbit, May 11 '20 17:05