
Clarify what is a "sub-interpreter" and what is an "interpreter".

Open markshannon opened this issue 5 years ago • 12 comments

PEP 554 is entitled "Multiple Interpreters in the Stdlib", yet the term "subinterpreters" is used throughout this repo.

There is the additional confusion of the C struct names. It seems to me that the C struct PyInterpreterState corresponds to the sub-interpreter and that the C struct _PyRuntimeState corresponds to the interpreter.

Confusion about which is which makes the goals of this project unclear, and I fear may have resulted in some unnecessary work, as data structures are moved to PyInterpreterState that could more easily, and with less impact, have been moved to (or left in) _PyRuntimeState.

markshannon avatar Oct 19 '20 19:10 markshannon

IMO, "subinterpreter" is not a good term; generally we should aim to make all interpreters equal (though that can be a long-term goal).

encukou avatar Oct 20 '20 08:10 encukou

Sub-interpreters already exist, whether we like the term or not. They share the same heap, although they cannot see each other's sub-heap, just common objects like builtin types and numbers.

Why not leave them working as they do now, and enable multiple interpreters? That way seems easier to implement in practice, and causes less breakage (at least, no more breakage).

markshannon avatar Oct 20 '20 10:10 markshannon

As far as I can see, "sub-interpreter" and "interpreter" are basically interchangeable terms at this point. See e.g. the first two sentences in the Py_NewInterpreter docs. They share some objects like builtin types and numbers, which are immutable and currently OK to share – until you want a per-interpreter GIL, which is one of the goals in this repo. And unfortunately they also sometimes share some objects which they shouldn't, like anything that references a Python function's globals, so I'd rather fix them, not leave them working as they do now.

_PyRuntimeState holds the stuff that's common to all (sub-)interpreters, such as, well, the list of (sub-)interpreters. Everything else should be per-(sub)interpreter.
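A rough way to picture that split is a toy Python model. This is only a sketch: it is not CPython's real C layout, and the class names, field names, and `new_interpreter` helper are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InterpreterState:
    """Toy stand-in for PyInterpreterState: state owned by one interpreter."""
    interp_id: int
    # Per-interpreter state, loosely analogous to each interpreter's sys.modules.
    modules: dict = field(default_factory=dict)


@dataclass
class RuntimeState:
    """Toy stand-in for _PyRuntimeState: state common to all interpreters."""
    # The list of interpreters is the canonical example of runtime-wide state.
    interpreters: list = field(default_factory=list)
    next_id: int = 0

    def new_interpreter(self) -> InterpreterState:
        # Hypothetical helper: register a fresh interpreter with the runtime.
        interp = InterpreterState(interp_id=self.next_id)
        self.next_id += 1
        self.interpreters.append(interp)
        return interp


runtime = RuntimeState()
first = runtime.new_interpreter()
second = runtime.new_interpreter()

# Something loaded in one interpreter is not visible in the other.
first.modules["spam"] = object()
```

In this model the only thing the two interpreters have in common is the runtime that lists them; anything placed on `InterpreterState` is isolated by construction.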

encukou avatar Oct 20 '20 11:10 encukou

The problem with that approach is that it involves a lot of moving stuff from _PyRuntimeState to PyInterpreterState. Wouldn't allowing several _PyRuntimeState be less work as it already has a GIL? It would also allow subinterpreters to work as they currently do.

Until multiple interpreters can run in parallel, moving global state into _PyRuntimeState has no adverse impact on performance. Moving that state into PyInterpreterState slows things down.

markshannon avatar Oct 20 '20 13:10 markshannon

Wouldn't allowing several _PyRuntimeState be less work as it already has a GIL?

I doubt it – you'd need to make a per-_PyRuntimeState GIL, whereas in the current approach you'd need to make a per-PyInterpreterState GIL. The main issues, like making sure threads don't mangle a shared object's refcounts, are basically the same.
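To make the refcount point concrete, here is a toy sketch in Python, assuming a hypothetical `Interpreter` class whose `gil` attribute is an ordinary `threading.Lock`. It models the idea only; it says nothing about how CPython implements its GIL or reference counting.

```python
import threading


class Interpreter:
    """Hypothetical interpreter that owns its own lock (a per-interpreter 'GIL')."""

    def __init__(self):
        self.gil = threading.Lock()


class SharedObject:
    """An object visible to every interpreter, with a manual reference count."""

    def __init__(self):
        self.refcount = 0
        # Extra synchronization that the shared object needs on its own.
        self._lock = threading.Lock()

    def incref(self):
        with self._lock:
            self.refcount += 1


def run(interp, obj, n):
    # Holding one interpreter's lock does not exclude threads running under
    # another interpreter's lock, so the shared refcount cannot rely on it.
    with interp.gil:
        for _ in range(n):
            obj.incref()


shared = SharedObject()
interps = [Interpreter(), Interpreter()]
threads = [threading.Thread(target=run, args=(i, shared, 10_000)) for i in interps]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The same sketch applies unchanged if the lock is attached to a per-runtime object instead of a per-interpreter one: once more than one lock exists, anything reachable from both sides needs protection of its own.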

What exactly do you mean by allowing subinterpreters to work as they currently do?

encukou avatar Oct 20 '20 13:10 encukou

All sub-interpreters share the same heap (even though they can see different parts of it) and share the GIL.

markshannon avatar Oct 20 '20 15:10 markshannon

So, to clarify, under your proposal with multiple _PyRuntimeState, we would plan to make one GIL per _PyRuntimeState? Would sub-interpreters from different _PyRuntimeStates not share the heap?

encukou avatar Oct 21 '20 07:10 encukou

Doesn't sharing a heap between interpreters require synchronization for the cycle GC?

markshannon avatar Oct 21 '20 11:10 markshannon

My main point is that without clearer naming, it is impossible to discuss these alternatives without a lot of confusion.

markshannon avatar Oct 21 '20 11:10 markshannon

OK. Here's my take. You can have multiple interpreters in a single process. They should be isolated from each other; we're working on improving that isolation. The term subinterpreter essentially means the same thing as interpreter. There are subtle differences:

  • If you start one interpreter from another, you'd call the child a "subinterpreter". (But you can also start interpreters from pure C code, and subinterpreters should be able to outlive their parents, though I don't think the high-level API is built for that.)
  • Saying "subinterpreters" makes it clear that you're working on better support for multiple interpreters, as opposed to improving other aspects of Python. Not a very good label, IMO, but it's what's used.

As for an earlier question, I don't think that moving stuff from _PyRuntimeState to PyInterpreterState is more work than allowing several _PyRuntimeState. But then, I'm not the one actually doing that work.

encukou avatar Oct 21 '20 12:10 encukou

The key detail is that there is a "main" interpreter:

  • created during runtime initialization
  • used during runtime initialization
  • used during runtime finalization
  • the initial interpreter exposed to users
  • has the "main" thread

We have been calling all other interpreters in the runtime "subinterpreters".

FWIW, in the context of PEP 554, we start at the main interpreter. Each new interpreter then effectively ends up as a node in an implicit tree relative to the "parent" interpreter under which the new one was created. However, that isn't fundamental at the C level.
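A small sketch of that picture, as a toy Python model: the `Runtime` and `Interpreter` classes and the `ancestry` helper are hypothetical names, not part of PEP 554 or the C API. The parent link is plain bookkeeping, which matches the point that the tree is implicit rather than fundamental.

```python
class Interpreter:
    def __init__(self, interp_id, parent=None):
        self.interp_id = interp_id
        # Bookkeeping only; None for the main interpreter or for
        # interpreters that were not started from another interpreter.
        self.parent = parent


class Runtime:
    def __init__(self):
        self._next_id = 0
        self.interpreters = []
        # The main interpreter is created during runtime initialization.
        self.main = self.create()

    def create(self, parent=None):
        interp = Interpreter(self._next_id, parent)
        self._next_id += 1
        self.interpreters.append(interp)
        return interp

    def is_main(self, interp):
        return interp is self.main

    def subinterpreters(self):
        # Every interpreter other than the main one.
        return [i for i in self.interpreters if i is not self.main]


def ancestry(interp):
    """Return interpreter ids from `interp` up through its recorded parents."""
    chain = []
    while interp is not None:
        chain.append(interp.interp_id)
        interp = interp.parent
    return chain


rt = Runtime()
child = rt.create(parent=rt.main)
grandchild = rt.create(parent=child)
```

Here `rt.subinterpreters()` is defined purely by "not the main interpreter", so an interpreter created with no parent at all would still count as a subinterpreter.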

ericsnowcurrently avatar Oct 21 '20 15:10 ericsnowcurrently

FYI, the C-API docs have a paragraph explaining the distinction (thanks to @nanjekyejoannah).

@markshannon, do you think it would help to have more detail there? (IMHO, there isn't much more to say than what we say there.)

ericsnowcurrently avatar Oct 22 '20 14:10 ericsnowcurrently