Guided tour to the CPython source code

Open gvanrossum opened this issue 8 years ago • 53 comments

I propose that we collaborate on creating a guided tour to the CPython source code. I think such a thing is vastly overdue, and I am not aware of existing resources (though I'll gladly take pointers that prove me wrong).

I've more or less promised @emilymorehouse to help her get started with something like this, and I'm happy to put time in. @willingc I hope you think this would be a useful document to maintain in this repo? I suppose it would go under the cpython tree.

The proposed document differs from the existing guides like the devguide -- it doesn't tell you anything about the patch submission flow, nor even how to get things to build -- it should focus on how to read the CPython source code (using your editor of choice).

Some topics could include:

  • structure of the source code (what goes into Include, Objects, Python etc.)
  • bytecode (could start with @akaptur's videos)
  • reference counting
  • how to find the definition of something
  • how to find where a given error comes from

gvanrossum avatar Jul 27 '16 18:07 gvanrossum

Some of this is lightly touched in the devguide (e.g. http://cpython-devguide.readthedocs.io/en/latest/setup.html#directory-structure), but in no way thoroughly.

brettcannon avatar Jul 27 '16 18:07 brettcannon

Ah, thanks. I would like to go a step deeper for the directories containing most of the fundamental C code:

  • Include
  • Objects
  • Python
  • Modules
  • Parser

Other topics should probably include

  • how the parser works
  • how to define an object type (or is this in the C/API docs already?)

gvanrossum avatar Jul 27 '16 19:07 gvanrossum

@gvanrossum are you soliciting assistance on this from others that aren't core devs too?

lorenanicole avatar Jul 27 '16 19:07 lorenanicole

For documentation about defining a type, it depends. There's the C API tutorial and then there's xxsubtype.c.

@lorenanicole Guido can say if I'm wrong, but there's absolutely no reason to restrict this to core devs. Honestly, it would be best to have someone who isn't a core dev involved so no bad assumptions of pre-existing knowledge muddle the tour.

brettcannon avatar Jul 27 '16 19:07 brettcannon

@lorenanicole: What Brett said. The core devs already know this and probably have huge blind spots. My own blind spots are even bigger. (I wrote some of the stuff Brett pointed to and I had forgotten about it. :-)

I also believe that a lot of the existing docs in this area assume lots of other skills that aren't so easy to come by. While I don't think we should include a C tutorial, there are a lot of C patterns that aren't unique to Python but are still worth explaining in some detail, either because Python's version has certain important details, or just because they aren't that widely known.

Some more random topics:

gvanrossum avatar Jul 27 '16 20:07 gvanrossum

  • How to handle errors (e.g. when to use a goto, making sure you deref appropriately)

brettcannon avatar Jul 27 '16 21:07 brettcannon

@gvanrossum Love this idea! It's similar to what I want to do with notebooks and Philip Guo's grad school course. It also captures well the spirit of our PyCon conversation.

All,

The next paragraph that I'm writing is done with gentle kindness, respect, and thoughtfulness:

I do want the documentation developed here to be written primarily, or at least 50%, by those who are not core developers. A few reasons:

    1. to reinforce the truth that technical talent already exists in the PyLadies community;
    2. to best communicate technical information in a manner that resonates with the PyLadies interested in being mentored (of course, all information will be openly available on GitHub so no one is being excluded);
    3. to provide an opportunity for PyLadies to own some of the maintainer responsibility for the document (i.e. merge privileges);
    4. to simplify the contribution process, review, and merging of changes.

Just open a PR with the folder name and outline that we would like to iterate on over time.

Also, please let me know if you would like to be added as a maintainer on the repo. I'll be on vacation with limited internet access from Friday until the 2nd week of August.

willingc avatar Jul 27 '16 22:07 willingc

IMHO, it would be great to have this framed out by core devs and then have non-core devs tear it apart, ask questions, restructure and edit. However, I do not do work like this every day. But @Bradamant3 does! :-)

@Bradamant3, do you have thoughts on ways to approach this?

Background on @Bradamant3 Recent talks she has given...

jackiekazil avatar Jul 28 '16 01:07 jackiekazil

@jackiekazil and all -- I am honored to be mentioned here, and I'd be thrilled to be involved. I'm anything but a core dev -- I'm still pretty new to Python, still working my way through Learn Python the Hard Way, although I've taken a couple of handfuls of programming classes (no C, sadly) and write sample code for part of my living. (I'm a technical writer by trade, on the more technical side of things.) I'll need to spend some time over the next week looking through what's already in the repo and the other resources y'all have mentioned to see where I think I can best help out. I have much to learn, and if I can help others learn by documenting my own learning as I help out with this project, that would be the biggest thrill of all.

Bradamant3 avatar Jul 28 '16 02:07 Bradamant3

Hi @Bradamant3, We would love for you to be part of this. There's a special place in my heart for technical writing since it's key to breaking down barriers for users to use your software and critical for onboarding developers 🌻

Please feel free to ask questions (large or small). Looking forward to you helping out.

willingc avatar Jul 28 '16 02:07 willingc

My bytecode chapter for the Architecture of Open Source Applications book might be a better example than my bytecode talk (although the content substantially overlaps): http://aosabook.org/en/500L/a-python-interpreter-written-in-python.html

One of the hardest parts of writing the chapter was getting clarity about the intended audience and the intended goal. I eventually settled on an audience of Python programmers who want to learn more about the language for its own sake. (I personally don't think knowing about the guts of the interpreter helps you write better Python, but I do think it helps you deepen your knowledge of computer science and is darn fun.)

Given that, for this project, who's the intended audience, and what's the goal? What is the reader hoping to get out of the guides?

Some possible goals I can imagine:

  • Onboarding people who want to be core developers
  • Providing shortcuts, pointers, and hints for people who want to contribute to CPython but haven't worked with a large codebase before
  • Satisfying the curiosity of people who want to know how stuff works (I heard someone describe this recently as "Be the Julia Evans you want to see in the world")
  • Teaching people fundamental programming & debugging skills (e.g. Guido's last two bullets above)

We don't need perfect agreement on the goals, but I think some clarity early on will pay off significantly later on!

With more clarity on the goals, I'd be happy to contribute to this. I think the object model and garbage collection would be particularly rich topics.

akaptur avatar Jul 28 '16 21:07 akaptur

@akaptur and all: Just some preliminary thoughts as I look superficially through SO MANY AMAZING RESOURCES and read Allison's brilliant questions about audience: It seems to me that Allison's goals are fundamentally all compatible. Folks who want to be core developers may not have worked with a large codebase, they may have found the Python community to be as amazingly welcoming as I have and therefore find it the right place to satisfy their curiosity -- and addressing all the above can also help teach those crucial fundamental programming and debugging skills. (These are all personal goals of mine, and I have zero aspirations toward becoming a professional developer. Just sayin'. There's a huge contribution to make to the world by making it easier for as many people as possible to understand these things.) +10K to object model and garbage collection (as someone who has far more to learn about both than she currently understands), and also +10K to @jackiekazil's suggestion about framing out by core devs and then letting the rest of us adjacently-obsessed have at it.

Bradamant3 avatar Jul 28 '16 22:07 Bradamant3

+10k as well to all the above :-) once a more definitive outline is settled I would love to step in and help. @jackiekazil + @willingc I'll defer to you two for directions.

lorenanicole avatar Jul 28 '16 22:07 lorenanicole

Nice to see @akaptur here. Hope to see you at PyBay.

One of the things that I have wanted to do for over a year is to edit down @pgbovine's great 10 hour course http://www.pgbovine.net/cpython-internals.htm that walked through the Python2 codebase. I wanted to create a series of Jupyter notebooks that would correspond to each of his videos and link to source code. Here's a link to one of the notebooks that I started last year: https://github.com/willingc/pyladies-cpython/blob/master/Notes%20on%20Lecture%201.ipynb

I'm wondering (if @pgbovine is cool with us referring to the content and @gvanrossum and @brettcannon see value) if we could use the @pgbovine course as a starting point and outline linking out to other resources and docs as appropriate. Perhaps creating a study group to do two lectures a month or something.

This is everyone's group so no need to defer to me. If you have a good idea, please run with it. Personally, I want to see something that digs into the guts and I see some value in the video lecture combined with written word.

Overall, I'm better on code guts than usability so @Bradamant3's talent and insights are very valued. Thoughts?

P.S. @akaptur I think your bullet list covers the audience.

willingc avatar Jul 28 '16 23:07 willingc

Also leaving for Japan for a long overdue vacation. @jackiekazil, would you mind working with @estherbester and @audreyr to get yourself and others push access to this repo? I tried to add folks last night but needed org access not just repo admin access.

willingc avatar Jul 28 '16 23:07 willingc

I am willing to contribute! Thanks

annakoppad avatar Jul 29 '16 00:07 annakoppad

My only worry with videos that are tied to something like the internals of CPython is they will become outdated and videos are nowhere near as easy to update as written documentation. It's one thing for a professor to record lectures he's going to give anyway, and it's another to get a volunteer to re-record an entire video for some key type just because we tweaked some detail in a release. Now that's not to say people couldn't do videos, but I think they should be entirely ancillary to anything written and simply something that could be pointed to instead of building something around the videos.

brettcannon avatar Jul 29 '16 02:07 brettcannon

Thanks for the kind words! Yes, you have my permission to use and remix the content as you wish. Please add a link back to the original source material webpage ( http://www.pgbovine.net/cpython-internals.htm ) in the appropriate place(s). Best wishes.

pgbovine avatar Jul 29 '16 03:07 pgbovine

Quick poll now that Brett has brought it up... Do people here prefer video lectures or text? I personally strongly prefer text, but my ways of learning were set long before online video was a thing.

gvanrossum avatar Jul 29 '16 04:07 gvanrossum

@gvanrossum it depends.

jackiekazil avatar Jul 29 '16 04:07 jackiekazil

Video sounds quite exciting; however, I think that's a stretch goal? Let's start with text, then we can regroup and see about video.

lorenanicole avatar Jul 29 '16 12:07 lorenanicole

Text is fine by me. Let's proceed forward with an outline. I'm really pleased to see the interest building :smile:

If I have time when I return from Japan, I may do the Jupyter notebooks as a work related thing since there is no additional video production needed. Something that could be used with some Jupyter/JupyterHub work that I'm doing for education.

willingc avatar Jul 29 '16 14:07 willingc

OK, so with a general agreement to start out with a tutorial document, what's the next step? Outline? Basically this sounds like it's going to be a mix of "this is how Python works to help you debug problems", and "this is how Python is structured to help you find your way around".

If we need a jumping-off point then simply starting with explaining what's in all the top-level directories of a source checkout is as good a place as any. That can feed into navigating around, e.g. you can always use a ^ regex anchor on a function name in C when searching in the source to find its definition thanks to how we format C code. Then once we think we're done explaining how Python is structured code-wise we can then start talking about how stuff works to help with debugging.
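To make the `^` anchor idea concrete, here is a minimal sketch in Python. It assumes the convention described above: a definition has its return type on the line before it, so the function name starts in column 0, while calls are indented. The helper name `find_c_definitions` and the tiny sample file are made up for illustration; the sample is not real CPython source.

```python
import re
import tempfile
from pathlib import Path


def find_c_definitions(source_root, func_name):
    """Return (path, line_number) pairs where func_name starts a line in a .c file.

    Hypothetical helper for illustration only; it is not part of CPython.
    """
    # Assumption: definitions put the function name in column 0 (return type on
    # the previous line), so anchoring at the start of the line skips call sites.
    pattern = re.compile(r"^" + re.escape(func_name) + r"\s*\(")
    hits = []
    for path in sorted(Path(source_root).rglob("*.c")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.match(line):
                hits.append((path, lineno))
    return hits


# Made-up sample standing in for a file from a real source checkout.
SAMPLE = """\
int
PyList_Append(PyObject *op, PyObject *newitem)
{
    return app1((PyListObject *)op, newitem);
}

static int
caller(PyObject *list, PyObject *item)
{
    return PyList_Append(list, item);
}
"""

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "listobject.c"
    sample_path.write_text(SAMPLE, encoding="utf-8")
    demo_hits = [(p.name, n) for p, n in find_c_definitions(tmp, "PyList_Append")]

# Only the definition on line 2 matches; the indented call on line 10 does not.
print(demo_hits)
```

Against a real checkout you would pass the checkout directory as `source_root`. The same search works from a shell with something like `grep -rn '^PyList_Append(' Objects/`; the Python version is only here to show what the anchor buys you.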

brettcannon avatar Jul 29 '16 16:07 brettcannon

I just got off the phone with @emilyemorehouse who is also interested in this issue. I promised her one specific deliverable: an overview of what's in the most important top-level directories and the main entry points underneath there. I will spend some time today writing up what I can.

While we're collecting useful links, maybe my old Python history blogs could be of some use. There's a wealth of technical information there: http://python-history.blogspot.com/. A few highlights:

  • http://python-history.blogspot.com/2013/11/the-history-of-bool-true-and-false.html
  • http://python-history.blogspot.com/2013/11/story-of-none-true-false.html
  • http://python-history.blogspot.com/2010/06/method-resolution-order.html
  • http://python-history.blogspot.com/2010/06/inside-story-on-new-style-classes.html
  • http://python-history.blogspot.com/2009/04/metaclasses-and-extension-classes-aka.html
  • http://python-history.blogspot.com/2009/03/dynamically-loaded-modules.html

gvanrossum avatar Jul 29 '16 17:07 gvanrossum

@brettcannon I think that an outline is a great next step. What's the best way for us to do this?

I've started on a set of docs based on some of @gvanrossum's direction that I've divided into a few sections -- resources (with tracking on what has been scoured for information), notes on meetings/project goals, notes from resources, and notes from digging into CPython itself. I've currently only gone through @akaptur's Bytecode talk but I've got a decent list of resources.

My personal goal from this is to be able to gain a strong understanding of the codebase in order to work towards becoming a core contributor, so I'm certainly going to lean more towards documenting how Python and its code works.

I'm more than happy to share all of my docs, both here and from a personal repo as not all notes will be entirely relevant. @jackiekazil, I can start adding if you give me access.

emilyemorehouse avatar Jul 29 '16 17:07 emilyemorehouse

@emilyemorehouse Without settling on an overall direction exactly, I don't know if we could do an outline, hence why I suggested we just start with an overview of the directories and see where it takes us. Part of the problem is I don't know what does or doesn't need to be covered for a new contributor as I have 13 years of knowledge which completely separates me from a beginner's perspective of what's not obvious. Otherwise I would just start from what it takes for someone to fix a bug and trying to cover each of the steps (navigating the code, how Python works to understand how to diagnose, and how best to write C code).

brettcannon avatar Jul 29 '16 18:07 brettcannon

As the newest newcomer, but with some background in helping get similar projects off the ground, may I make a suggestion? It seems to me that some of us could work from the docs that @emilymorehouse has offered to share, to come up with an outline (and as we see fit, work from other resources already listed as well). One item in that outline would, of course, be the overview of the directories that @brettcannon and @gvanrossum (and really everyone) have called for. So ... consider the directories overview and the outline perhaps as two branches of the project? (Not suggesting a repo structure, just a metaphor :-) ) Again as that newcomer, I'd look for more than one entry point. My tendency is to go for the big picture -- give me an overview of the code structure and I'll start rummaging (looking only) to familiarize myself with the forest map as a whole. Only later will I start looking for information about "what it takes for someone to fix a bug" (after all, someone has to learn to identify and prioritize said bugs first, right?). But others might want to dive in and find a nice tree or two to work with first. I think there's enough direction in this issue discussion already for work to begin on both outline and overview.

Bradamant3 avatar Jul 29 '16 19:07 Bradamant3

Ladies and gentlemen, I have something to share.

I spent a few hours writing a rambling draft that walks you through what happens when Python starts up, from main() to the >>> prompt. It's a work in progress, but I'm sharing it here early in the hope that it's already useful. I suppose I should eventually post this as a series of blog posts -- or we can work together on turning this into a useful and more structured document to be committed in the pyladies-maintainers repo. Thoughts?

For now I wrote it using a new Dropbox product, "Paper".

READ THIS BEFORE YOU CLICK ON THE LINK: Because Paper lets you comment and edit, it will also reveal your identity to others viewing the document, even if you're only viewing it yourself, unless you use anonymous browsing.

https://paper.dropbox.com/doc/Yet-another-guided-tour-of-CPython-XY7KgFGn88zMNivGJ4Jzv

gvanrossum avatar Aug 01 '16 17:08 gvanrossum

@gvanrossum Thanks for sharing this early. It's wonderful. I really like the conversational style. I love the look and readability in Paper too.

Huge +1 to adding this to this repo in any form that you and others like. Thank you for doing this, Guido.

willingc avatar Aug 01 '16 22:08 willingc

Hello,

I received a notification from @lorenanicole about this thread because I am interested in this topic.

In fact, I have presented my talk https://speakerdeck.com/matrixise/exploring-our-python-interpreter at EuroPython 2016, PythonFOSDEM 2016, PyCon.CA 2015 and PyCon.IE 2015. This talk has been shared on the core-mentorship mailing list. If you are on this mailing list, here is the link to my message: https://mail.python.org/mailman/private/core-mentorship/2015-November/003274.html

I have already talked with Victor Stinner (@haypo) about a possible book on the topic, in which I would like to present the internals of CPython, and he is interested.

In fact, through my talk, I have observed that some people ask me how to start contributing to CPython (ok, I am not a very active contributor, but I try as my time allows) and they think that a good introduction or just a good "book" is a good place to start.

For my part, I wanted to start the project during August, just after EuroPython, with some articles or maybe with a GitBook, working first on a Table of Contents and after that on the content.

So, I am interested because I want to become a contributor to CPython, fix some issues in the interpreter, learn for myself and explain it to everybody.

matrixise avatar Aug 03 '16 23:08 matrixise