pyan Discussion - next steps?

Hi everyone assigned,

Now that we all have write access to this repo, Pyan is becoming a true community project. I thought this would be a good time to take a moment to discuss Pyan's next steps. The tool is still far from perfect, but judging by the stars on GitHub (both here and on @davidfraser's stable repo), there's community demand for this kind of static analysis tool, and it's undoubtedly already useful.

In my open-source metaprogramming efforts, I'm currently focusing on two other projects, both of which have turned out to be major time sinks, so I likely won't have much time to devote to Pyan in the near future.

Nevertheless, here are some short to medium term goals off the top of my head:

#51. Python 3.8 support is important for keeping Pyan alive for the next few years. It should only require small changes. I recently did this for mcpyrate, so the AST changes are still in recent memory for me. I can do this one.
README needs updating. Maybe the TODO section should be refactored into its own file. I can do this one, too.
The function and namespace filters added by @jdb78 need documentation in the README. (Nice feature, by the way! Thanks!)
Integrating modvis as an option for the main pyan executable? It's sometimes useful to get a 30000ft (9144m) view of a codebase, looking only at module dependencies, especially when there are many small modules. So some time ago, I wrote this small, separate tool and archived it here, but it's not yet integrated with the rest of Pyan (indeed, it's not even installed with the package).
We need many more automated tests. This would also document what exactly is guaranteed to be supported, in terms of the target codebase being analyzed. But I don't have the time to write lots of test cases now. As a stopgap measure, maybe add tests when new bugs are reported?
Renaming from pyan3 to pyan now that Python 2 is gone. This would be ideal to resolve in the near future. We need to set up a new PyPI package, and a redirect for the old one that, if possible, says the package name pyan3 is deprecated and pulls in the new one. We should probably set up a pyan3 console script too, that gives a deprecation warning, and then redirects to the new pyan console script. (I haven't yet thought about how to do any of this; hence I just released 1.1.1 quickly and skipped over this.)
Let's start keeping an official CHANGELOG, and move to semantic versioning?
Opening write access for the PyPI package, how to do that? It would be nice for any of us to be able to update it when needed, rather than having to wait for me on that.
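To illustrate the module-level view mentioned in the modvis item above, here is a minimal sketch of collecting a file's imports with the standard `ast` module. This is not how modvis is actually implemented; the function name `module_imports` and the sample source are made up for illustration only.

```python
import ast


def module_imports(source):
    """Return the set of module names imported by the given source text."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a, b.c" -> {"a", "b.c"}
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x"; keep the leading
            # dots so that relative imports stay distinguishable.
            found.add("." * node.level + (node.module or ""))
    return found


# Hypothetical input, just to show the shape of the result.
example = "import os\nfrom collections import OrderedDict\nfrom . import helpers\n"
deps = module_imports(example)
```

Running this over every file in a package and drawing an edge per import would give a rough module dependency graph; resolving relative imports to absolute names is left out of the sketch.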

Is there something specific you'd like to add to Pyan, or is just keeping it alive enough for you for now?

Once we have a plan, setting some milestones in the issue tracker would be nice. I think the next milestone could be simply Python 3.8 support and an updated README. Thoughts?

Dec 11 '20 14:12 Technologicat

That sounds really good. I'm really glad that there are people who have taken this project on and are thinking about the future, though I don't anticipate contributing further myself significantly... thank you!

Dec 11 '20 17:12 davidfraser

Thanks for outlining action items.

I could work on renaming the package and script to pyan. This can include two scripts (as entry points to different functions): the pyan script, and the pyan3 script, which will print a UserWarning (a DeprecationWarning would remain invisible to the user). Currently, the name pyan appears to be available on PyPI.
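A rough sketch of what the two entry points could look like, assuming a `main` function similar to the existing one. The body of `main` here is only a stand-in, and the name `main_pyan3` and the module paths in the comment are hypothetical.

```python
import warnings


def main(argv=None):
    """Stand-in for the real `pyan` entry point (placeholder body)."""
    return 0


def main_pyan3(argv=None):
    """Deprecated entry point: warn visibly, then delegate to `main`."""
    # UserWarning is shown by default; DeprecationWarning is hidden by the
    # default warning filters unless it is triggered from __main__.
    warnings.warn(
        "The `pyan3` command is deprecated; please use `pyan` instead.",
        UserWarning,
        stacklevel=2,
    )
    return main(argv)


# Hypothetical wiring in setup.py (module paths are assumptions):
#   entry_points={"console_scripts": [
#       "pyan = pyan.main:main",
#       "pyan3 = pyan.main:main_pyan3",
#   ]}
```

Both scripts end up running the same code, so the old name keeps working while telling the user where to go next.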

The pyan3 package could remain on PyPI for backward compatibility reasons. Perhaps there could be a final pyan3 release that prints a UserWarning (a DeprecationWarning would remain invisible to the user) when the script is invoked, including a link to pyan on PyPI. A less backward-compatible approach would be to remove pyan3 from PyPI, leaving it to the users to search for the package and learn its new location via GitHub.

I do not think that redirecting at installation time from pyan3 to pyan is possible on PyPI, and it would be implicit (PEP 20), installing a package other than the requested one. The README on PyPI of the last pyan3 release could mention that pyan3 is superseded by pyan, and link to the PyPI page for pyan.

Adding maintainers to a package on PyPI is possible by logging in, then following the "Your Projects" link, the "Manage" link for the package of interest, and the "Collaborators" link, and finally using the "Invite Collaborator" section of that page (PyPI offers maintainer and owner roles).

I agree with following semantic versioning. Maintaining a CHANGELOG may require deciding what to place in the CHANGELOG. In the dd package I have maintained a CHANGELOG in the form of a file named CHANGES.md. The changes included have been organized as a Markdown list-of-lists, including mainly API changes, omitting bugs, and trying to summarize by class/method/function concerned. It appears easier to modify the file CHANGES.md in the same commits that change the code itself (thus grouping thematically relevant changes together). The rationale for not mentioning bugs and most non-API changes is to make the CHANGELOG readable as a summary of interface changes, instead of a copy of the git history. Users interested in more detail can always browse through the git history for a more detailed record of changes.

The version number could be stored in setup.py and from there automatically written to a file inside the package (instead of loading it from a file inside the package). I have followed this approach in other packages.
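As a sketch of this approach, assuming the version lives as a constant in setup.py: a small helper writes it into a generated module inside the package at build time. The file name `_version.py`, the helper name, and the placeholder version are assumptions for illustration; the demo writes into a temporary directory instead of a real package.

```python
import tempfile
from pathlib import Path

VERSION = "0.0.0"  # placeholder; in setup.py this would be the real version


def write_version_file(package_dir, version):
    """Write a generated `_version.py` (hypothetical name) into the package."""
    path = Path(package_dir) / "_version.py"
    path.write_text(
        "# Generated by setup.py; do not edit.\n"
        f'__version__ = "{version}"\n'
    )
    return path


# Demo in a throwaway directory standing in for the package directory.
with tempfile.TemporaryDirectory() as tmp:
    generated = write_version_file(tmp, VERSION)
    contents = generated.read_text()
```

The package's `__init__.py` could then import `__version__` from the generated module, so setup.py never needs to import or parse the package itself.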

About testing, running tests on Travis CI (http://travis-ci.com) would also be good (travis-ci.org will be discontinued soon, migrating to travis-ci.com, as mentioned here).

Code formatting (PEP 8) could be another set of changes.

An observation about the license text is that it is formatted as Markdown. The text of the GPLv2 includes the sentence "Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.", so the license may be identical in meaning to GPLv2, but not a verbatim copy of GPLv2. One possibility would be to change the license text to a verbatim copy of GPLv2.

The source files appear to not include any copyright notice. A copyright notice would need to be placed in each file. An example notice is included in the text of GPLv2.

Dec 13 '20 19:12 johnyf

Going forward, I believe sphinx integration could be a worthwhile project. Callgraphs are great for documentation and a sphinx plug-in would make using pyan much easier. Having said that, I am not sure how much time I have to work on this.

On package releases, versioning and dependency management, I found the combination of poetry and github actions very useful (see for example my pytorch-forecasting package) because it completely automates everything for you, and you only have to create releases on github. However, I do not really have the time to implement it here, so please regard this as a suggestion.

Dec 13 '20 19:12 jdb78

Thanks for the comments!

To begin with, I forgot some action items:

There's significant duplication between the main function in main.py and create_callgraph in __init__.py. This should be "dried" as soon as possible.
At least personally, I prefer a style where the __init__.py of a package just re-exports the public API (and defines __version__, if applicable). The implementation of create_callgraph could live in another module, and __init__.py could then import the function. (This is cleaner if some other module internally needs to import the implementation, since it then doesn't need to depend on the top level of the package.)
We should in each module define the magic __all__, which is the PEP8 recommended mechanism to declare the public API of a module. In __init__.py, we could then use a star-import in its only acceptable role in non-interactive code - for automatically re-exporting the public API (and only the public API).
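To make the `__all__` plus star-import idea concrete, here is a small self-contained demo that builds a throwaway package on disk and imports it. The package name `demo_pkg`, the module name `callgraph.py`, and the function bodies are invented for the demo; only the name `create_callgraph` comes from the discussion.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Implementation module: __all__ declares the public API.
IMPL = '''
__all__ = ["create_callgraph"]

def create_callgraph():
    return "callgraph"

def undeclared():
    return "looks public, but is not listed in __all__"
'''

# __init__.py only re-exports whatever the implementation declares public.
INIT = "from .callgraph import *\n"

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "callgraph.py").write_text(IMPL)
    (pkg_dir / "__init__.py").write_text(INIT)

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        has_public = hasattr(demo_pkg, "create_callgraph")
        has_undeclared = hasattr(demo_pkg, "undeclared")
        result = demo_pkg.create_callgraph()
    finally:
        sys.path.remove(tmp)
```

With `__all__` defined, the star-import copies only `create_callgraph` into the package namespace; `undeclared` stays reachable only as `demo_pkg.callgraph.undeclared`.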

Then, to reply:

@johnyf : please, feel free to take on the pyan3 to pyan name transition. Your suggested backward-compatible solution sounds good to me. We could keep the final pyan3 package on PyPI for a year or two after the final release (giving people time to upgrade to pyan), and then delete it.

Good point about UserWarning vs. DeprecationWarning. I'm aware of the semantic difference, yet I keep forgetting it.

What I meant by automatically pulling in the new package was essentially just listing pyan as a dependency of pyan3, so the new one will get installed automatically when someone installs the old one. There are some meta-packages on PyPI that do this. The one I was specifically thinking of is ZODB3, which installs the various component packages of ZODB. But now that I think of it, that's a different use case from what we want to do. I agree on explicit is better than implicit. So let's keep it explicit.

Thanks for the PyPI instructions. I'll set up write access to the PyPI package for both of you.

Good point about CHANGELOG content. What's appropriate there depends on how clean the git history itself is, and on the commit granularity. In my own projects, I prefer to list important bug fixes, too, to save the reader the trouble of going through possibly hundreds of commits. The motivation is the same - providing a readable summary of changes. :)

About the version number, the important part is that it's specified in one place only. The two approaches reflect different philosophies, as well as behave slightly differently technically. Some variant of the packagename/__init__.py scan is what I do in my own projects. Your approach is also fine.
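For comparison, one possible variant of the `__init__.py` scan, sketched with the standard `ast` module so that setup.py can read the version without importing the package. This is an illustration rather than the exact code used in any of the projects mentioned; `find_version` and the sample source are made up.

```python
import ast


def find_version(source):
    """Return the string assigned to a top-level `__version__` in `source`."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__version__":
                    # literal_eval accepts an AST node and only evaluates
                    # literals, so no package code is executed.
                    return ast.literal_eval(node.value)
    raise ValueError("no top-level __version__ assignment found")


# Hypothetical file contents; setup.py would read the real __init__.py.
sample = '"""Package docstring."""\n__version__ = "1.2.3"\n'
version = find_version(sample)
```

Either way, the version is written down in exactly one place; the difference is whether setup.py or the package is the source of truth.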

CI, yes, we should move toward that. (But I think we first need a lot more automated tests.)

PEP8, yes, that's good too. A suitable flake8rc and autopep8, or maybe black? (Personally, I always run flake8 with a few select warnings turned off, and I don't agree with the opinions of black; but pyan is a pretty standard Python project, and for projects that aren't too exotic, standard formatting makes sense.)

Regarding GPLv2, good catch! Probably easiest to throw out the one we currently have, and replace it with a verbatim original GPLv2.

I'd prefer to avoid the added noise of per-file copyright notices, especially if they're lengthy. (The GPLv2 license text recommends having per-file notices, but doesn't strictly require them.) The LICENSE file (and AUTHORS file, if that information can't go directly into LICENSE for legal reasons) already conveys that information. OTOH, having at least a one-line notice makes it explicit that the code isn't BSD or MIT, while saving the reader the effort to look up the license file.

So if that's ok with you, maybe one-line license notices?

@jdb78 : Sphinx integration, nice idea! Let's create an issue to track it.

I briefly looked at the workflows of pytorch-forecasting. How do you configure the secrets? In the GitHub settings page for the project, I suppose?

I suggest we set up a milestone named something like "∞", to tag ideas that are worth writing down, but aren't being worked on at the moment.

(There are multiple schools of thought on what the role of an issue tracker is, so to spell out where I'm coming from, for me it's as much a public TODO list as it is a place to report bugs and ask questions. There's nothing wrong with having many issues open, if they are basically TODO items.)

Dec 14 '20 01:12 Technologicat

Yes, exactly. The secrets are (mostly) added via GitHub settings. Some do exist by default though. Poetry also has a dynamic versioning package so you can version packages via tags. It makes releasing packages and managing dependencies pretty easy in my experience.

Dec 14 '20 10:12 jdb78

Adding a sphinx extension here: #54. It was far easier than I thought and could be a game changer.

Jan 05 '21 15:01 jdb78