gpython icon indicating copy to clipboard operation
gpython copied to clipboard

Support for newer python syntax

Open adam-azarchs opened this issue 3 years ago • 6 comments

Our team would find it useful if this package could parse an AST from code conforming to the python 3.8 grammar.

I understand that actually supporting newer python features would be a huge project. For my use case, I'm using this project to parse python code to work with the AST but don't need to actually interpret it, which is a much more minor proposition. As far as I know, python 3.8 syntax is completely backwards compatible with 3.4 (avoiding the whole walrus operator thing with python 3.9) and wouldn't even need to alter the internal representation of the AST (much, anyway).

The specific syntax changes our team keeps running into problems with are trailing commas on argument lists in some situations (e.g. after **kwargs) and annotations on local or class variables. The former is obviously trivial, the latter still wouldn't need any changes outside the grammar if you don't mind just throwing out the annotation information.

I'll be making these changes for our internal usage; my question is mostly whether there would be interest in a PR to upstream those changes, in which case I might be a bit less "quick and dirty" with my implementation.

adam-azarchs avatar Feb 02 '21 00:02 adam-azarchs

I think being able to parse 3.8 syntax would be great and a welcomed pr.

sbinet avatar Feb 02 '21 07:02 sbinet

Ok, I have a first pass at this in our fork at https://github.com/10XGenomics/gpython/pull/1. As it stands, it solves the problems we've run into, however because of the way the unit tests in the repo work, directly comparing the output of dumping the AST from cpython vs gpython, and because I needed to run the tests comparing to 3.8 because there are syntactic differences (mostly things that used to be illegal which are now legal), my changes had to propagate further than I'd originally hoped, and unsurprisingly many of the unit tests now fail.

I'll be going on leave for a month starting next week, and it is unlikely I'll be able to resolve those before then. @sharad3142 may have some time to work further on it during that time, though to be honest it's going to be hard for us to justify putting all that much effort into fixing what we broke in e.g. compile because we don't actually need that functionality. If you have any suggestions for an easy way to back out some of the ways things got broken, that would help.

adam-azarchs avatar Feb 04 '21 22:02 adam-azarchs

This is a valiant effort :-)

Quite a lot of the tests seem to be failing with

        compile_test.go:122: ./diffdis.py returned error: exit status 126
       /usr/bin/env: python3.8: Permission denied

Indicating something wrong with the install of python 3.8

Probably what we should do is when you've had enough we merge this to a py3.8 branch then other people can work on fixing stuff.

I have a nasty feeling that the changes will propagate through compile too....

ncw avatar Feb 05 '21 18:02 ncw

I've been using python3.8 from another location locally. I'll see if I can get that resolved in the CI.

Yes these changes will propagate through compile as well, unfortunately. In particular, where the AST used to assume that call objects would have exactly zero or one each of *args and **kwargs, and that they'd be at the end of the argument list, that is no longer the case with python 3.8 (see examples in the tests), which is going to add quite a bit of complication. Less significantly, but still relevant, the asdl no longer treats NamedConstant, Str, Bytes, Num separately - they're all just Constant now. Perhaps the other new features, like async / await and "named expressions" can just be ignored / unsupported for now, but those other representation changes need to be dealt with.

adam-azarchs avatar Feb 05 '21 19:02 adam-azarchs

I've fixed up the travis script to properly put the locally-built python3.8 in the PATH, so you can see all of the many resulting failures in all of their glory. I'd also made changes to many of the tests to make it continue with more test cases after encountering some kinds of errors, and, in a few places, surface a bit more context for debugging. The largest category of error seems to be

SyntaxError: 'can use starred expression only as assignment target'

coming from gpython; as I previously said this is most likely due to the changes in how *args are handled. Opcodes are also changed, of course.

As I said earlier, I'll be on leave for 1 month starting next week, and then a different high-priority project when I get back, so I don't expect to do any further work on this until April at least. I don't have the ability to push that branch to this repo (pull requests must be made to an existing branch) but I obviously have no objections to other people taking that branch and running with it in the mean time.

adam-azarchs avatar Feb 06 '21 03:02 adam-azarchs

SyntaxError: 'can use starred expression only as assignment target'

I'll post this here too: https://github.com/10XGenomics/gpython/commit/9c8103a92772502e7391d9cd8dba0b9a38659c9b

Managed to get to the bottom of this. You had changed the order in which we push args to the stack. Just need to make sure the kwargs and varargs stuff is last.

Tatskaari avatar Apr 10 '22 19:04 Tatskaari