RefDiff
New parser for Python
Hello, thanks for providing such a great tool. I would like to use a similar tool on Python code, but I searched and could not find one. Could you give me some guidance on writing a parser for Python code?
Best
The README says that soon a detailed tutorial will be provided, looking forward to it!
Any news on that? Thanks.
Hi @ldesi and @Symbolk. I created a parser for Go. Maybe it can be used as the basis for a generic parser. The Go parser converts a file to JSON, and this output is used to create the RefDiff CST.
I think the same approach could be used for Python:
- You need to define the node types
- Define the Python extensions
- Create a simple AST parser for Python. Example of the output format:
[
{
"type": "File",
"start": 0,
"end": 203,
"line": 1,
"has_body": true,
"name": "types.go",
"namespace": "",
"parent": null,
"tokens": [
"0-7",
"8-16",
"16-17",
"18-22",
"23-31",
"32-35",
"35-36",
"36-40",
"41-49",
"50-54",
"55-61",
"61-62",
"62-63",
"63-64",
"65-69",
"70-71",
"73-81",
"85-86",
"86-87",
"87-90",
"90-91",
"92-103",
"104-105",
"105-106",
"106-112",
"112-113",
"114-115",
"126-132",
"132-133",
"133-134",
"134-135",
"136-138",
"148-157",
"158-159",
"162-163",
"163-164",
"164-165",
"166-169",
"169-170",
"171-172",
"172-173",
"173-174",
"174-175",
"176-180",
"181-182",
"182-190",
"190-191",
"192-196",
"196-197",
"197-198",
"199-200",
"202-203",
"203-204"
],
"receiver": null
},
{
"type": "Type",
"start": 23,
"end": 35,
"line": 3,
"has_body": false,
"name": "IntAlias",
"namespace": "",
"parent": "types.go",
"receiver": null
},
{
"type": "Type",
"start": 41,
"end": 63,
"line": 4,
"has_body": false,
"name": "ChanType",
"namespace": "",
"parent": "types.go",
"receiver": null
},
{
"type": "Type",
"start": 73,
"end": 90,
"line": 7,
"has_body": false,
"name": "IntSlice",
"namespace": "",
"parent": "types.go",
"receiver": null
},
{
"type": "Type",
"start": 92,
"end": 112,
"line": 8,
"has_body": false,
"name": "StringSlice",
"namespace": "",
"parent": "types.go",
"receiver": null
},
{
"type": "Struct",
"start": 126,
"end": 134,
"line": 9,
"has_body": true,
"name": "A",
"namespace": "",
"parent": "types.go",
"receiver": null
},
{
"type": "Interface",
"start": 148,
"end": 172,
"line": 10,
"has_body": false,
"name": "iA",
"namespace": "",
"parent": "types.go",
"receiver": null
},
{
"type": "Function",
"start": 162,
"end": 169,
"line": 11,
"has_body": false,
"name": "A",
"namespace": "iA.",
"parent": "iA",
"receiver": null
},
{
"type": "Function",
"start": 176,
"end": 203,
"line": 15,
"has_body": true,
"name": "Test",
"namespace": "IntSlice.",
"parent": "IntSlice",
"receiver": "IntSlice"
}
]
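To make the flat output above easier to read, here is a minimal sketch of how the "parent" field links the nodes into a hierarchy. This is only an illustration in Python, not how RefDiff builds its CST; the helper names children_by_parent and render are hypothetical, and SAMPLE is a trimmed copy of the Go output above that keeps only the "name" and "parent" fields.

```python
import json
from collections import defaultdict

# Trimmed copy of the Go parser output above (only "name" and "parent" kept).
SAMPLE = """
[
  {"type": "File", "name": "types.go", "parent": null},
  {"type": "Type", "name": "IntSlice", "parent": "types.go"},
  {"type": "Interface", "name": "iA", "parent": "types.go"},
  {"type": "Function", "name": "A", "parent": "iA"},
  {"type": "Function", "name": "Test", "parent": "IntSlice"}
]
"""


def children_by_parent(nodes):
    """Group node names by the value of their "parent" field (hypothetical helper)."""
    index = defaultdict(list)
    for node in nodes:
        index[node["parent"]].append(node["name"])
    return dict(index)


def render(index, parent=None, depth=0):
    """Return the hierarchy as indented lines, starting from nodes whose parent is null."""
    lines = []
    for name in index.get(parent, []):
        lines.append("  " * depth + name)
        lines.extend(render(index, name, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(children_by_parent(json.loads(SAMPLE)))))
```

Note that names are not unique in the real output (there is both a struct and a function called "A"), so a real implementation would likely need more than the name to identify a parent.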
Hi @rodrigo-brito, we are working on a graduation project and in one part of the project we need to use a refactoring tool such as RefDiff. The problem is that we need it for Python. So I wanted to ask you, how hard is it to create a RefDiff plugin for Python such as the one you created for Go?
Hi @Mosallamy, I spent one month creating the plugin. This week, I will try to write a short tutorial to help other developers create plugins. The main effort is writing an AST parser that extracts the main components of a Python file. For example, for the given file (example.py, located in my_package):
def foo(x):
    print("x = ", x)
def bar():
    foo(10)
You should return a structure like this:
[
{
"type": "File",
"start": 0,
"end": 50,
"line": 1,
"has_body": true,
"name": "example.py",
"namespace": "my_package",
"parent": null,
"tokens": [
"0-4",
"5-8",
...
]
},
{
"type": "Function",
"start": 23,
"end": 35,
"line": 1,
"has_body": true,
"name": "foo",
"namespace": "my_package",
"parent": "example.py",
"parameters": ["x"],
"calls": []
},
{
"type": "Function",
"start": 36,
"end": 50,
"line": 5,
"has_body": true,
"name": "bar",
"namespace": "my_package",
"parent": "example.py",
"parameters": [],
"calls": ["my_package.foo"]
}
]
The start and end values are only examples; they are not the correct positions. In summary:
- For each node, you must extract the position of the token, the line number, and the parent node.
- For functions, you must also extract the parameters and the local function calls (e.g., foo() in the bar function).
If we have this information, we can create the plugin. Do you have experience with the Python AST?
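As a rough illustration of the two points above, here is a minimal sketch that uses Python's built-in ast module (Python 3.8+ for end_lineno and end_col_offset). The function names extract_functions and line_offsets are hypothetical, and several things are assumptions rather than RefDiff requirements: "start" and "end" are treated as end-exclusive character offsets, the source is assumed to be ASCII (ast column offsets are UTF-8 byte offsets), and only top-level functions called by plain name count as local calls.

```python
import ast

SOURCE = '''def foo(x):
    print("x = ", x)

def bar():
    foo(10)
'''


def line_offsets(source):
    """Offset of the first character of each line (hypothetical helper)."""
    offsets = [0]
    for line in source.splitlines(keepends=True):
        offsets.append(offsets[-1] + len(line))
    return offsets


def extract_functions(source, file_name, namespace):
    """Return one dict per top-level function, shaped like the example above."""
    tree = ast.parse(source)
    offsets = line_offsets(source)
    # Assumption: a "local" call is a call by plain name to a top-level function.
    local = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    nodes = []
    for fn in tree.body:
        if not isinstance(fn, ast.FunctionDef):
            continue
        calls = []
        for sub in ast.walk(fn):
            if (isinstance(sub, ast.Call)
                    and isinstance(sub.func, ast.Name)
                    and sub.func.id in local):
                qualified = f"{namespace}.{sub.func.id}"
                if qualified not in calls:
                    calls.append(qualified)
        nodes.append({
            "type": "Function",
            # Assumption: ASCII source, so byte columns equal character columns.
            "start": offsets[fn.lineno - 1] + fn.col_offset,
            "end": offsets[fn.end_lineno - 1] + fn.end_col_offset,
            "line": fn.lineno,
            "has_body": True,
            "name": fn.name,
            "namespace": namespace,
            "parent": file_name,
            "parameters": [a.arg for a in fn.args.args],
            "calls": calls,
        })
    return nodes


if __name__ == "__main__":
    for node in extract_functions(SOURCE, "example.py", "my_package"):
        print(node)
```

The call to print inside foo is skipped because print is not defined in the file, so foo ends up with an empty "calls" list and bar with ["my_package.foo"], as in the structure above. Methods, nested functions, and calls through attributes (obj.method()) are not handled here.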
@rodrigo-brito Thanks for the fast reply! We have experimented a little with the built-in Python AST library.
https://docs.python.org/3/library/ast.html
From the AST library we can extract the following information:
- Function names
- Function calls
- Body of the function
- Function parameters and other information
Can we use this library as the base for the Python plugin?
As for the tokens, we found the following library, which returns the positions of tokens: https://asttokens.readthedocs.io/en/latest/user-guide.html
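For reference, token positions can also be sketched with only the standard library. The example below uses the built-in tokenize module instead of asttokens (a deliberate swap to keep it dependency-free) and produces "start-end" strings like the "tokens" list in the File node. The function name token_spans is hypothetical, and two choices are assumptions: offsets are end-exclusive character offsets, and layout tokens and comments are skipped.

```python
import io
import tokenize

# Assumption: layout-only tokens and comments are not wanted in the "tokens" list.
SKIPPED = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.COMMENT,
    tokenize.ENDMARKER,
}


def token_spans(source):
    """Return a "start-end" string (end-exclusive offsets) for each token."""
    # Offset of the first character of each line, to turn (row, col) into an offset.
    offsets = [0]
    for line in source.splitlines(keepends=True):
        offsets.append(offsets[-1] + len(line))

    spans = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        start = offsets[tok.start[0] - 1] + tok.start[1]
        end = offsets[tok.end[0] - 1] + tok.end[1]
        spans.append(f"{start}-{end}")
    return spans


if __name__ == "__main__":
    print(token_spans("def foo(x):\n    return x\n"))
```

For the two-line input in the usage example, the first token "def" comes out as "0-3" and "foo" as "4-7". Whether RefDiff expects exactly this token granularity is something to check against the Go plugin.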
@Mosallamy, I can help you with the code. Can you open a new repository for it? We can use Jython to create the parser and integrate it directly into the Java module.
Hey @rodrigo-brito, I just created a repo with a script that parses a Python file and extracts the following information for each function:
- type
- name
- parameters
- line
- start token
- end token
Run the Ast.py file to get the output.
Hi @Mosallamy, can you share the repository link?
Hello @rodrigo-brito, so far we have extracted all of the information from the AST except for the function calls. We have also thoroughly read the RefDiff paper and understand the steps required to create a plugin, but we had trouble understanding the exact implementation of the code 😅
Hi @Mosallamy, I will try to create the base of the plugin today. I will open a pull request in your repository soon.