RefDiff

New parser for Python

Jirigesi opened this issue 4 years ago • 12 comments (status: open)

Hello, thanks for providing such a great tool. I would like to use a similar tool on Python code, but despite my best efforts I could not find one. Could you give me some guidance on writing a parser for Python code?

Best

Jirigesi, Jun 29 '20

The README says a detailed tutorial will be provided soon. Looking forward to it!

Symbolk, Dec 03 '20

> The README says a detailed tutorial will be provided soon. Looking forward to it!

Any news on that? Thanks.

ldesi, Feb 01 '21

Hi @ldesi and @Symbolk. I created a parser for Go. Maybe it can be used as the basis for a generic parser. The Go parser converts a source file into JSON, and this output is used to build the RefDiff CST.

I think the same approach could be used for Python. Here is the output for a sample Go file (types.go):


[
	{
		"type": "File",
		"start": 0,
		"end": 203,
		"line": 1,
		"has_body": true,
		"name": "types.go",
		"namespace": "",
		"parent": null,
		"tokens": [
			"0-7",
			"8-16",
			"16-17",
			"18-22",
			"23-31",
			"32-35",
			"35-36",
			"36-40",
			"41-49",
			"50-54",
			"55-61",
			"61-62",
			"62-63",
			"63-64",
			"65-69",
			"70-71",
			"73-81",
			"85-86",
			"86-87",
			"87-90",
			"90-91",
			"92-103",
			"104-105",
			"105-106",
			"106-112",
			"112-113",
			"114-115",
			"126-132",
			"132-133",
			"133-134",
			"134-135",
			"136-138",
			"148-157",
			"158-159",
			"162-163",
			"163-164",
			"164-165",
			"166-169",
			"169-170",
			"171-172",
			"172-173",
			"173-174",
			"174-175",
			"176-180",
			"181-182",
			"182-190",
			"190-191",
			"192-196",
			"196-197",
			"197-198",
			"199-200",
			"202-203",
			"203-204"
		],
		"receiver": null
	},
	{
		"type": "Type",
		"start": 23,
		"end": 35,
		"line": 3,
		"has_body": false,
		"name": "IntAlias",
		"namespace": "",
		"parent": "types.go",
		"receiver": null
	},
	{
		"type": "Type",
		"start": 41,
		"end": 63,
		"line": 4,
		"has_body": false,
		"name": "ChanType",
		"namespace": "",
		"parent": "types.go",
		"receiver": null
	},
	{
		"type": "Type",
		"start": 73,
		"end": 90,
		"line": 7,
		"has_body": false,
		"name": "IntSlice",
		"namespace": "",
		"parent": "types.go",
		"receiver": null
	},
	{
		"type": "Type",
		"start": 92,
		"end": 112,
		"line": 8,
		"has_body": false,
		"name": "StringSlice",
		"namespace": "",
		"parent": "types.go",
		"receiver": null
	},
	{
		"type": "Struct",
		"start": 126,
		"end": 134,
		"line": 9,
		"has_body": true,
		"name": "A",
		"namespace": "",
		"parent": "types.go",
		"receiver": null
	},
	{
		"type": "Interface",
		"start": 148,
		"end": 172,
		"line": 10,
		"has_body": false,
		"name": "iA",
		"namespace": "",
		"parent": "types.go",
		"receiver": null
	},
	{
		"type": "Function",
		"start": 162,
		"end": 169,
		"line": 11,
		"has_body": false,
		"name": "A",
		"namespace": "iA.",
		"parent": "iA",
		"receiver": null
	},
	{
		"type": "Function",
		"start": 176,
		"end": 203,
		"line": 15,
		"has_body": true,
		"name": "Test",
		"namespace": "IntSlice.",
		"parent": "IntSlice",
		"receiver": "IntSlice"
	}
]
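The output above is a flat list in which each node points to its parent by name. To illustrate how that encodes a tree, here is a minimal Python sketch (not RefDiff's own Java code; `index_children` and `decode_tokens` are hypothetical helpers) that groups a trimmed copy of those nodes by parent and decodes the "start-end" token strings into integer pairs. It assumes parent names are unique within a file, which a real plugin would need to check (note that types.go has both a struct `A` and an interface method `A`).

```python
import json
from collections import defaultdict


def index_children(nodes):
    """Hypothetical helper: map each parent name to its direct children's names."""
    children = defaultdict(list)
    for node in nodes:
        if node["parent"] is not None:
            children[node["parent"]].append(node["name"])
    return dict(children)


def decode_tokens(token_strings):
    """Hypothetical helper: turn "start-end" strings into (start, end) int pairs."""
    pairs = []
    for token in token_strings:
        start, end = token.split("-")
        pairs.append((int(start), int(end)))
    return pairs


# A trimmed copy of the Go parser output shown above (only a few fields kept).
nodes = json.loads("""
[
  {"type": "File", "name": "types.go", "parent": null, "tokens": ["0-7", "8-16", "16-17"]},
  {"type": "Type", "name": "IntSlice", "parent": "types.go"},
  {"type": "Interface", "name": "iA", "parent": "types.go"},
  {"type": "Function", "name": "A", "parent": "iA"},
  {"type": "Function", "name": "Test", "parent": "IntSlice"}
]
""")

print(index_children(nodes))
print(decode_tokens(nodes[0]["tokens"]))
```

Here `Test` ends up under `IntSlice` because its receiver type is recorded as its parent, mirroring the `"parent": "IntSlice"` entry in the Go output.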

rodrigo-brito, Feb 02 '21

Hi @rodrigo-brito, we are working on a graduation project, and one part of it requires a refactoring-detection tool such as RefDiff. The problem is that we need it for Python. How hard would it be to create a RefDiff plugin for Python like the one you created for Go?

Mosallamy, Feb 07 '21

Hi @Mosallamy, I spent one month creating the plugin. This week I will try to write a short tutorial to help other developers create plugins. The main effort, however, is writing an AST parser that extracts the main components of a Python file. For example, given the following file (example.py, located in my_package):

def foo(x):
    print("x = ", x)

def bar():
    foo(10)

You should return a structure like this:

[
  {
    "type": "File",
    "start": 0,
    "end": 50,
    "line": 1,
    "has_body": true,
    "name": "example.py",
    "namespace": "my_package",
    "parent": null,
    "tokens": [
      "0-4",
      "5-8",
      ...
    ]
  },
  {
    "type": "Function",
    "start": 23,
    "end": 35,
    "line": 1,
    "has_body": true,
    "name": "foo",
    "namespace": "my_package",
    "parent": "example.py",
    "parameters": ["x"],
    "calls": []
  },
  {
    "type": "Function",
    "start": 36,
    "end": 50,
    "line": 5,
    "has_body": true,
    "name": "bar",
    "namespace": "my_package",
    "parent": "example.py",
    "parameters": [],
    "calls": ["my_package.foo"]
  }
]

The start and end values are only examples, not the real positions. In summary:

  • For each node, you must extract its start and end positions, its line number, and its parent node.
  • For functions, you must also extract the parameters and the local function calls (e.g., the call to foo() inside bar).
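These requirements map fairly directly onto Python's standard ast module. As a minimal sketch, assuming Python 3.8+ and using a hypothetical `NodeCollector` class plus a "Class" node type that the thread does not define, the visitor below walks the tree, keeps a stack of enclosing names to fill in `parent`, and records each function's line and parameters. It deliberately leaves out start/end offsets and calls, which need extra work.

```python
import ast


class NodeCollector(ast.NodeVisitor):
    """Hypothetical helper: collects RefDiff-style node dicts for classes and functions."""

    def __init__(self, file_name, namespace):
        self.namespace = namespace
        self.parents = [file_name]  # stack of enclosing node names, innermost last
        self.nodes = []

    def _record(self, node, node_type, extra):
        self.nodes.append({
            "type": node_type,
            "line": node.lineno,
            "has_body": True,  # Python defs and classes always have a body
            "name": node.name,
            "namespace": self.namespace,
            "parent": self.parents[-1],
            **extra,
        })
        # Visit nested definitions with this node as their parent.
        self.parents.append(node.name)
        self.generic_visit(node)
        self.parents.pop()

    def visit_FunctionDef(self, node):
        args = node.args
        # Keep parameters in signature order: positional, *args, keyword-only, **kwargs.
        params = [a.arg for a in args.posonlyargs + args.args]
        if args.vararg:
            params.append(args.vararg.arg)
        params += [a.arg for a in args.kwonlyargs]
        if args.kwarg:
            params.append(args.kwarg.arg)
        self._record(node, "Function", {"parameters": params})

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_ClassDef(self, node):
        self._record(node, "Class", {})


source = 'def foo(x):\n    print("x = ", x)\n\ndef bar():\n    foo(10)\n'
collector = NodeCollector("example.py", "my_package")
collector.visit(ast.parse(source))
for item in collector.nodes:
    print(item)
```

A stack-based visitor keeps the parent bookkeeping in one place, so methods nested in classes (or functions nested in functions) get the right parent without a second pass.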

If we have this information, we can create the plugin. Do you have experience with the Python AST?

rodrigo-brito, Feb 07 '21

@rodrigo-brito Thanks for the fast reply! We have experimented a little with the built-in Python ast module.

https://docs.python.org/3/library/ast.html

From the ast module we can extract the following information:

  • Function names
  • Function calls
  • Function bodies
  • Function parameters and other information

Can we use this library as the base for the Python plugin?

As for the tokens, we found the asttokens library (https://asttokens.readthedocs.io/en/latest/user-guide.html), which returns the positions of tokens.
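For comparison, the "tokens" field can also be produced without a third-party dependency. The sketch below swaps in the standard library's tokenize module instead of asttokens and emits absolute "start-end" character offsets in the same string format as the Go parser's output. Which token types to skip (comments, newlines, indentation) is an assumption, as is whether RefDiff expects exactly these token boundaries for Python; `token_ranges` is a hypothetical name.

```python
import io
import tokenize

# Assumed: token types that carry no source text of interest for the "tokens" list.
SKIPPED = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT,
           tokenize.ENDMARKER, tokenize.COMMENT}


def token_ranges(source):
    """Return "start-end" absolute character offsets for each token in source."""
    # Split lines the same way the tokenizer's readline does, so row numbers line up.
    lines = io.StringIO(source).readlines()
    line_starts = [0]  # line_starts[i] is the offset of the first character of line i + 1
    for line in lines:
        line_starts.append(line_starts[-1] + len(line))

    ranges = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        (start_row, start_col), (end_row, end_col) = tok.start, tok.end
        start = line_starts[start_row - 1] + start_col
        end = line_starts[end_row - 1] + end_col
        ranges.append(f"{start}-{end}")
    return ranges


print(token_ranges('def foo(x):\n    print("x = ", x)\n'))
```

Because generate_tokens works on str lines, its column numbers are already character offsets, so adding the line's starting offset is enough to get absolute positions.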

Mosallamy, Feb 07 '21

@Mosallamy, I can help you with the code. Can you open a new repository for it? We can use Jython to create the parser and integrate it directly into the Java module.

rodrigo-brito, Feb 08 '21

Hey @rodrigo-brito, I just created a repo with a script that parses a Python file and extracts the following information from each function:

  • type
  • name
  • parameters
  • line
  • start token
  • end token

Run the Ast.py file to get the output.
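The Ast.py script itself is not shown in the thread. As one hedged way to compute the start and end positions listed above, assuming Python 3.8+ (for `end_lineno`/`end_col_offset`) and that the plugin wants absolute character offsets like the Go parser's output, the sketch below converts ast's line/column positions into offsets. `function_spans` is a hypothetical name.

```python
import ast
import io


def function_spans(source):
    """Return (name, start, end) absolute character offsets for every function."""
    # newline="" treats \n, \r\n and \r as line breaks and keeps them in each line.
    lines = io.StringIO(source, newline="").readlines()
    line_starts = [0]
    for line in lines:
        line_starts.append(line_starts[-1] + len(line))

    def to_offset(lineno, col):
        # ast reports col_offset in UTF-8 bytes; convert that prefix to characters.
        prefix = lines[lineno - 1].encode("utf-8")[:col].decode("utf-8")
        return line_starts[lineno - 1] + len(prefix)

    spans = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            start = to_offset(node.lineno, node.col_offset)
            end = to_offset(node.end_lineno, node.end_col_offset)
            spans.append((node.name, start, end))
    return spans


source = 'def foo(x):\n    print("x = ", x)\n\ndef bar():\n    foo(10)\n'
print(function_spans(source))
```

The byte-to-character conversion matters only for non-ASCII lines, but skipping it would silently shift every offset after a character such as "é". Note that for decorated functions `lineno` points at the `def` line, not the decorator.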

Mosallamy, Feb 10 '21

Hi @Mosallamy, can you share the repository link?

rodrigo-brito, Feb 10 '21

Hello @rodrigo-brito, so far we have extracted all of the information from the AST except the function calls. We have also read the RefDiff paper thoroughly and understand the steps required to create a plugin, but we are having trouble understanding how the code actually implements them 😅
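Function calls are the remaining piece. A minimal sketch, assuming that only direct calls by bare name to top-level functions of the same file count as "local" calls, and using the "namespace.name" format from the example structure earlier in the thread (e.g. "my_package.foo"): collect the names of top-level functions, then walk each function body for `ast.Call` nodes whose callee is one of those names. Method calls such as `self.foo()`, imported functions, and aliases are deliberately not resolved; `local_calls` is a hypothetical helper.

```python
import ast


def local_calls(source, namespace):
    """Hypothetical helper: map each top-level function to the local functions it calls."""
    tree = ast.parse(source)
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    functions = [node for node in tree.body if isinstance(node, func_types)]
    defined = {func.name for func in functions}

    result = {}
    for func in functions:
        calls = []
        # ast.walk also descends into nested defs, so their calls count for this function.
        for node in ast.walk(func):
            # Only plain-name calls such as foo(10); attribute calls like obj.foo() are skipped.
            if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                    and node.func.id in defined):
                qualified = f"{namespace}.{node.func.id}"
                if qualified not in calls:
                    calls.append(qualified)
        result[func.name] = calls
    return result


source = 'def foo(x):\n    print("x = ", x)\n\ndef bar():\n    foo(10)\n'
print(local_calls(source, "my_package"))
```

Here `print` is excluded because it is not defined in the file, which matches the example where foo has an empty `calls` list and bar lists only "my_package.foo".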

Mosallamy, Feb 15 '21

Hi @Mosallamy, I will try to create the base of the plugin today. I will open a pull request in your repository soon.

rodrigo-brito, Feb 15 '21