godot Add script formatter, comment parsing in parser, and editor settings

This PR builds upon and finalizes the work done in godotengine/godot/pull/55835.

This PR introduces a GDScript formatter in Godot's script editor. Users can access it through Edit/Format Code, Alt+Shift+F, or by enabling the new Format On Save editor setting, which runs the formatter whenever they save their work.

Integrating a formatter in Godot's script editor improves it as an IDE for GDScript. Additionally, the formatter will improve developers' adherence to the official GDScript style guide.

We encourage users to test the formatter on large code bases in order to detect any quirks or bugs that need to be addressed.

Additionally, this PR adds comment support to the parser in two forms: header comments and inline comments. During parsing, each comment in a comment block is added to a header comment vector. The next node (variable, function, annotation) has this vector set as its header comments. A comment that follows the node on the same line, up to the newline, is set as its inline comment if present.
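For readers who want a concrete picture of that attachment rule, here is a minimal sketch in Python rather than the PR's C++. The `Node` class and `attach_comments` function are hypothetical names made up for illustration, the scan is line-based instead of token-based, and it naively assumes that `#` never appears inside a string literal.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a parser node (variable, function, annotation)."""
    code: str
    header_comments: List[str] = field(default_factory=list)
    inline_comment: Optional[str] = None


def attach_comments(source: str) -> List[Node]:
    """Attach each comment block to the next node, and trailing comments inline."""
    nodes: List[Node] = []
    pending: List[str] = []  # comment block collected since the last node
    for raw_line in source.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines are simply skipped in this sketch
        if line.startswith("#"):
            pending.append(line)  # part of a header comment block
            continue
        # Naive split: assumes there is no '#' inside a string literal.
        code, hash_sign, comment = line.partition("#")
        inline = "#" + comment if hash_sign else None
        nodes.append(Node(code.rstrip(), pending, inline))
        pending = []  # the block now belongs to the node just created
    return nodes


nodes = attach_comments(
    "# Speed in px/s.\n"
    "# Tweak in the editor.\n"
    "var speed = 10 # default\n"
)
print(nodes[0].header_comments)  # ['# Speed in px/s.', '# Tweak in the editor.']
print(nodes[0].inline_comment)   # # default
```

A formatter working from the node tree can then print the header comments above the node and the inline comment after it, instead of dropping them.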

The commits have been kept separate for ease of review, but they will be squashed before merging. Lastly, there are no new dependencies included in this PR.

Production edit: Closes https://github.com/godotengine/godot-proposals/issues/3630

Apr 18 '23 14:04 RolandMarchand

Thanks for looking into this!

The commits have been kept separate for ease of review

This PR contains only one commit.

Apr 18 '23 15:04 YuriSizov

Just tested it on a number of files, here are snippets causing unexpected results. I'll post one code pattern/snippet/group of related snippets with before/after comparisons per comment.

I'm testing 2 things:

Formatting gives the expected output.
Formatting is stable: running the formatter on already formatted code doesn't make any changes.
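The stability check can be automated. Below is a rough sketch in Python, assuming the formatter can be wrapped as a plain string-to-string function. `check_stability` and the toy formatters are hypothetical and only illustrate the idea; they are not part of this PR.

```python
from typing import Callable


def check_stability(format_code: Callable[[str], str], source: str, max_runs: int = 5) -> str:
    """Run a formatter repeatedly and classify how its output settles."""
    seen = [source]
    current = source
    for run in range(1, max_runs + 1):
        result = format_code(current)
        if result == current:
            # Reaching a fixed point after at most one real change is the goal.
            return "stable" if run <= 2 else f"stable after {run - 1} runs"
        if result in seen:
            return "oscillates"  # the formatter returned to an earlier state
        seen.append(result)
        current = result
    return "keeps changing"


# Toy formatters standing in for the real one:
strip_trailing = lambda text: text.rstrip() + "\n"
toggle_paren = lambda text: text[:-1] if text.endswith(")") else text + ")"
append_comment = lambda text: text + " # Test"

print(check_stability(strip_trailing, "a = 1   \n"))  # stable
print(check_stability(toggle_paren, "x"))             # oscillates
print(check_stability(append_comment, "x"))           # keeps changing
```

Running something like this over a whole project would flag files where the formatter alternates between two states or needs several passes to settle.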

The first case involves signal connections with a function literal (it may just be the inner block with a parenthesis at the end of the last line)

func _ready() -> void:
	energy_slider.value_changed.connect(func(value):
		light.energy = energy_slider.value)

Gets formatted into this:

func _ready() -> void:
	energy_slider.value_changed.connect(func(value):
		light.energy = energy_slider.value
)

The behavior changes once you duplicate the two lines containing the function literal.

This:

func _ready() -> void:
	energy_slider.value_changed.connect(func(value):
		light.energy = energy_slider.value)

	height_slider.value_changed.connect(func(value):
		light.height = height_slider.value)

	height_slider.value_changed.connect(func(value):
		light.height = height_slider.value)

Alternates between the following and the starting state when running the formatter.

func _ready() -> void:
	energy_slider.value_changed.connect(func(value):
		light.energy = energy_slider.value
)

	height_slider.value_changed.connect(func(value):
		light.height = height_slider.value
)

	height_slider.value_changed.connect(func(value):
		light.height = height_slider.value)

Apr 18 '23 18:04 NathanLovato

Longer annotations get placed on their own lines, unlike shorter ones. Not a bug per se, but I'm not 100% sure if it's intentional.

@export_multiline var text := ""
@export var logo_visible := false

Becomes:

@export_multiline
var text := ""
@export var logo_visible := false

Apr 18 '23 18:04 NathanLovato

Comment inside a function gets indented based on previous block.

func popup(immediate := false) -> void:
	if not is_inside_tree():
		await ready
	
	# Force container to be its smallest size possibles
	size = Vector2.ZERO

Becomes:

func popup(immediate := false) -> void:
	if not is_inside_tree():
		await ready

		# Force container to be its smallest size possibles
	size = Vector2.ZERO

Apr 18 '23 18:04 NathanLovato

Last line in a setter function attached to a property definition gets pushed down up to two times if following an ending code block.

var text := "":
	set(value):
		text = value
		if not is_inside_tree():
			await ready
		rich_text_label.text = text

First becomes

var text := "":
	set(value):
		text = value
		if not is_inside_tree():
			await ready

		rich_text_label.text = text

And after a second run of the formatter, it becomes

var text := "":
	set(value):
		text = value
		if not is_inside_tree():
			await ready


		rich_text_label.text = text

After that it's stable with subsequent formatter runs.

Apr 18 '23 18:04 NathanLovato

Arrays struggle to wrap when combined with method calls currently. Here's a very long line to generate a random nickname:

var nickname: String = ["Brave", "Horrific", "Courageous", "Terrific", "Fair", "Conqueror", "Victorious", "Glorious", "Invicible"].pick_random() + ["Leopard", "Cheetah", "Bear", "Turtle", "Rabbit", "Porcupine", "Hare", "Pigeon", "Albatross", "Crow" ].pick_random()

The formatter doesn't manage to wrap it to stay below the max line length. It produces this:

var nickname: String = (
		["Brave", "Horrific", "Courageous", "Terrific", "Fair", "Conqueror", "Victorious", "Glorious", "Invicible"].pick_random(
			)
		+ ["Leopard", "Cheetah", "Bear", "Turtle", "Rabbit", "Porcupine", "Hare", "Pigeon", "Albatross", "Crow"].pick_random(
			)
)

Not a super important case, but each array is longer than max line length, so I'd expect them to be wrapped vertically, as the formatter does well when all you have is an array (without the method call). E.g. this

var adjectives := ["Brave", "Horrific", "Courageous", "Terrific", "Fair", "Conqueror", "Victorious", "Glorious", "Invicible"]

Becomes the following, as expected:

var adjectives := [
	"Brave",
	"Horrific",
	"Courageous",
	"Terrific",
	"Fair",
	"Conqueror",
	"Victorious",
	"Glorious",
	"Invicible",
]

Apr 18 '23 18:04 NathanLovato

Overall it's starting to work really well on our code at least. Good job!

Apr 18 '23 19:04 NathanLovato

I've been working with GDScript docstrings lately, and from a super quick look through the code, that feels relatively similar to this idea that comments get assigned to relevant members. Is the purpose of that assignment to be able to format comments associated with members?

Because if it is, I wonder whether there's some refactoring possible there. Docstrings can be considered a special case of comment (they use ## instead of a single #). Either way, that refactoring should be done in a separate PR. I am just trying to figure out whether this is being used for documentation at all, and might thus be duplicating some work already done for docstrings!

Apr 22 '23 14:04 anvilfolk

Will it be possible to call the formatter via the CMD line? Main use case would be if working with an external editor.

Apr 23 '23 07:04 monkeez

Will it be possible to call the formatter via the CMD line? Main use case would be if working with an external editor.

For external editors, the first goal should be to add formatting to the language server, which will instantly provide support for most editors (VS Code, Neovim, JetBrains, Emacs, ...). That's definitely something to add once the formatter gets merged.

Command line support can and should probably be added for use in continuous integration and other use cases outside of an editor (e.g. batch formatting an existing project that didn't have the formatter).

Apr 23 '23 08:04 NathanLovato

I've tested it a bit on some of my code. It feels really good, but there are still some hiccups. Here is what I found.

I'm not sure the formatter should change annotation placement. I feel it should keep it the way you had it, especially since there is no explicit guideline on whether to put an annotation on its own line or not.

Large numbers are formatted incorrectly: numbers written without _ separators are left unchanged, but numbers written with them lose every _. See here for proper formatting of large numbers. [Screenshot, 2023-04-24]

Adam:

[x] fixed

when formatting

printt("a lot of arguments", "a lot of arguments", "a lot of arguments", "a lot of arguments", "a lot of arguments")

it produces this

printt(
			"a lot of arguments",
			"a lot of arguments",
			"a lot of arguments",
			"a lot of arguments",
			"a lot of arguments"
	)

which is missing the trailing comma (only tested with a function call, but the trailing comma might be missing in arrays and dictionaries too)
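To make the expected shape explicit, here is a small Python sketch of wrapping a call with one argument per line and a trailing comma after the last one. `wrap_call` is a hypothetical helper, and both the single-tab indent and the 100-character limit are arbitrary assumptions, not the formatter's actual settings.

```python
from typing import List


def wrap_call(name: str, args: List[str], indent: str = "\t", max_length: int = 100) -> str:
    """Keep a call on one line if it fits, else put one argument per line."""
    one_line = f"{name}({', '.join(args)})"
    if len(one_line) <= max_length:
        return one_line
    lines = [f"{name}("]
    # Every argument, including the last one, ends with a comma.
    lines += [f"{indent}{arg}," for arg in args]
    lines.append(")")
    return "\n".join(lines)


print(wrap_call("printt", ['"a lot of arguments"'] * 5))
```

With the trailing comma in place, adding a sixth argument later only touches one line in a diff.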

Moreover, when a line is split across multiple lines as in the last example and it comes right after an unindent, the formatter puts two empty lines before it instead of the one needed. Reformatting then reduces this to one line.

I also had a similar issue to the one reported here https://github.com/godotengine/godot/pull/76211#issuecomment-1513655346 but in my case it was with nested functions

With multiple unindents, you get as many empty lines as there are unindents, which can push the next line four lines further down (strangely, this does not happen in the last function of the script). Moreover, reformatting adds one more empty line. [Screenshots, 2023-04-24: the original code, the result after one format, and the result after a second format]

Apr 24 '23 05:04 ajreckof

@anvilfolk Thank you very much for the feedback!

For the annotations, @tool and @icon are applied to the parser itself, similar to a flag. There is no information in the parser as to which one came first. Other annotations do have a stack system, though. We could implement a system to keep track of those two annotations' order, but I do not think it's worth the effort.

Aside from that, yes, there are still some bugs. Thank you so much for the screenshots, I will make test cases out of them and fix them ASAP :+1:

Apr 24 '23 06:04 RolandMarchand

Large numbers are formatted incorrectly: numbers written without _ separators are left unchanged, but numbers written with them lose every _.

The exact formatting of the literals is lost:

-    print(0xFFFF)
-    print(1.2e-5)
-    print(0.1111111111111133) # <- Out of precision.
-    print("\t \n \uFFFD")
-    print(""" key="value" """)
+    print(65535)
+    print(0.000012)
+    print(0.11111111111111) # <- Out of precision.
+    print("	 
+ �")
+    print(' key="value" ')

Adam edit:

[x] fixed

Possible solution: when allocating a LiteralNode, copy the Token::source property into a new LiteralNode property and use it in the formatter.
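A rough illustration of that idea in Python (not the engine's C++): the node keeps the exact token text next to the parsed value, and the formatter prints the text instead of re-serializing the value. The `source` field and the helper functions are hypothetical, and Python's own literal rules stand in for GDScript's.

```python
from dataclasses import dataclass


@dataclass
class LiteralNode:
    value: object  # parsed value, as used by the compiler
    source: str    # hypothetical extra field: the token text as written


def parse_literal(token_text: str) -> LiteralNode:
    """Parse a numeric literal while remembering how it was spelled."""
    try:
        # Base 0 accepts prefixes such as 0x and underscores between digits.
        value = int(token_text, 0)
    except ValueError:
        value = float(token_text)
    return LiteralNode(value, token_text)


def format_from_value(node: LiteralNode) -> str:
    return repr(node.value)  # loses the original spelling


def format_from_source(node: LiteralNode) -> str:
    return node.source  # round-trips exactly


node = parse_literal("0xFFFF")
print(format_from_value(node))   # 65535
print(format_from_source(node))  # 0xFFFF
```

The cost is one extra string per literal node, which is probably acceptable given that only the formatter needs it.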

I'm a bit worried about this approach (restoring source code from AST) but overall the code looks good to me, excellent work!

Now we need to make sure that the parser has all the necessary information, including the order of the elements. For example, now the formatter moves the setter before the getter, but we do not have this recommendation in the style guide (and in the GDScript reference examples, the getter is placed before the setter). I think the formatter should not change anything that is not regulated by the style guide.

1. Class doc comment moves down.

 extends Node
-## Class doc comment.
 
 
+## Class doc comment.
 ## Method doc comment.
 func test():
     pass

Adam edit:

[ ] fixed

2. A space is added before commented code.

     # Comment.
-    #print("code")
+    # print("code")

Adam edit:

[ ] fixed

3. Regression: a comment now acts like a statement (pass is not required).

Adam edit:

[x] fixed

4. The formatter removes unnecessary parentheses added for clarity. As far as I know, this is difficult to solve, since the parser uses parentheses for grouping, but does not create grouping nodes or otherwise store them in the AST.

-if not (x is Node) or (y >= 0 and y < 10):
+if not x is Node or y >= 0 and y < 10:

-_flags[flag / 8] &= ~(1 << (flag % 8))
+_flags[flag / 8] &= ~1 << flag % 8
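One common way to keep at least the meaning intact when printing from an AST that has no grouping nodes is to re-insert parentheses from operator precedence. The Python sketch below shows that technique on the second expression. It is not necessarily how this PR handles it, the node classes are hypothetical, and the precedence table assumes GDScript ranks ~ above % and % above << the way Python does.

```python
from dataclasses import dataclass
from typing import Union

# Assumed precedence for the operators in the example (higher binds tighter).
PRECEDENCE = {"<<": 1, "%": 2, "~": 3}


@dataclass
class Name:
    text: str


@dataclass
class Unary:
    op: str
    operand: "Expr"


@dataclass
class Binary:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Name, Unary, Binary]


def render(node: Expr, parent_prec: int = 0, right_side: bool = False) -> str:
    """Print an expression, adding only the parentheses the meaning requires."""
    if isinstance(node, Name):
        return node.text
    prec = PRECEDENCE[node.op]
    if isinstance(node, Unary):
        text = node.op + render(node.operand, prec)
    else:
        text = f"{render(node.left, prec)} {node.op} {render(node.right, prec, True)}"
    # Needed when the parent binds tighter, or equally tight on the right-hand
    # side of a left-associative operator.
    needs_parens = prec < parent_prec or (prec == parent_prec and right_side)
    return f"({text})" if needs_parens else text


tree = Unary("~", Binary("<<", Name("1"), Binary("%", Name("flag"), Name("8"))))
print(render(tree))  # ~(1 << flag % 8)
```

Under that precedence assumption the outer parentheses in the second example are not just for clarity: dropping them turns the expression into (~1) << (flag % 8). The inner (flag % 8) pair is clarity-only and would still be lost with this approach, so keeping those needs the parser to record grouping explicitly.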

Adam edit:

[x] fixed

Apr 24 '23 06:04 dalexeev

@dalexeev We should consider changing the recommended order of the class doc comment to be like this:

## Class doc comment.
class_name MyClass
extends Node

Every other doc comment, and all annotations, appear above/before what they describe, not after.

To avoid breaking compat, we could do this in steps, first support both orders for one Godot release (like 4.1), then enforce it being above for the next release (like 4.2).

Apr 24 '23 16:04 aaronfranke

@aaronfranke You are probably right, although both class_name and extends are optional (if extends is omitted, the class extends RefCounted).

Apr 24 '23 16:04 dalexeev

Fixed the issue about literals being parsed:

[Screenshot, 2023-05-11]

Note: The picture is outdated. I modified my commit so that strings and numbers take the source code as is, so the quotes don't change anymore (like on line 6).

May 11 '23 15:05 adamscott

As this is a HUGE PR, I suggest merging it sooner rather than later. Since multiple issues have been raised in the comments, I think it will be easier to solve them in chunks, each tracked in its own issue, rather than trying to address them all in this PR.

So, I could rebase the commits to create one and we could merge the PR.

@akien-mga @YuriSizov @vnen I think it would be a great addition to 4.1, especially as this will give us time to fix the issues before and after the feature freeze.

May 12 '23 13:05 adamscott

I'm a bit worried about this approach (restoring source code from AST) but overall the code looks good to me, excellent work!

@dalexeev The popular code formatter Prettier for Node.js uses the AST generated by Babel to "prettify" JavaScript and TypeScript code. So this is a valid way to format code.

May 12 '23 13:05 adamscott

As this is a HUGE PR, I suggest merging it sooner rather than later.

If there are no regressions found in existing features (what about doc comments?), then I support this. We can warn users that formatting isn't working perfectly, but since it's off by default I don't think it's a problem, users can just not use it and nothing will change for them.

May 12 '23 13:05 dalexeev

Currently working on making the build pass.

May 12 '23 13:05 adamscott

I second a merge soon: the formatter works fine overall - remaining issues will be edge cases in code that it formats without errors or crashes, but that don't have ideal formatting in the output.

There are certainly many more to find, and finding these will involve a lot of testing by the community, running the formatter on their code. And this won't happen unless the formatter's available in a build.

May 12 '23 14:05 NathanLovato

Shoutout to @Razoric480 who did most of the work on that PR! :partying_face:

May 12 '23 14:05 adamscott

What about points 3 and 4 in the comment? I can test this PR again when you confirm you are done.

May 12 '23 15:05 dalexeev

@dalexeev Working on point 3, the regression of the missing pass

May 12 '23 16:05 adamscott

Once everyone else's points are addressed, I can give this a test on The Mirror's codebase before merging. It's likely that I will find a bunch of issues and edge cases testing on our codebase with about 50k lines of GDScript ;)

May 12 '23 18:05 aaronfranke

Once everyone else's points are addressed, I can give this a test on The Mirror's codebase before merging. It's likely that I will find a bunch of issues and edge cases testing on our codebase with about 50k lines of GDScript ;)

@aaronfranke Feel free to test The Mirror's codebase :fire:, but as I said:

As this is a HUGE PR, I suggest merging it sooner rather than later. Since multiple issues have been raised in the comments, I think it will be easier to solve them in chunks, each tracked in its own issue, rather than trying to address them all in this PR.

May 12 '23 22:05 adamscott

@dalexeev Working on point 3, the regression of the missing pass

Done.

May 12 '23 22:05 adamscott

@adamscott @NathanLovato I disagree with the "merge now, fix later" mentality. This is not just some optional feature that runs on top of GDScript ("off by default" as @dalexeev puts it), it also includes changes to the GDScript parser, which can break scripts if it's not tested.

I tested it and I found a few parser errors (without formatting), and many issues with the formatter. Some of the formatter issues I've noted below are the same issues as the ones I pointed out before, so I guess the test cases I posted about before were not re-tested with the latest version of the PR. If only a small number of formatter bugs were there, that would be fine since it's off by default, but the parser bugs must be fixed, and it would be ideal to have as few formatter bugs as possible too.

Side note, I would greatly appreciate the ability to selectively toggle the formatter off for specific sections of the code. There are 2 use cases here. One is that the formatter may make a mistake, so you want to turn it off for that part. The other is that sometimes the most readable way to write something may be something we don't recommend, such as having many lines that exceed the length limit but are aligned with each other so it's very easy to read.

This code errors with the message "Expected end of statement after variable declaration, found "Dedent" instead."

func _ready():
	var v := Vector2()
	var a = v[0] # A

Adam edit:

[x] fixed

This code errors with the message "Expected expression as dictionary key."

func _ready():
	var dict = {
		"key1": "value1", # Comment
		# Comment
		"key2": "value2",
	}

Adam edit:

[x] fixed

The game does not run, Godot says "Parser Error: Could not resolve class ..." for some scripts and highlights the extends line in red. I suspect it's related to this error spammed in the console:

ERROR: Parser bug: Mismatch in extents tracking stack.
   at: complete_extents (modules/gdscript/gdscript_parser.cpp:4751)
ERROR: Parser bug: Mismatch in extents tracking stack.
   at: complete_extents (modules/gdscript/gdscript_parser.cpp:4751)

Adam edit:

[ ] fixed

Unique nodes lose their special syntax and are converted to using the $ operator. This:

@onready var unique_node = %UniqueNode

becomes this:

@onready var unique_node = $"%UniqueNode"

Adam edit:

[x] fixed

This code does not format right, this:

var _MAX_FILE_SIZE = 1 << 20 # 1 MiB

becomes this:

var _MAX_FILE_SIZE = (
		1
		<< 20 # 1 MiB
)

However, this one is minor, I can just move the comment to the line above to prevent this issue.

Adam edit:

[x] fixed

This code does not format right, this:

func _ready():
	var some_node
	# This bug only happens when the comment on the previous line exceeds 80 chars.
	some_node.visible = 5 > 3

becomes this:

func _ready():
	var some_node
	# This bug only happens when the comment on the previous line exceeds 80 chars.
	some_node.visible = (
			5 > 3
	)

and probably related, this:

func _ready():
	var some_node
	# This bug only happens when the comment on the previous line exceeds 80 chars.
	some_node.visible = some_node.some_func()

becomes this:

func _ready():
	var some_node
	# This bug only happens when the comment on the previous line exceeds 80 chars.
	some_node.visible = some_node.some_func(
			)

Adam edit:

[x] fixed

Comments after an indented block are forcefully moved over:

func _ready():
	if true:
		pass
	# Description of A
	var a

becomes this:

func _ready():
	if true:
		pass

		# Description of A
	var a

Adam edit:

[x] fixed

The spacing around the equal sign in function parameters is broken. This:

func my_func(optional_param = null):
	pass

becomes this:

func my_func(optional_param= null):
	pass

Adam edit:

[x] fixed

This formatter explodes (and spits out invalid GDScript code) when trying to format this GUID code:

static func generate_guid() -> String:
	var b = []
	return "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x" % [
			# low
			b[0], b[1], b[2], b[3],
			# mid
			b[4], b[5],
			# hi
			b[6], b[7],
			# clock
			b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]
	]

becomes this: (btw, pretend var b is actually a 16-element array)

static func generate_guid() -> String:
	var b = []
	return (
			"%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x"
			% # low
	# low
		[
			b[
				0
			],
			b[
				1
			],
			b[
				2
			],
			b[
				3
			],
			# mid
			b[
				4
			],
			b[
				5
			],
			# hi
			b[
				6
			],
			b[
				7
			],
			# clock
			b[
				8
			],
			b[
				9
			],
			b[
				10
			],
			b[
				11
			],
			b[
				12
			],
			b[
				13
			],
			b[
				14
			],
			b[
				15
			],
		]
	)

Adam edit:

[x] fixed

The formatter puts this condition on one line even though the result exceeds the maximum line length. This:

func _input(input_event: InputEvent) -> void:
	if input_event.is_action(&"some_long_action_name") or \
			input_event.is_action(&"another_action_name"):
		pass

becomes this:

func _input(input_event: InputEvent) -> void:
	if input_event.is_action(&"some_long_action_name") or input_event.is_action(&"another_action_name"):
		pass

Adam edit:

[ ] fixed

The formatter breaks functions inside of inner classes followed by a variable.

class MyInnerClass:
	var hi

	func my_inner_class_function() -> void:
		pass

	func another_one() -> void:
		pass

becomes this:

class MyInnerClass:
	var hi

	


func my_inner_class_function() -> void:
		pass


	func another_one() -> void:
		pass

Adam edit:

[x] fixed

Comments above static functions are deleted on format:

## This comment will be deleted on format.
static func test():
	pass

Adam edit:

[ ] fixed

The formatter still breaks this case (multiplying the number then doing string interpolation):

func _ready():
	var number = 1234.567
	var string = "%1.1f k" % (number * 0.001)

Becomes this (breaks because it does string interpolation, then it tries to do string * float):

func _ready():
	var number = 1234.567
	var string = "%1.1f k" % number * 0.001
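Python's % string formatting happens to follow the same rule (% and * share a precedence level and group left to right), so the failure mode can be reproduced outside Godot. This is only an illustration in Python, not GDScript, and the exact error raised by GDScript will differ.

```python
number = 1234.567

# With the parentheses: multiply first, then interpolate.
with_parens = "%1.1f k" % (number * 0.001)
print(with_parens)  # 1.2 k

# Without them: interpolate first, then try string * float.
try:
    without_parens = "%1.1f k" % number * 0.001
except TypeError as error:
    print("TypeError:", error)
```

A formatter that prints from the AST therefore has to emit the parentheses again whenever the right operand of % is a * expression.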

Adam edit:

[x] fixed

A comment at the end of a return value is deleted. This:

func test_func() -> bool:
	if true:
		return true # In this case, stop.
	return false

becomes this:

func test_func() -> bool:
	if true:
		return true
	return false

Adam edit:

[ ] fixed

The formatter wants to split up this line, but I don't think it's correct. This:

@export_exp_easing var min_value: float = 0.0

becomes this:

@export_exp_easing
var min_value: float = 0.0

Adam edit:

[x] fixed

The formatter still goes crazy when you have a comment at the end of an onready var with a get_node call. This:

@onready var a_node = $ANode
@onready var b_node = a_node.get_node(^"BNode") # Test

becomes this after several saves:

@onready var a_node = $ANode
@onready
var b_node = a_node.get_node(
		^"BNode"
) # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test # Test

Adam edit:

[x] fixed

I'm still getting these weird extra line diffs, but it eludes me when I try to make a minimal test case:

Really I would prefer it to just not leave empty lines after indented blocks, I don't like them. But at the very least, the formatter should not leave more than one.

May 13 '23 06:05 aaronfranke

@aaronfranke thanks for the tests and feedback, and the problem is exactly that: this PR needs extensive testing on many codebases, but new features don't get the visibility and testing they need until they land in development or beta releases.

It's not about "merging now and fixing later." It's more about getting testing from the wider user base and code written in very different ways.

Note that Adam is working full-time on GDScript since this week, and he prioritizes and executes work very diligently, so once this gets merged, there would be rapid follow-up and bug fixing. Unfortunately, the people we sponsored before, who were doing it on the side of their full-time job, weren't able to keep up with all this project involved.

Thankfully, you did show up and took the time to test on a big codebase, so thank you kindly for that. That's really helpful.

May 13 '23 06:05 NathanLovato

I think the parser errors pointed out by @aaronfranke should ideally be solved before merging, if they can be reproduced easily, since they seem to break working codebases even when not using the formatter at all.

Further bugs caused by using the formatter can be left for separate issues to solve after merge.

May 15 '23 07:05 akien-mga