Gut icon indicating copy to clipboard operation
Gut copied to clipboard

Add a way to declare test dependencies

Open tomsterBG opened this issue 3 months ago • 8 comments

Versions

4.4+

The Feature

A simple (on the surface, not in code) feature where you would overwrite a virtual method, maybe a new one, maybe before_all and declare a dependency, meaning that this test will not run until the test it depends on runs first.

The problem

I believe people with experience explain it better than me, so read here: https://www.factorio.com/blog/post/fff-366

However i experience the following structure in my project: https://github.com/tomsterBG/godot-classes-unit-tested

Helpers - Helper functions. Health - Must know that Helpers works correctly because it uses features from it. HealthPlus - An experimental extension that must know if Health works and if Helpers works.

The current way to solve this is by ordering the scripts in a pre-commit hook and that's fine and all, but problems arise when you scale up.

There is ambiguity, why is the order like that? What if i accidentally switch the order? What if two scripts require the same script and nothing else? There's no way to say "these both are on the same exact order", you always have to put one above or below the other.

With declaring test dependencies from within a test itself, you open up a way to be less ambiguous about your intentions.

Think of it like #include in C++ for example and this guy talking about how transitive includes are a bad idea: https://github.com/godotengine/godot-proposals/discussions/9016

Or like Lua's require(module_script).

This is just a way to be more descriptive and explicit rather than ambiguous and implicit, about the order in which you want your tests to execute. And as the Factorio devs make it clear, order matters. Say my HealthPlus fails, but it may fail because of an error in Helpers. I would only know for sure if Helpers is the problem if i read all errors and go through each one, or to make the process more efficient, if i order my tests in a way that accurately represents which code depends on which other code.


If there's already a way to do this that is better than script order hooks please let me know. Or maybe i'm missing something and there's already a way to write a custom hook that follows this exact behavior.

Whatever the solution might be, let's figure it out!

tomsterBG avatar Aug 26 '25 11:08 tomsterBG

I like this idea, and that factorio link was a good read. Technically, I think this is possible now. I think a formal solution would be great to have and far easier to use.

How you might do it now

This is a thought experiment. I think this would work. I didn't try it out. If nothing else, it's a list of things a feature would include.

You could use a combination of a pre-run hook and should_skip_script to get this done. It does require using some parts of GUT that aren't "meant for public consumption", but I don't foresee any of these changing (famous last words).

Reordering Tests

This is how you would reorder the tests array. How you would determine the order is covered later.

In the pre-run hook you have access to the gut instance that will run the tests. You can use gut.get_test_collector() to get the object that holds all the tests to be run. The returned test_collector.gd holds an array of collected_script in the scripts property. You could reorder this array any way you want. I think this as far down as you want to go, but if you wanted to reorder the tests in a script you could mess with the collected_script.tests array which holds collected_test instances.

Defining Dependencies

We could add a const to the script, we'll call it const DEPENDENCIES. GUT instantiates test scripts right before they run, so we don't have an instance in the pre-run hook. Consts are accessible without an instance. You can use a_collected_script.load_script().get_script_constant_map() to check if the script has the expected const and get the values.

You would need another const to define the category for the script. We'll call it const CATEGORY.

Both constants would be optional.

The values of these constants could be defined in the DependencyMgr below.

Dependency Manager

Making a static class (DependencyMgr) will make it easier to check in from test scripts.

Create the dependency tree (in the pre-run hook).

Sounds hard. Might be easy. It will need to worry about circular dependencies. It should validate DEPENDENCIES and CATEGORY constants in scripts. It would also create a mechanism to look up the collected_script instances by CATEGORY.

Reorder the scripts to be run based on the tree (in the pre-run hook).

Probably easy after the dependency tree is resolved.

Have a way for a script to check if its dependencies have all passed.

We'll call this DependencyMgr.are_my_dependencies_failing(gut_test_instance). Since we've reordered the tests already, we can assume all dependencies have been run. If we assume that, we can check for failures (and not pass count) which helps when we are running a subset of tests. This would look up all the collected_script instances per the DEPENDENCIES values (if they exist) and call get_fail_count on them.

This would return false if everything is ok or a string listing the failing dependencies (more on this later).

Skipping scripts with failing dependencies

In each script, and Inner Test Class, we can use should_skip_script to skip the script if dependencies are not passing. This will mark the script as "risky" in the summary. should_skip_script will skip the script if it returns anything other than false. When a string is returned it displays it.

We can define the method in your own GutTest subclass that other tests can inherit from and get this for free. You could also just add the method to all your scripts.

class_name DependencyTest
extends GutTest

func should_skip_script():
    return DependencyMgr.are_my_dependencies_failing(self)

An example test would look like:

extends DependencyTest

const DEPENDENCIES = [DependencyMgr.CATEGORY_1, DependencyMgr.CATEGORY_2, DependencyMgr.CATEGORY_3]
const CATEGORY = DependencyMgr.CATEGORY_8

Gotchas

  • circular dependencies
  • Inner Test classes need their own constants, which could be tedious. It's possible to make the DependencyMgr apply script level constants to Inner Test Classes. That has it's own set of problems though.
  • How to detect when a full run isn't being done? You don't want to worry about dependencies if you are running a single script, as this would slow down everything. I think this is solved by checking for failures only, but who knows.
  • At least 10 other things I haven't thought of yet.

bitwes avatar Aug 26 '25 17:08 bitwes

I just saw that you're already using the test_collector. Nice!

bitwes avatar Aug 26 '25 18:08 bitwes

That's a treasure of a reply, thank you for putting in all this effort!

I decided to go for the closest to what i'm familiar with as i like to keep everything simple, though it turned out to be way more complicated than i thought with the way i'm doing it.

I went for the approach that defines a data structure like this in the pre-run hook:

var path_prefix := "res://tests/"
var path_suffix := ".gd"

var test_order := [
	# Globals:
	{name = "test_helpers", dependencies = []},
	# Components:
	{name = "test_health", dependencies = ["test_helpers"]},
	{name = "test_health_plus", dependencies = ["test_health", "test_helpers"]},
]

I already came up with a way to recursively know if a dependency is circular in the Health class, but i would like to only solve that problem here if i see that, just crashing with no message saying why, would confuse me and others. And it's a bit different than how i already do circular detection because i have recursive lists that point to a class inside a class and check if an object is the same as the first one, which means i looped to where i started from.


So now, the hard part is to make the sorting function.

I have the pseudocode logic for it and the drawings on paper, but there are problems that slow me down. I intentionally want to solve this myself without the help of the web or AI because i see this as one of those beneficial things if i manage to do it on my own.

I did it on paper and noticed a problem which now i forgot, but then i couldn't transfer it into code because it wants too many nested loops and that deters me. However i'll come back to try and share what i have done if i solve this, just so people and i can have the solution ready when it's needed and maybe someone will see it and say "hey, i got a better idea".

tomsterBG avatar Aug 26 '25 19:08 tomsterBG

Please share what you come up with, or if you need anything. If you come up with a generic solution, this might be something that could be it's own plugin. Also, let me know what parts of GUT you end up using, I can make adjustments on my end to get you data easier if need be or just make sure that they are less likely to be changed. Good luck.

bitwes avatar Aug 26 '25 19:08 bitwes

I managed to do a specific solution to my problem and i don't think it's abstract enough because it doesn't take care of everything. Also i'm sure there is an edge case i missed because the rules i used were very simple.

This is the file: https://github.com/tomsterBG/godot-classes-unit-tested/blob/main/tests/script_order_hook.gd

I shortened the data structure to be more implicit simply because the redundant code was too much:

var test_scripts: Dictionary[StringName, Array] = {
	# name = [dependencies],
	# globals:
	helpers = [],
	# components:
	health = [&"helpers"],
	health_plus = [&"health", &"helpers"],
}

However i still prefer explicit writing when it's feasible in a line or two. Here are the two methods that do the work. One creates an ordered array of string names and the other checks if everything worked out.

The logic is very simple. If there aren't dependencies, go to the front. If there are dependencies, go after the latest found dependency.

func get_ordered_tests() -> Array[StringName]:
	var ordered_tests: Array[StringName]
	
	for test: StringName in test_scripts.keys():
		var dependencies := test_scripts[test]
		if dependencies.size() == 0:
			ordered_tests.push_front(test)
			continue
		
		var idx_of_last_dependency: int = -1
		for dependency in dependencies:
			if ordered_tests.find(dependency) > idx_of_last_dependency:
				idx_of_last_dependency = ordered_tests.find(dependency)
		ordered_tests.insert(idx_of_last_dependency+1, test)
	
	print("ordered_tests = ", ordered_tests)
	return ordered_tests

func are_tests_ordered_correctly(ordered_tests: Array[StringName]) -> bool:
	var tests := ordered_tests.duplicate(IS_DEEP)
	tests.reverse()
	
	for idx in range(tests.size()):
		var test_name: StringName = tests[idx]
		var dependencies: Array = test_scripts[test_name]
		for dependency in dependencies:
			var dependency_idx: int = tests.find(dependency)
			if dependency_idx < idx:
				return false
	return true

One question. I started using prints because the gut logger didn't work in the pre-run hook. Is it supposed to do that?

tomsterBG avatar Aug 27 '25 15:08 tomsterBG

Nice!

One question. I started using prints because the gut logger didn't work in the pre-run hook. Is it supposed to do that?

This appears to be a bug. If you run it from the command line the output appears, but it doesn't seem to appear in the editor anywhere. I couldn't find an obvious cause of it. I made #748 for it.

A little constructive criticism.

Maintaining the test_scripts dictionary overtime is going to be a nightmare. Especially if there is more than one person working on the game at a time. It will likely generate frequent merge conflicts. It's a great starting point, but I would encourage you to find a way to populate it dynamically. I stopped myself from making an example of getting the dependencies from the scripts themselves using constants. This seems like a fun problem that I could lose a lot of time to. If you'd like an example though, I'll make one, it'll be fun.

Another thing to consider is that it might be a little early to implement this. You should know that GUT wouldn't exist if I hadn't got sidetracked on creating something I thought I needed right now. So I fully understand and support anything you're doing...but...GUT currently has 223 test scripts with 1,520 tests and 2,539 asserts and I haven't found a strong need for this. A couple times I've had a ton of failing tests that I had to look through to find the root cause, but not a lot.

Lastly, I took a look at your test scripts. One of the rewarding parts of making a tool is seeing how others use it. Your use of Time.get_ticks_usec reminded me of issue #677 and might be the encouragement I needed increase its priority.

The TDD purist in me says your tests have too many asserts. All your asserts have messages though, so breaking them up doesn't get you much. That's exactly how I do it when I don't do it the way I think I should do it.

I liked your default value test. I do that a lot and have seen arguments against it because they tend to change a lot. My thought is that a different default value can have far reaching consequences and tests feel like an easier way to find the root cause...even if the test itself has to change frequently. It's one of the places where I like a lot of asserts in a single test.

Ok, I've typed enough. Keep me posted on your progress and let me know if I can do anything to help.

bitwes avatar Aug 27 '25 19:08 bitwes

Thank you for addressing the bug and for the criticism! I usually really love it when people tell me how i can improve things.

Since i'm new to Git related workflows it really is eye-opening to mention that a single file containing everything will cause common merge conflicts. I already had some of those without even having other contributors. This dependency system is still in its early stages so i'm probably going to figure out a way to improve it one day. At least it gives a warning!

I was right in the development process of the timing system for my regeneration which now works much better. You have addressed one of my stages while i was making it. Now i simulate the regen without awaiting at all, and i have forgotten to freeze the process in the test so regeneration from _process mixes with simulated regeneration, but it seems to work so i'm not complaining. :)

It is very agreeable that the test dependency system isn't the most necessary thing of all because while making it i realized: "Hmm, if an error appears in low-level and high-level, i should check out the low-level first. This is confirmed by the Factorio blog post because the specific example the dev gave was again a hard to find, low-level bug." This is basically the same as ordering the tests in a less dependent order first, but it's just a manual thing one has to do and maybe at the scale of Factorio, they found out that manual searching for every failure is just not going to cut it. You need to remember how dependent each system is and all that, so familiarity plays a role. They also have something like a custom engine to take care of after all. I think sorting the errors really helps remove some of the required familiarity out of the equation.

Have noticed that focusing on too many asserts is sometimes wasting time, though i want to try some things out. Mainly to assume as few things as possible, and to do the "red, green, refactor". For some reason i feel much better when a change in my code causes it to go bad and then i fix it, simply because there's a message to tell me what happened. I'm not really a fan of silent failures.

For example i assume that my default value of Health is 100 and i do damage and healing tests, but i can't do those if i don't first ensure the default value is 100. If it isn't and my other tests fail, i may misunderstand why the tests failed because nothing tells me i have made an assumption that doesn't work anymore. And the specific default values used here are pretty standard so i felt confident about them.

tomsterBG avatar Aug 28 '25 08:08 tomsterBG

Git is the worst. It's the best, but, it is the worst. That being said, you would end up with conflicts regardless of the version control system you use. Files that change frequently are bound to cause conflicts. You're working on one feature, I'm working on another and we both need to add dependencies to the dictionary. I reorder 2 of them and add 1. You add 2, but to the one I reordered. Since we both edited the same lines it has no way to know what the final version should look like. Conflict ensues.

Per your asserts, I should have stated that I like more tests with fewer asserts. You're probably asserting the right amount, but your tests are somewhat big by TDD standards. Some of your longer tests with a lot of asserts should be broken down into multiple tests. That's the TDD way. It doesn't have to be your way. Test how you like. Especially when you have messages on each assert. Nothing worse than a test with 4 asserts and one fails but you don't know which one it was.

when a change in my code causes it to go bad

This is the good stuff. Unexpected failing tests usually means you found a bug early.

bitwes avatar Aug 28 '25 17:08 bitwes