Improve merging of docs on generation
Cleans up and (slightly, it seems) improves performance in merging docs by using a sorted lookup.
- Closes: https://github.com/godotengine/godot-proposals/issues/9125
Ran some benchmarking with the following:
diff --git a/main/main.cpp b/main/main.cpp
index 4c9c159f47..f87e02eb44 100644
--- a/main/main.cpp
+++ b/main/main.cpp
@@ -3257,10 +3257,18 @@ bool Main::start() {
GLOBAL_DEF(PropertyInfo(Variant::INT, "dotnet/project/assembly_reload_attempts", PROPERTY_HINT_RANGE, "1,16,1,or_greater"), 3);
#endif
+ OS::get_singleton()->benchmark_begin_measure("Documentation", "Total");
+
+ OS::get_singleton()->benchmark_begin_measure("Documentation", "Generation");
+
Error err;
DocTools doc;
doc.generate(gen_flags);
+ OS::get_singleton()->benchmark_end_measure("Documentation", "Generation");
+
+ OS::get_singleton()->benchmark_begin_measure("Documentation", "Loading");
+
DocTools docsrc;
HashMap<String, String> doc_data_classes;
HashSet<String> checked_paths;
@@ -3299,24 +3307,44 @@ bool Main::start() {
ERR_FAIL_COND_V_MSG(err != OK, false, "Error loading classes from: " + index_path + ": " + itos(err));
checked_paths.insert(index_path);
+ OS::get_singleton()->benchmark_end_measure("Documentation", "Loading");
+
+ OS::get_singleton()->benchmark_begin_measure("Documentation", "Merging");
+
print_line("Merging docs...");
doc.merge_from(docsrc);
+ OS::get_singleton()->benchmark_end_measure("Documentation", "Merging");
+
+ OS::get_singleton()->benchmark_begin_measure("Documentation", "Erasing");
+
for (const String &E : checked_paths) {
print_line("Erasing old docs at: " + E);
err = DocTools::erase_classes(E);
ERR_FAIL_COND_V_MSG(err != OK, false, "Error erasing old docs at: " + E + ": " + itos(err));
}
+ OS::get_singleton()->benchmark_end_measure("Documentation", "Erasing");
+
+ OS::get_singleton()->benchmark_begin_measure("Documentation", "Saving");
+
print_line("Generating new docs...");
err = doc.save_classes(index_path, doc_data_classes);
ERR_FAIL_COND_V_MSG(err != OK, false, "Error saving new docs:" + itos(err));
+ OS::get_singleton()->benchmark_end_measure("Documentation", "Saving");
+
+ OS::get_singleton()->benchmark_begin_measure("Documentation", "Deleting");
+
print_line("Deleting docs cache...");
if (FileAccess::exists(EditorHelp::get_cache_full_path())) {
DirAccess::remove_file_or_error(EditorHelp::get_cache_full_path());
}
+ OS::get_singleton()->benchmark_end_measure("Documentation", "Deleting");
+
+ OS::get_singleton()->benchmark_end_measure("Documentation", "Total");
+
OS::get_singleton()->set_exit_code(EXIT_SUCCESS);
return false;
}
And for me it gave the following results (OUTDATED, see the updated benchmarks further down):
[Documentation] (Without these changes)
- Generation: 632.900 msec.
- Loading: 397.195 msec.
- Merging: 28.075 msec.
- Erasing: 154.838 msec.
- Saving: 723.653 msec.
- Deleting: 0.707 msec.
- Total: 1937.385 msec.
[Documentation] (With just the initial changes)
- Generation: 647.350 msec.
- Loading: 408.476 msec.
- Merging: 17.443 msec.
- Erasing: 160.213 msec.
- Saving: 734.933 msec.
- Deleting: 0.988 msec.
- Total: 1969.419 msec.
[Documentation] (Without the constructor changes)
- Generation: 624.207 msec.
- Loading: 389.684 msec.
- Merging: 17.958 msec.
- Erasing: 136.054 msec.
- Saving: 728.034 msec.
- Deleting: 0.951 msec.
- Total: 1896.905 msec.
[Documentation] (Without the constant sort)
- Generation: 641.539 msec.
- Loading: 406.578 msec.
- Merging: 17.521 msec.
- Erasing: 195.100 msec.
- Saving: 711.884 msec.
- Deleting: 0.765 msec.
- Total: 1973.406 msec.
[Documentation] (With all changes)
- Generation: 629.614 msec.
- Loading: 394.829 msec.
- Merging: 10.491 msec.
- Erasing: 139.492 msec.
- Saving: 636.447 msec.
- Deleting: 0.834 msec.
- Total: 1811.721 msec.
The improvement is marginal but present, and generation isn't slowed down by the additional sorting, so I'd say this is a win in any case, if only for the cleaner code.
The constructor and operator part contributes minimally to the total performance, so I can restore that code if desired, but I'd say it is useful either way.
Great work! You love to see PRs with more lines removed than added that also improve performance :) I'll go through and review your changes in more detail when I get the chance.
Since the merging logic is non-trivial now, I might suggest making a helper function or iterator that you can reuse for the different merge cases. For example, a merge_vectors method that takes a function pointer, which you might call like merge_vectors(to_properties, from_properties, &merge_property), where merge_property is a function that just handles merging two matched properties.
If you feel that overcomplicates things, I think repeating the iteration code is fine as well.
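A rough sketch of what such a helper could look like, following the suggestion above (merge_vectors and merge_property are hypothetical names, and the structs below are simplified stand-ins rather than the actual DocData types):

#include <algorithm>
#include <string>
#include <vector>

// Simplified stand-in for a doc entry such as DocData::PropertyDoc.
struct PropertyDocStub {
	std::string name;
	std::string description;
};

// Merge one matched pair: carry the hand-written description over.
static void merge_property(PropertyDocStub &p_to, const PropertyDocStub &p_from) {
	p_to.description = p_from.description;
}

// Generic helper: for each "to" entry, look up the entry with the same name
// in the name-sorted "from" vector and let the callback merge the pair.
template <typename T>
static void merge_vectors(std::vector<T> &p_to, const std::vector<T> &p_from, void (*p_merge)(T &, const T &)) {
	for (T &to_entry : p_to) {
		// Binary search is only valid because p_from is assumed sorted by name.
		auto it = std::lower_bound(p_from.begin(), p_from.end(), to_entry,
				[](const T &a, const T &b) { return a.name < b.name; });
		if (it != p_from.end() && it->name == to_entry.name) {
			p_merge(to_entry, *it);
		}
	}
}

// Usage: merge_vectors(to_properties, from_properties, &merge_property);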
I'd say first group them into their own functions (as per the above comment). Then the unification would come naturally at a later point, as it would be easier to envision and refactor with "all of the cards on the table".
The comparison isn't the same in the different cases, so it'd be more complicated
Also realized I need to fix some details of this, one moment: the operators and constructors need their own comparison, will fix that.
Edit: There, cleaned up and created dedicated methods for each case. I realized I was comparing constructors and operators just on name, which was incorrect; merging now works properly.
Edit 2: Need to fix some details, realized that the doc files aren't guaranteed to be sorted, so need to handle that
The constants will have to be removed from this as they are not sorted in the docs, will fix that
Fixed some of the code. It's getting late, so I'll continue working tomorrow; it should work now, but with some new changes that I'll elaborate on tomorrow.
Will update the benchmarks tomorrow
Wow, the new benchmarks show a bigger improvement, and it seems like most of it is due to the saving somehow getting faster. It doesn't look like any of your code directly affects saving, unless somehow ensuring the docs are sorted makes saving faster. Maybe your disk drive was less busy during the later tests or something?
There are going to be fluctuations, especially with file access. I'll do some more rigorous benchmarking tomorrow, plus some more tweaks to the ordering of constructors; it needs a little more fixing but it's almost ready again.
Writing this down so I don't forget: tomorrow I'll test not sorting the loaded docs, and instead searching in the generated docs (which are already sorted) rather than doing a sort + merge. Will see.
So, a preliminary update (will work on this later today): I'll drop the operators and constructors for now. They're not difficult to fix as such, but they require some discussion on how to handle the ordering etc. that I think is best left to a follow-up PR, and they make up a vanishingly small portion of the documentation data (only built-in types have them, and only a few each).
Will push later today and do new benchmarking
Some different changes; will elaborate more later, but here are some basic benchmarks for comparison. The difference isn't major, but it's something, and I think the general code improvements are good in themselves as well.
[Unchanged]
- Generation: 663.865 msec. 660.288 msec. 642.372 msec. Avrg: 655.508 msec.
- Loading: 414.215 msec. 417.114 msec. 396.416 msec. Avrg: 409.248 msec.
- Merging: 28.389 msec. 28.027 msec. 28.552 msec. Avrg: 28.323 msec.
- Erasing: 147.778 msec. 144.207 msec. 136.605 msec. Avrg: 142.863 msec.
- Saving: 678.956 msec. 699.290 msec. 681.661 msec. Avrg: 686.636 msec.
- Deleting: 1.001 msec. 1.016 msec. 0.738 msec. Avrg: 0.918 msec.
- Total: 1934.221 msec. 1949.956 msec. 1886.360 msec. Avrg: 1923.512 msec.
[With initial]
- Generation: 632.953 msec. 647.692 msec. 645.188 msec. Avrg: 641.944 msec.
- Loading: 394.408 msec. 407.362 msec. 401.538 msec. Avrg: 401.103 msec.
- Merging: 28.209 msec. 27.638 msec. 27.739 msec. Avrg: 27.862 msec.
- Erasing: 122.336 msec. 142.152 msec. 135.156 msec. Avrg: 133.215 msec.
- Saving: 682.808 msec. 702.175 msec. 707.425 msec. Avrg: 697.469 msec.
- Deleting: 0.762 msec. 0.848 msec. 0.775 msec. Avrg: 0.795 msec.
- Total: 1861.495 msec. 1927.882 msec. 1917.837 msec. Avrg: 1902.405 msec.
[With constructor/operator cleanup]
- Generation: 618.375 msec. 639.833 msec. 645.607 msec. Avrg: 634.605 msec.
- Loading: 385.870 msec. 388.729 msec. 397.186 msec. Avrg: 390.595 msec.
- Merging: 15.727 msec. 15.715 msec. 15.666 msec. Avrg: 15.703 msec.
- Erasing: 141.252 msec. 134.104 msec. 138.175 msec. Avrg: 137.844 msec.
- Saving: 705.065 msec. 690.901 msec. 683.012 msec. Avrg: 692.993 msec.
- Deleting: 0.757 msec. 0.723 msec. 1.083 msec. Avrg: 0.854 msec.
- Total: 1867.062 msec. 1870.022 msec. 1880.743 msec. Avrg: 1872.609 msec.
[Without Constants]
- Generation: 647.417 msec. 652.177 msec. 648.856 msec. Avrg: 649.483 msec.
- Loading: 404.471 msec. 397.803 msec. 398.155 msec. Avrg: 400.143 msec.
- Merging: 16.087 msec. 16.225 msec. 16.278 msec. Avrg: 16.197 msec.
- Erasing: 143.796 msec. 148.085 msec. 140.245 msec. Avrg: 144.042 msec.
- Saving: 686.624 msec. 692.332 msec. 708.543 msec. Avrg: 695.833 msec.
- Deleting: 1.060 msec. 1.100 msec. 1.157 msec. Avrg: 1.106 msec.
- Total: 1899.472 msec. 1907.738 msec. 1913.250 msec. Avrg: 1906.820 msec.
[All changes]
- Generation: 626.262 msec. 649.933 msec. 654.115 msec. Avrg: 643.436 msec.
- Loading: 393.736 msec. 406.013 msec. 402.916 msec. Avrg: 400.888 msec.
- Merging: 13.201 msec. 12.842 msec. 13.006 msec. Avrg: 13.016 msec.
- Erasing: 140.544 msec. 143.606 msec. 146.294 msec. Avrg: 143.481 msec.
- Saving: 728.345 msec. 693.062 msec. 693.980 msec. Avrg: 705.128 msec.
- Deleting: 0.759 msec. 0.876 msec. 1.025 msec. Avrg: 0.887 msec.
- Total: 1902.863 msec. 1906.348 msec. 1911.352 msec. Avrg: 1906.845 msec.
Would be interesting to see some other benchmarking, as the results are likely hardware-specific.
Will do some further testing, but the current method uses a lookup on one side of the data, rather than a merge, to find valid cases; this avoids sorting the loaded data unnecessarily.
The performance improvements might still be greater in the total case with sorting, as the loaded docs are expected to be largely sorted already, but there are tradeoffs on either side. I'll look further later and see which method is best in totality, but I'm going with this method at the moment.
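For contrast, the sort-and-merge alternative over two name-sorted vectors would look roughly like the sketch below (simplified stand-in types, not the actual DocData structs); the tradeoff is the up-front sort of the loaded docs versus a binary-search lookup per generated entry:

#include <string>
#include <vector>

// Simplified stand-in for a doc entry such as DocData::MethodDoc.
struct MethodDocStub {
	std::string name;
	std::string description;
};

// Two-pointer merge over two vectors that are both sorted by name.
// This requires sorting the loaded docs first, which the lookup approach avoids.
static void merge_sorted(std::vector<MethodDocStub> &p_to, const std::vector<MethodDocStub> &p_from) {
	size_t i = 0;
	size_t j = 0;
	while (i < p_to.size() && j < p_from.size()) {
		if (p_to[i].name < p_from[j].name) {
			i++; // Generated entry has no hand-written counterpart yet.
		} else if (p_from[j].name < p_to[i].name) {
			j++; // Loaded entry no longer exists in the generated docs.
		} else {
			p_to[i].description = p_from[j].description;
			i++;
			j++;
		}
	}
}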
So some observations and details:
- Constructors are a bit weird (and possibly broken; something for a future PR to work out): they only require that all the argument types match, in any order, meaning that you can't have Color(float, Color) and Color(Color, float) as separate constructors with separate descriptions.
- Operators and constructors are vanishingly few, so there's no need to process them (at least not for now).
- The results for merging versus searching (the current solution) seem negligible in testing, so I'm keeping this method. Sorting the loaded values should be very cheap, as they are essentially sorted in most cases, but the savings from using a merge are likely still not significant compared with the cost of sorting, so both methods are essentially equivalent.
In the end the performance improvements seem negligible, though they might be larger on some hardware, hard to say, but I think the code style improvements are significant here, so IMO this change isn't without merit.
Some changes I did:
- Cleaned up the constructor and operator code, which was a bit inefficient. Some of the changes:
  - Checking the argument count before checking that the names match (see the sketch after this list). I kept the name check for constructors; it's essentially pointless, since all generated constructors always have the same name, but someone might add an incorrectly named constructor to the doc file and break things.
  - Removed the permutation-based argument matching in operators, as operators (currently) have at most one argument.
- Avoiding mutation checks and locking for the various vectors by loading the data beforehand.
- Fixed index types after the upgrade to 64-bit indices (not going to matter unless you have impossibly big data, but still good).
- Made some other data improvements and changed the check for plain methods to compare on name order only, to avoid accidentally invoking some other ordering. This was also changed for the sorting methods in generation for strictly non-operator, non-constructor methods, just to avoid unnecessary checks in the sorting.
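As an illustration of the cheap-check-first idea for constructors, a minimal sketch (simplified stand-in types, not the actual DocTools code; the positional type comparison here is a simplification of the order-insensitive matching described above):

#include <string>
#include <vector>

struct ArgumentDocStub {
	std::string name;
	std::string type;
};

struct ConstructorDocStub {
	std::string name;
	std::vector<ArgumentDocStub> arguments;
};

// Cheapest checks run first: the integer argument-count comparison rejects
// most non-matches before any string comparisons are made.
static bool constructors_match(const ConstructorDocStub &p_a, const ConstructorDocStub &p_b) {
	if (p_a.arguments.size() != p_b.arguments.size()) {
		return false;
	}
	if (p_a.name != p_b.name) {
		// Generated constructors always share the class name, but a
		// hand-edited doc file could contain a misnamed entry.
		return false;
	}
	for (size_t i = 0; i < p_a.arguments.size(); i++) {
		if (p_a.arguments[i].type != p_b.arguments[i].type) {
			return false;
		}
	}
	return true;
}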
Ready for review now. It might need some minor tinkering, but it works well now; I tested messing with the documentation files and it doesn't break when you do things like reordering or deleting entries, the checks are all sufficient.
Discovered there was a sneaky sort in the save operation, removing this redundancy and adding sorting elsewhere instead
Will apply the style changes in a bit, rebasing my help search
Does Godot have a type to represent sorted arrays? It might be nice, so it's easier to keep track of assumptions about whether an array has already been sorted.
If it doesn't, would it be worth putting in some assert statements to ensure the array is sorted where it's expected to be, at least in debug builds?
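A minimal sketch of what such a debug-only check could look like (is_sorted_by_name is a hypothetical helper, not existing Godot API; in the engine this would presumably be wired to DEV_ASSERT or a similar debug-only macro rather than plain assert):

#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for a doc entry with a name field.
struct DocEntryStub {
	std::string name;
};

// Returns true if the vector is sorted by name in non-descending order.
template <typename T>
static bool is_sorted_by_name(const std::vector<T> &p_vec) {
	for (size_t i = 1; i < p_vec.size(); i++) {
		if (p_vec[i].name < p_vec[i - 1].name) {
			return false;
		}
	}
	return true;
}

// Usage at the top of a merge helper that relies on binary search:
// assert(is_sorted_by_name(to_methods) && "docs expected to be sorted by name");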
There isn't a dedicated type for that, no, and we'd want to avoid copying memory around, so it wouldn't really be helpful IMO; we are operating on the actual doc data, which uses Vector. That'd be a bit more specialized than what Godot really uses or needs.
Realized it doesn't help to sort the loaded data for the merge with the general-purpose sort algorithm we use: the data being sorted or mostly sorted most likely doesn't change anything. I haven't studied the algorithm, but it's likely a general-purpose one with best/worst/average all being N log N. It would be interesting to look at special-purpose sorting algorithms in the future to help with these situations, but not now.
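For future reference, an adaptive sort such as insertion sort is one candidate for this case: it runs close to linear time on input that is already mostly sorted, unlike a fixed O(N log N) general-purpose sort. A minimal sketch (stand-in type, not existing Godot code):

#include <string>
#include <vector>

struct ClassDocStub {
	std::string name;
};

// Insertion sort by name: O(n^2) in the worst case, but close to O(n)
// when the input is already mostly sorted, as the loaded docs tend to be.
static void insertion_sort_by_name(std::vector<ClassDocStub> &p_vec) {
	for (size_t i = 1; i < p_vec.size(); i++) {
		ClassDocStub value = p_vec[i];
		size_t j = i;
		while (j > 0 && value.name < p_vec[j - 1].name) {
			p_vec[j] = p_vec[j - 1];
			j--;
		}
		p_vec[j] = value;
	}
}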
Will go over this one a final time this week and squash
It's been a few weeks since this was made, what's the current status and TL;DR on actual improvements?
IIUC the performance gain may not be that significant, but is the code quality significantly better? If so, that should be fine.
Are the commits meant to be squashed?
I'd say that without a more dynamic sorting algorithm (which we might want to add in the future) the performance improvements are likely marginal, but I think the code quality is greatly improved. I'll go back over it and re-familiarise myself with what it does specifically, as it's been quite a while since I touched it.
There are some minor improvements, like swapping the order of some comparisons for faster paths, which is nice.
The performance remains very much unchanged; it only improves the merge step (roughly halving its time), which is so minor that it doesn't matter overall, but the code is now easier to read in my opinion. Future improvements could help with some of these parts, I'm sure.
One question: Should I just add the benchmarks I used to the code? With possible modifications?
> Should I just add the benchmarks I used to the code? With possible modifications?
Your call; if you expect they'll be useful for further work in the future, they could be added behind a local define.
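For illustration, the local-define approach could look something like the following, reusing the benchmark calls from the patch above (DOC_TOOLS_BENCHMARK is a hypothetical macro name, not an existing Godot define):

// Commented out by default; define locally when benchmarking doc generation.
//#define DOC_TOOLS_BENCHMARK

#ifdef DOC_TOOLS_BENCHMARK
	OS::get_singleton()->benchmark_begin_measure("Documentation", "Merging");
#endif
	print_line("Merging docs...");
	doc.merge_from(docsrc);
#ifdef DOC_TOOLS_BENCHMARK
	OS::get_singleton()->benchmark_end_measure("Documentation", "Merging");
#endif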
I'll add them if I end up working more on this in the future; for now, I don't think this part is something that'll be critical to get good benchmarks on.
Thanks!
Thank you!