Replace `List` with `LocalVector` if used as simple buffer
A byproduct when I go through List. These list are used as simple buffer to store value temporarily, which means:
- They are local variables.
- They get value by
push_back. - Values are consumed by iterating the
List.
So I change them to LocalVector, as they do not depend on container implementation.
There are other kinds of usages which can be replaced, but need some prepare first:
- Same as above, but iterate from tail to head. Need a reverse iterator for
LocalVector. - Used as stack. Need a
pop_back()method forLocalVector. - Used as queue. Shall we implement a dedicated
Queue?
This needs some checking to ensure these are safe uses, the swap is only valid if the container size is not changed while iterating, ~the code in Node for example looks unsafe~
Also important to evaluate is the tradeoff between locality and reduced overhead in memory on one hand, and the cost of reallocation and moving of data on the other
One of the big benefits of List is that changing the size reallocates no memory, so it would depend on the data type stored and the cost of repeatedly reallocating data depending on how the size of the container changes and the overhead there
This needs some checking to ensure these are safe uses, the swap is only valid if the container size is not changed while iterating, the code in
Nodefor example looks unsafe
My bad, that's a queue. Double checked, no other resize during iterating.
One of the big benefits of List is that changing the size reallocates no memory, so it would depend on the data type stored and the cost of repeatedly reallocating data depending on how the size of the container changes and the overhead there
All these containers gather value by push_back, for List it is an alloc per call. I think alloc times matters more than alloc size.
If performance turns out to be problem, we can call reserve().
We should be using reserve everywhere that we can. I.e. for places where we know the total number of elements, reserve exactly enough for the total number of elements, and where we only know a possible range, we should allocate enough to cover the range.
We should be using reserve everywhere that we can. I.e. for places where we know the total number of elements, reserve exactly enough for the total number of elements, and where we only know a possible range, we should allocate enough to cover the range.
That should be another PR I guess, we should check all LocalVector usage, not just these. Will implement that if have time. Or you just prefer refactor these first?
I imagine lawnjelly's FixedArray would be beneficial in some cases, if the size is known:
https://github.com/godotengine/godot/blob/71bbe3b00c40986859045e0397f6a3afe311c3e2/core/templates/fixed_array.h
It allocates on the stack, avoiding a heap allocation. It would be useful if VectorView accepted it as it would mean there are numerous cases where it could be used to create things like uniform sets without allocating on the heap.
Just see #99328 and I think I have to clarify something with this performance tag.
The first priority of this PR is not to improve performance (If these List cause problem we should have found them long ago).
My major concern here is List is generally not considered to play well with modern processors (cache stuff). If you use it, you probably have some special needs (e.g. insert in the middle). If not, like in this PR, the use of List may confuse someone when they touch the code.
So this should be considered a codestyle stuff first, and I did some test to show it is not going to cause systematic regression.
MicroBenchmark
push_back one by one from a std::vector and iterating through to do some nonsense stuff. Also test if you sort before iterate because many code do this.
| relative | ns/op | op/s | err% | total | benchmark
|---------:|--------------------:|--------------------:|--------:|----------:|:----------
| 100.0% | 121.56 | 8,226,243.55 | 0.1% | 0.01 | `List itemsize: 4 length: 1 sort: 0`
| 207.7% | 58.53 | 17,085,847.77 | 0.1% | 0.01 | `LocalVector itemsize: 4 length: 1 sort: 0`
| 100.0% | 484.70 | 2,063,120.98 | 0.2% | 0.01 | `List itemsize: 4 length: 7 sort: 0`
| 195.7% | 247.67 | 4,037,581.08 | 0.1% | 0.01 | `LocalVector itemsize: 4 length: 7 sort: 0`
| 100.0% | 3,865.53 | 258,696.92 | 0.2% | 0.05 | `List itemsize: 4 length: 63 sort: 0`
| 812.5% | 475.78 | 2,101,827.68 | 0.1% | 0.01 | `LocalVector itemsize: 4 length: 63 sort: 0`
| 100.0% | 34,327.83 | 29,130.88 | 1.1% | 0.41 | `List itemsize: 4 length: 511 sort: 0`
| 3,671.7% | 934.93 | 1,069,603.20 | 0.7% | 0.01 | `LocalVector itemsize: 4 length: 511 sort: 0`
| 100.0% | 254,799.90 | 3,924.65 | 0.2% | 3.05 | `List itemsize: 4 length: 4095 sort: 0`
| 8,035.9% | 3,170.79 | 315,378.95 | 1.0% | 0.04 | `LocalVector itemsize: 4 length: 4095 sort: 0`
| 100.0% | 128.63 | 7,774,033.51 | 0.2% | 0.01 | `List itemsize: 24 length: 1 sort: 0`
| 174.8% | 73.58 | 13,590,218.24 | 0.0% | 0.01 | `LocalVector itemsize: 24 length: 1 sort: 0`
| 100.0% | 508.46 | 1,966,727.22 | 0.2% | 0.01 | `List itemsize: 24 length: 7 sort: 0`
| 163.0% | 311.90 | 3,206,153.31 | 0.1% | 0.01 | `LocalVector itemsize: 24 length: 7 sort: 0`
| 100.0% | 4,053.12 | 246,723.21 | 0.3% | 0.05 | `List itemsize: 24 length: 63 sort: 0`
| 368.0% | 1,101.36 | 907,966.79 | 0.2% | 0.01 | `LocalVector itemsize: 24 length: 63 sort: 0`
| 100.0% | 32,916.59 | 30,379.81 | 0.2% | 0.39 | `List itemsize: 24 length: 511 sort: 0`
| 543.4% | 6,057.23 | 165,092.06 | 1.3% | 0.07 | `LocalVector itemsize: 24 length: 511 sort: 0`
| 100.0% | 284,490.54 | 3,515.06 | 0.2% | 3.40 | `List itemsize: 24 length: 4095 sort: 0`
| 680.3% | 41,816.05 | 23,914.26 | 0.5% | 0.50 | `LocalVector itemsize: 24 length: 4095 sort: 0`
| 100.0% | 121.97 | 8,198,987.54 | 0.1% | 0.01 | `List itemsize: 4 length: 1 sort: 1`
| 197.9% | 61.64 | 16,223,083.55 | 0.1% | 0.01 | `LocalVector itemsize: 4 length: 1 sort: 1`
| 100.0% | 561.59 | 1,780,657.10 | 0.6% | 0.01 | `List itemsize: 4 length: 7 sort: 1`
| 219.0% | 256.46 | 3,899,191.63 | 0.1% | 0.01 | `LocalVector itemsize: 4 length: 7 sort: 1`
| 100.0% | 4,212.65 | 237,380.50 | 0.4% | 0.05 | `List itemsize: 4 length: 63 sort: 1`
| 678.8% | 620.59 | 1,611,370.31 | 0.2% | 0.01 | `LocalVector itemsize: 4 length: 63 sort: 1`
| 100.0% | 38,429.21 | 26,021.87 | 0.2% | 0.46 | `List itemsize: 4 length: 511 sort: 1`
| 1,569.3% | 2,448.83 | 408,357.83 | 0.5% | 0.03 | `LocalVector itemsize: 4 length: 511 sort: 1`
| 100.0% | 309,527.59 | 3,230.73 | 0.2% | 3.70 | `List itemsize: 4 length: 4095 sort: 1`
| 1,573.5% | 19,671.17 | 50,835.82 | 0.3% | 0.23 | `LocalVector itemsize: 4 length: 4095 sort: 1`
| 100.0% | 128.69 | 7,770,760.97 | 1.3% | 0.01 | `List itemsize: 24 length: 1 sort: 1`
| 154.5% | 83.30 | 12,004,334.72 | 4.2% | 0.01 | `LocalVector itemsize: 24 length: 1 sort: 1`
| 100.0% | 738.00 | 1,355,013.55 | 1.8% | 0.01 | `List itemsize: 24 length: 7 sort: 1`
| 138.9% | 531.46 | 1,881,610.87 | 0.2% | 0.01 | `LocalVector itemsize: 24 length: 7 sort: 1`
| 100.0% | 6,569.49 | 152,218.86 | 0.1% | 0.08 | `List itemsize: 24 length: 63 sort: 1`
| 150.1% | 4,377.90 | 228,420.02 | 0.2% | 0.05 | `LocalVector itemsize: 24 length: 63 sort: 1`
| 100.0% | 69,331.06 | 14,423.55 | 0.2% | 0.83 | `List itemsize: 24 length: 511 sort: 1`
| 132.0% | 52,521.50 | 19,039.82 | 0.2% | 0.63 | `LocalVector itemsize: 24 length: 511 sort: 1`
| 100.0% | 724,062.43 | 1,381.10 | 0.1% | 8.65 | `List itemsize: 24 length: 4095 sort: 1`
| 123.5% | 586,161.22 | 1,706.02 | 0.1% | 7.01 | `LocalVector itemsize: 24 length: 4095 sort: 1`
Source Code
Just try to run MyTest() in any godot environment. I hack test_main.cpp, thanks @Ivorforce for the idea.
Using this nice small lib nanobench.h
#define ANKERL_NANOBENCH_IMPLEMENT
#include "nanobench.h"
#include "core/templates/list.h"
#include "core/templates/local_vector.h"
#include <ostream>
#include <sstream>
#include <vector>
template<typename T>
class Test
{
T placehold;
List<T> list;
LocalVector<T> vector;
public:
void bench_list(const std::vector<T> &source, bool sort)
{
for (auto &item : source)
{
list.push_back(item);
}
if (sort)
{
list.sort();
}
for (auto &item : list)
{
placehold = item;
}
}
void bench_local_vector(const std::vector<T> &source, bool sort)
{
for (auto &item : source)
{
vector.push_back(item);
}
if (sort)
{
vector.sort();
}
for (auto &item : vector)
{
placehold = item;
}
}
static void bench(bool sort)
{
for (int i = 1; i < 4096 + 1; i *= 8)
{
int n = i == 1 ? 1 : i-1;
std::ostringstream name;
name << "itemsize: " << sizeof(T) << " length: " << n << " sort: " << sort;
std::vector<T> source(n);
auto bench = ankerl::nanobench::Bench()
.minEpochIterations(1000)
.relative(true);
bench.run("List " + name.str(), [&]{
Test test;
test.bench_list(source, sort);
});
bench.run("LocalVector " + name.str(), [&]{
Test test;
test.bench_local_vector(source, sort);
});
}
}
};
void mytest(void)
{
Test<int>::bench(false);
Test<Variant>::bench(false);
Test<int>::bench(true);
Test<Variant>::bench(true);
}
Add reserve to be aligned with #100251.
That being said — this is a pretty big change, and I gotta say if I were the code maintainer I would be uncomfortable to merge it because of potential regressions. I would therefore advocate to split the PR and address each use separately, to make them small, comfortable and somewhat tested chunks. It's annoying, but it's safe.
Yes, but it is already split from a bigger one :) Despite being somewhat scattered, these changes are essentially the same pattern.
Safety should probably be ok, I double checked they follow the restriction on top, but I cannot prove that to you (unless you read the code).
I'm waiting to see if #100251 can be merged efficiently. If that can not, I'd better give up.
Any feedback on this? I feel like I've had too many pending PRs and I'd like to clean up some. Can I do something to get it merged?
Any feedback on this? I feel like I've had too many pending PRs and I'd like to clean up some. Can I do something to get it merged?
As I mentioned before, this PR (as some of your others) has quite a few changes, and there is no proof of correctness or superiority over the previous implementation. Both factors combined likely lead to unwillingness to spend time doing reviews or generally risk the code churn of a merge.
Personally I fully support and appreciate your efforts for the List to LocalVector transition, but you should probably start with smaller steps that you can fully prove to be correct and measure a performance benefit for :)
Ok, I see @AThousandShips react with thumbs up so I assume maintainers are not generally against this idea. I'll try to continue this work.
there is no proof of correctness or superiority over the previous implementation.
What kind of proof are you expecting? I definitely cannot formally verify these code. 😅 If the CI can't convince you, then I'm afraid there's not much I can do to self-prove. About superiority, I think LocalVector beats List in both API simplicity and performance (as long as you don't insert). If we haven't reach a consensus on this point, I should create a proposal for further discussion.
Not generally against it, just agreeing with some of the opinions
What kind of proof are you expecting? I definitely cannot formally verify these code. 😅
Here's what I would do:
- Find an example of a function (ideally an important one) that is practically suffering from its use of
List - Benchmark calls of this function, showing that replacing the List with LocalVector leads to measurable benefits (even among its other performance costs)
- Analyze surrounding code and describe why I think the change will never be slower or somehow incorrect. This should be easy with a single function, but difficult with tens of functions like here.
This should be enough to get a merge. After the merge, you can add a new PR addressing a few similar functions, testing one of them explicitly and describing why the rest of them are all analogous. Given you've extensively described and proven the general benefits at this point, it's more likely this can be approved. Perhaps after that, you can start an even larger PR with less closely related functions because the benefit has been even better established. Depending on your actual benchmark results of course :)
showing that replacing the List with LocalVector leads to measurable benefits
Well I think that's a pure misunderstanding caused by that performance tag. These changes are present here only because they don't cause a performance problem. I propose this change because I think LocalVector is more "correct" than List in these place and I'd like to do a thorough cleanup of these usecases. This inevitably involves risks and won’t bring many immediate benefits so I'm checking maintainer‘s willingness.
I think
LocalVectoris more "correct" thanListin these place
I'm not sure I follow, can you explain why List is incorrect? So far youve only talked performance considerations. I'm confused you don't consider this to be a performance PR.
We have extensive use of List, with only a small portion becoming performance bottlenecks. For those that do become bottlenecks, we can handle them through the usual performance optimization process. However, for those that don’t, I believe they pose two potential issues:
- They might force nearby hotspot code to use List, leading to performance issues.
- They could put pressure on the memory allocator (e.g., more fragmentation, reduced locality), indirectly affecting other parts of the system.
Since these concerns remain hypothetical, I think labeling them as performance PRs are inappropriate. The direct impact of these improvements might be only a few microseconds or even less.
Translated by GPT :)
You are making some interesting and valid points. If this is the primary concern behind your PRs, i recommend you gather evidence and / or articles for this to link, and open a Godot proposal. Let people weigh in, and if a consensus is reached that this is important, a big clean up PR can be made to address it once and for all. If you're just making these PRs with no consensus they should be merged, you're in a dead end. Theoretical concerns are well and good, but evidence is needed to actually warrant the risks and effort merging PRs brings.
I’m going to close this PR as I’m currently evaluating the overall changes from List to LocalVector. Once I’ve gathered enough information, I’ll recreate this one with a better approach and scope.