
[Core] Optimize `HashMap`

Open Nazarwadim opened this issue 1 year ago • 34 comments

Benchmarks

Compile command: scons -j11 target=template_release scu_build=true

HashMap<int, int>

| Elements | Time find before (nsec) | Time find after (nsec) |
| --- | --- | --- |
| 10 | 685 | 453 |
| 50 | 672 | 707 |
| 250 | 2542 | 2330 |
| 1250 | 13065 | 8843 |
| 6250 | 55934 | 43017 |
| 31250 | 373167 | 293621 |
| 156250 | 1905111 | 1500588 |
| 781250 | 14192621 | 11117365 |
| 3906250 | 82247888 | 66698464 |
| 19531250 | 441888032 | 364882432 |

| Elements | Time insert before (usec) | Time insert after (usec) |
| --- | --- | --- |
| 5 | 2 | 0 |
| 25 | 1 | 0 |
| 125 | 14 | 4 |
| 625 | 34 | 18 |
| 3125 | 162 | 95 |
| 15625 | 808 | 723 |
| 78125 | 3953 | 3477 |
| 390625 | 39874 | 28357 |
| 1953125 | 188333 | 154200 |
| 9765625 | 970730 | 758132 |

Notes: these benchmarks were run using the reserve method, because the map expansion points differ after replacing fastmod with &. Five times more elements were reserved than were inserted. In the find benchmark, 50% of the looked-up elements are not in the table, which gives a 50% miss rate, and every one of these elements is looked up; the growth in time is therefore roughly proportional to the number of elements.

Reducing the size of the compiled binary. If you compare the binary generated by CI for my branch against master, it is about 0.1 to 1 MB smaller. Most likely this is due to the _FORCE_INLINE_ that I added to some methods.

Replacing fastmod with &. I completely replaced fastmod with &.
As far as I measured, & is 5 times faster than fastmod and 10 times faster than %. Also keep in mind that fastmod is not supported on 32-bit architectures.

How does & work? In decimal we can instantly compute 141 % 100 by simply discarding everything above the hundreds, leaving a remainder of 41. The same holds for 1267 % 10 = 7 or 4 % 10000 = 4. Computers can do the same thing in binary, which is what I did.

The necessary condition is that the divisor must be a power of two (2^n). Take the divisor 00100 in binary. To find the remainder of, say, a hash 11110, we subtract 1 from the divisor and do a bitwise AND: 11110 & 00011 = 00010. And that's it.
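For illustration, a minimal standalone check of this equivalence (my own snippet, not code from the PR):

#include <cassert>
#include <cstdint>

int main() {
	// For a power-of-two capacity, hash % capacity == hash & (capacity - 1).
	const uint32_t capacity = 32; // 100000 in binary
	const uint32_t mask = capacity - 1; // 011111 in binary
	for (uint32_t hash = 0; hash < 100000; hash++) {
		assert(hash % capacity == (hash & mask));
	}
	return 0;
}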

How was & implemented? To reduce the number of operations in the hottest code paths, I store capacity as one less than the capacity used by the other containers; the get_capacity method still returns the normal capacity. Next, we need the stored value to satisfy the 2^n - 1 condition. In the _resize_and_rehash method I use this

capacity = next_power_of_2(capacity) - 1;

to satisfy this condition. Before that, a loop was used (in the reserve method) to find a fitting size for the hash table array.
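As a rough sketch of the idea (the struct and method names are illustrative, not the actual PR code, and next_power_of_2 is reimplemented locally so the snippet stands alone):

#include <cstdint>

// Illustrative only: capacity is stored as 2^n - 1 so it can double as a bit mask.
struct CapacityMask {
	uint32_t capacity = 0; // Power-of-two capacity minus one.

	uint32_t get_capacity() const { return capacity + 1; } // Reports the usual capacity.

	static uint32_t local_next_power_of_2(uint32_t x) {
		uint32_t p = 1;
		while (p < x) {
			p <<= 1;
		}
		return p;
	}

	void resize(uint32_t p_wanted_capacity) {
		capacity = local_next_power_of_2(p_wanted_capacity) - 1;
		// A real _resize_and_rehash would also reallocate the arrays and reinsert elements here.
	}

	uint32_t position_for_hash(uint32_t p_hash) const {
		return p_hash & capacity; // Replaces the fastmod()/% reduction.
	}
};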

Removing unnecessary hash calculations. The insert method previously computed _hash twice, and operator[] computed it three times in total. I removed the hash calculation from the _insert method and renamed it to _insert_with_hash. The other method, which was previously called _insert_with_hash, is now _insert_with_hash_and_element. I also added a _look_up_with_hash method, which searches for a position using a previously calculated hash.
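To illustrate the pattern, here is a deliberately tiny open-addressing toy (not the Godot HashMap; the helper names merely echo the ones above): the public entry points hash the key exactly once and hand the precomputed hash to the *_with_hash helpers.

#include <cstdint>
#include <cstdio>
#include <vector>

struct ToyMap {
	static constexpr uint32_t CAP = 16; // power of two; no resizing in this toy
	static constexpr uint32_t EMPTY_HASH = 0;
	std::vector<uint32_t> hashes = std::vector<uint32_t>(CAP, EMPTY_HASH);
	std::vector<int> keys = std::vector<int>(CAP, 0);
	std::vector<int> values = std::vector<int>(CAP, 0);

	static uint32_t hash_key(int p_key) {
		uint32_t h = (uint32_t)p_key * 2654435761u; // Knuth multiplicative hash
		return h == EMPTY_HASH ? 1 : h;
	}

	// Searches using an already computed hash; on a miss, r_pos ends at a free slot.
	bool look_up_with_hash(int p_key, uint32_t p_hash, uint32_t &r_pos) const {
		r_pos = p_hash & (CAP - 1);
		while (hashes[r_pos] != EMPTY_HASH) {
			if (hashes[r_pos] == p_hash && keys[r_pos] == p_key) {
				return true;
			}
			r_pos = (r_pos + 1) & (CAP - 1);
		}
		return false;
	}

	void insert_with_hash(int p_key, int p_value, uint32_t p_hash) {
		uint32_t pos;
		if (look_up_with_hash(p_key, p_hash, pos)) {
			values[pos] = p_value; // key already present, pos points at it
		} else {
			hashes[pos] = p_hash; // pos points at the first empty slot
			keys[pos] = p_key;
			values[pos] = p_value;
		}
	}

	void insert(int p_key, int p_value) {
		insert_with_hash(p_key, p_value, hash_key(p_key)); // hash computed exactly once
	}
};

int main() {
	ToyMap map;
	map.insert(7, 70);
	uint32_t pos;
	if (map.look_up_with_hash(7, ToyMap::hash_key(7), pos)) {
		printf("%d\n", map.values[pos]); // prints 70
	}
	return 0;
}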

Improving hash quality for String. Running the calculated string hash through hash_fmix32 greatly reduced the number of collisions. This can be seen by comparing the results of the following code. GDScript implementation:

var map = {}
var start = Time.get_ticks_usec()
for i in range(10000):
	map[str(i)] = 4
for i in range(10000):
	var a = map[str(i)]
print(Time.get_ticks_usec() - start)

C++ implementation:

HashMap<String, long> map;
uint64_t start = OS::get_singleton()->get_ticks_usec();
for (int i = 0; i < 10000; i++) {
	map[itos(i)] = 4;
}
uint64_t end = OS::get_singleton()->get_ticks_usec();
print_line(end - start);

There is a 30% performance improvement in GDScript and a 213% improvement in C++, given that the two HashMap implementations differ only in the hasher.

Improving the Comparator for floating-point numbers. Using a bitwise comparison instead of a normal one, plus a NaN check, significantly reduces the comparison time.
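A minimal sketch of what such a comparator can look like (my own illustration, not necessarily the exact PR code):

#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Compare float keys by bit pattern; the NaN check keeps NaN keys findable,
// since a plain a == b is always false for NaN.
static bool float_keys_equal(float a, float b) {
	uint32_t ia, ib;
	std::memcpy(&ia, &a, sizeof(float)); // bitwise view without strict-aliasing issues
	std::memcpy(&ib, &b, sizeof(float));
	return ia == ib || (std::isnan(a) && std::isnan(b));
}

int main() {
	assert(float_keys_equal(1.5f, 1.5f));
	assert(!float_keys_equal(1.5f, 2.5f));
	assert(float_keys_equal(NAN, NAN)); // NaN keys remain retrievable
	assert(!float_keys_equal(0.0f, -0.0f)); // note: bitwise result differs from 0.0f == -0.0f
	return 0;
}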

Nazarwadim avatar Mar 31 '24 19:03 Nazarwadim

Now it looks like this. Testing 100000 rand() insertions [benchmark screenshot]. Average before - after: 1779.9. Average % decrease: 16.4159557297671%.

Edit: the *getptr() test was incorrect, because after testing the insertions I immediately cleared the table.

Testing 100000 rand() *getptr() [benchmark screenshot]. Average before - after: 162.6. Average % decrease: 6.2203519510329%.

Nazarwadim avatar Apr 01 '24 21:04 Nazarwadim

What is measured in "before" and "after"?

MewPurPur avatar Apr 02 '24 09:04 MewPurPur

What is measured in "before" and "after"?

The master branch is "before"; my branch is "after".

Nazarwadim avatar Apr 02 '24 09:04 Nazarwadim

No, what is measured? What are the units?

AThousandShips avatar Apr 02 '24 09:04 AThousandShips

No, what is measured? What are the units?

usec (microseconds)

Nazarwadim avatar Apr 02 '24 10:04 Nazarwadim

benchmark code I used:

HashMap1<int, int> before;
HashMap<int, int> after;
int found1;
int found2;
uint32_t count = 100000;
uint32_t find_count = 100000;
void test_hash_map() {
	uint64_t start1 = OS::get_singleton()->get_ticks_usec();
	for (uint32_t i = 0; i < count; i++) {
		before[rand()] = rand();
	}
	uint64_t end1 = OS::get_singleton()->get_ticks_usec();
	uint64_t start2 = OS::get_singleton()->get_ticks_usec();
	for (uint32_t i = 0; i < count; i++) {
		after[rand()] = rand();
	}
	uint64_t end2 = OS::get_singleton()->get_ticks_usec();
	print_line("Time inserting before:", end1 - start1);
	print_line("Time inserting after:", end2 - start2);

	uint64_t start3 = OS::get_singleton()->get_ticks_usec();

	for (uint32_t i = 0; i < find_count; i++) {
		auto e = before.getptr(rand());
		if (e) {
			found1 = *e;
		}
	}

	uint64_t end3 = OS::get_singleton()->get_ticks_usec();
	before.clear();
	uint64_t start4 = OS::get_singleton()->get_ticks_usec();

	for (uint32_t i = 0; i < find_count; i++) {
		auto e = after.getptr(rand());
		if (e) {
			found2 = *e;
		}
	}
	uint64_t end4 = OS::get_singleton()->get_ticks_usec();
	after.clear();
	print_line("Time find before:", end3 - start3);
	print_line("Time find after:", end4 - start4);
}

Nazarwadim avatar Apr 02 '24 10:04 Nazarwadim

benchmark code: (...)

Each before/after case should be run on the same dataset; otherwise the before/after execution times might differ not only because of the changes made in this PR but also because of, e.g., different hash collisions for the input datasets.

Like in the example above, before/after should be run with the same RNG seed.

kleonc avatar Apr 02 '24 13:04 kleonc

Like in the example above, before/after should be run with the same RNG seed.

I don't think it would have made much of a difference, but I'm going to try it now. Each test has its own random with a seed of 0.

	RandomPCG random1;
	random1.seed(0);
	RandomPCG random2;
	random2.seed(0);
	RandomPCG random3;
	random3.seed(0);
	RandomPCG random4;
	random4.seed(0);

[benchmark screenshots]

Nazarwadim avatar Apr 02 '24 15:04 Nazarwadim

getptr getting twice as slow is not an insignificant amount and is worrying; to me it implies something about these improvements is broken, there's no reason for it to become slower

Try restoring fastmod and see what changes, I have a feeling that is the cause, I don't think it can be simply removed

I don't even see how the replacement code is equivalent, so I don't think that's a valid change

AThousandShips avatar Apr 02 '24 17:04 AThousandShips

Try restoring fastmod and see what changes, I have a feeling that is the cause, I don't think it can be simply removed

fastmod and my version behave exactly the same. I did benchmarks and my version is 2 times faster. Also, it doesn't degrade by half, but only by about 5%. As you can see, after I started using RandomPCG, the insertion degraded by half and the search by a factor of 3. I think the best thing to do would be to create a random array, insert its elements into the table and look at the results, and then walk through the array looking up those elements in the table.

Nazarwadim avatar Apr 02 '24 17:04 Nazarwadim

Also, it doesn't degrade by half, but only by about 5%

I know? I didn't say it did? I said it was a reduction, and it doesn't make sense if the input is exactly the same

So I'd say this either means you're not comparing fairly, or the changes aren't trivial

Also, I'd say getptr should be tested with a reasonable range of values; just using a plain random will make it miss almost always. So please show your updated testing code; I think it needs to be changed or it won't give usable results

AThousandShips avatar Apr 02 '24 17:04 AThousandShips

I don't even see how the replacement code is the same, do I don't think that's a valid change

For example

capacity = 10;
hash = 15;
pos = hash % capacity; // the first fastmod; pos = 5

// then in the loop we do something like this

pos = (pos + 1) % capacity; // 6 % 10 = 6
// after a few iterations pos = 9;
pos = (pos + 1) % capacity; // 10 % 10 = 0

Nazarwadim avatar Apr 02 '24 17:04 Nazarwadim

I'd say the getptr tests should use rand() % (2 * count), or even be based on the actual size, or you'll have just a 3125:134217728 chance of a hit (or 0.0023%), so it's essentially all misses. Granted, that gives a reference for the miss case, but the hit case is arguably much more important as it varies much more

AThousandShips avatar Apr 02 '24 17:04 AThousandShips

HashMap1<int, int> before;
HashMap<int, int> after;

Vector<int> rand_vec;
int found1;
int found2;
uint32_t count = 1000000;

void insert_random_variables_in_vec() {
	for (uint32_t i = 0; i < 2 * count; i++) { // should be 2 * count to benchmark misses
		rand_vec.push_back(rand());
	}
}

void test_hash_map() {
	for (int j = 0; j < 10; j++) {
		before.clear();
		uint64_t start1 = OS::get_singleton()->get_ticks_usec();
		for (uint32_t i = 0; i < count; i++) {
			before[rand_vec[i]] = rand_vec[i] * rand_vec[i];
		}
		uint64_t end1 = OS::get_singleton()->get_ticks_usec();

		print_line("Time inserting before:", end1 - start1);
	}

	for (int j = 0; j < 10; j++) {
		after.clear();
		uint64_t start2 = OS::get_singleton()->get_ticks_usec();
		for (uint32_t i = 0; i < count; i++) {
			after[rand_vec[i]] = rand_vec[i] * rand_vec[i];
		}
		uint64_t end2 = OS::get_singleton()->get_ticks_usec();
		print_line("Time inserting after:", end2 - start2);
	}

	print_line(" ");

	for (int j = 0; j < 10; j++) {
		uint64_t start3 = OS::get_singleton()->get_ticks_usec();

		for (uint32_t i = 0; i < rand_vec.size(); i++) {
			auto e = before.getptr(rand_vec[i]);
			if (e) {
				found1 = *e;
			}
		}
		uint64_t end3 = OS::get_singleton()->get_ticks_usec();
		print_line("Time find before:", end3 - start3);
	}

	for (int j = 0; j < 10; j++) {
		uint64_t start4 = OS::get_singleton()->get_ticks_usec();

		for (uint32_t i = 0; i < rand_vec.size(); i++) {
			auto e = after.getptr(rand_vec[i]);
			if (e) {
				found2 = *e;
			}
		}
		uint64_t end4 = OS::get_singleton()->get_ticks_usec();
		print_line("Time find after:", end4 - start4);
	}

}

Console output:

Time inserting before: 98877
Time inserting before: 140672
Time inserting before: 137699
Time inserting before: 136475
Time inserting before: 136530
Time inserting before: 142960
Time inserting before: 136449
Time inserting before: 136262
Time inserting before: 137182
Time inserting before: 136673
Time inserting after: 87340
Time inserting after: 130727
Time inserting after: 130751
Time inserting after: 130086
Time inserting after: 129933
Time inserting after: 130118
Time inserting after: 131152
Time inserting after: 129826
Time inserting after: 129644
Time inserting after: 130364
 
Time find before: 66767
Time find before: 64331
Time find before: 74998
Time find before: 63937
Time find before: 64128
Time find before: 64302
Time find before: 65063
Time find before: 64686
Time find before: 64487
Time find before: 65016
Time find after: 64541
Time find after: 63817
Time find after: 63936
Time find after: 64024
Time find after: 64464
Time find after: 64537
Time find after: 64476
Time find after: 64392
Time find after: 64001
Time find after: 64075

Nazarwadim avatar Apr 02 '24 18:04 Nazarwadim

You're still looking up at an infinitesimal chance of hitting

No, I iterate over the array whose elements I have already inserted.

if (e) { 
	found1 = *e;
	counter++;
}
.......................
print_line(counter);

Output: 10004690

Nazarwadim avatar Apr 02 '24 18:04 Nazarwadim

A graph would be good to make it manageable :) but looks much better

AThousandShips avatar Apr 02 '24 18:04 AThousandShips

[benchmark graphs]

Nazarwadim avatar Apr 02 '24 18:04 Nazarwadim

That looks much more reasonable performance wise

AThousandShips avatar Apr 02 '24 18:04 AThousandShips

  1. Now we're optimizing, wouldn't (un)likely() help in some checks? I'm specifically talking about the various exists/!exists.

~~It makes no difference.~~

  1. Regarding fastmod():

pos + 1 < capacity ? (pos + 1) : 0; and fastmod are not the same; fastmod = %. My variant makes pos wrap around in a circle, something like this: 1 2 3 4 5 6 7 8 9 0 1 2 3 ...

Nazarwadim avatar Apr 03 '24 17:04 Nazarwadim

I found that it is possible to completely remove fastmod from the _get_probe_length method

Nazarwadim avatar Apr 03 '24 18:04 Nazarwadim

@Geometror I think you implemented fastmod, maybe you can provide feedback?

MewPurPur avatar Apr 03 '24 18:04 MewPurPur

Everything is fine with fastmod.

Nazarwadim avatar Apr 03 '24 18:04 Nazarwadim

Alright, checking this again, and to be honest I'm not sure about completely replacing fastmod with &, since that requires having a capacity of 2^n. That might work for synthetic benchmarks, but generally there is a good reason for using prime numbers for the capacity (as many other hash map implementations do as well): to prevent clustering and achieve a better distribution, thus preventing a high number of collisions (which can be a problem even with very good hash functions). As far as I know you're only testing with completely random data, but that's not really representative of real-world usage. For example (if I remember correctly), TileSet/TileMap makes extensive use of HashMap, using integer coordinates as keys. The other optimizations you did, however, look fine to me :)

Geometror avatar Apr 30 '24 15:04 Geometror

The problem with & seems to exist only when hashes do not differ in the lower bits. hash_fmix32 seems to solve this problem. The JDK also uses 2^n capacities and applies a similar mixing function: https://hg.openjdk.org/jdk7/jdk7/jdk/file/9b8c96f96a0f/src/share/classes/java/util/HashMap.java#l264
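To make that concrete, here is a small standalone demonstration using the standard MurmurHash3 finalizer (which, as far as I know, is what hash_fmix32 corresponds to). Keys that differ only in their high bits all land in bucket 0 under a plain & mask, but spread out once the finalizer mixes the high bits down:

#include <cstdint>
#include <cstdio>

// Standard MurmurHash3 32-bit finalizer.
static uint32_t fmix32(uint32_t h) {
	h ^= h >> 16;
	h *= 0x85ebca6b;
	h ^= h >> 13;
	h *= 0xc2b2ae35;
	h ^= h >> 16;
	return h;
}

int main() {
	const uint32_t mask = 1024 - 1; // capacity 2^10
	for (uint32_t i = 0; i < 4; i++) {
		uint32_t key = i << 20; // keys that differ only in bits 20 and above
		printf("key %u: raw bucket %u, mixed bucket %u\n", key, key & mask, fmix32(key) & mask);
	}
	return 0;
}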

Nazarwadim avatar Apr 30 '24 17:04 Nazarwadim

I made the same HashMap using fastmod. This is a table of different filling patterns; the key is int and the times are in msec. The benchmarks were the same, but I removed the 5x reserve and report only the total time the benchmarks took (insert + find).

| Function to fill | fastmod (msec) | & (msec) |
| --- | --- | --- |
| pow(i, i) | 150 | 109 |
| pow(i, 2) | 122 | 100 |
| pow(2, i) | 101 | 83 |
| i | 2321 | 1980 |
| 2 * i | 2340 | 1995 |
| n - i | 2291 | 1966 |
| rand() | 2282 | 1971 |

Also, it doesn't matter whether i is iterated from 0 to 1000000 or from 1000000000 to 1001000000, and the same goes for i++ versus i += 10.

Nazarwadim avatar Apr 30 '24 19:04 Nazarwadim

The problem here is that we use many different hashing strategies depending on the data type (see the HashMapHasherDefault code below). hash_fmix32 is not perfect and might fail in certain cases. Having a prime number as the capacity is just another measure to reduce the probability of clustering in case a hash function fails or performs poorly for a given dataset, at a small performance cost. Now we need to decide whether that's worth it or not. I'd be interested in how this compares with other key data types, primarily String and Vector2/3/4i. Optimizing a hash map in this way is a very delicate undertaking, since usually when performance improves in certain cases, the worst-case performance drops in others you aren't even aware of. (There was a PR which might be helpful for testing: https://github.com/godotengine/godot/pull/72561)
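To make the clustering concern concrete, a small standalone comparison (the key pattern and the capacities are only illustrative):

#include <cstdint>
#include <cstdio>
#include <set>

int main() {
	// If a (poor) hash leaves keys as multiples of a power of two, `hash & (capacity - 1)`
	// piles them into a single bucket, while `hash % prime` still spreads them.
	// A good finalizer avoids this; a prime capacity is a safety net when the hash underperforms.
	std::set<uint32_t> pow2_buckets;
	std::set<uint32_t> prime_buckets;
	const uint32_t pow2_capacity = 1024;
	const uint32_t prime_capacity = 1021;
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t key = i * 4096; // e.g. page-aligned pointers or strided ids
		pow2_buckets.insert(key & (pow2_capacity - 1));
		prime_buckets.insert(key % prime_capacity);
	}
	printf("distinct buckets: pow2=%zu prime=%zu\n", pow2_buckets.size(), prime_buckets.size());
	// Prints: distinct buckets: pow2=1 prime=256
	return 0;
}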

Could you commit your benchmarks as well? So that we can test this on different platforms?

// Generic hash function for any type.
	template <typename T>
	static _FORCE_INLINE_ uint32_t hash(const T *p_pointer) { return hash_one_uint64((uint64_t)p_pointer); }

	template <typename T>
	static _FORCE_INLINE_ uint32_t hash(const Ref<T> &p_ref) { return hash_one_uint64((uint64_t)p_ref.operator->()); }

	static _FORCE_INLINE_ uint32_t hash(const String &p_string) { return p_string.hash(); }
	static _FORCE_INLINE_ uint32_t hash(const char *p_cstr) { return hash_djb2(p_cstr); }
	static _FORCE_INLINE_ uint32_t hash(const wchar_t p_wchar) { return hash_fmix32(p_wchar); }
	static _FORCE_INLINE_ uint32_t hash(const char16_t p_uchar) { return hash_fmix32(p_uchar); }
	static _FORCE_INLINE_ uint32_t hash(const char32_t p_uchar) { return hash_fmix32(p_uchar); }
	static _FORCE_INLINE_ uint32_t hash(const RID &p_rid) { return hash_one_uint64(p_rid.get_id()); }
	static _FORCE_INLINE_ uint32_t hash(const CharString &p_char_string) { return hash_djb2(p_char_string.get_data()); }
	static _FORCE_INLINE_ uint32_t hash(const StringName &p_string_name) { return p_string_name.hash(); }
	static _FORCE_INLINE_ uint32_t hash(const NodePath &p_path) { return p_path.hash(); }
	static _FORCE_INLINE_ uint32_t hash(const ObjectID &p_id) { return hash_one_uint64(p_id); }

	static _FORCE_INLINE_ uint32_t hash(const uint64_t p_int) { return hash_one_uint64(p_int); }
	static _FORCE_INLINE_ uint32_t hash(const int64_t p_int) { return hash_one_uint64(p_int); }
	static _FORCE_INLINE_ uint32_t hash(const float p_float) { return hash_murmur3_one_float(p_float); }
	static _FORCE_INLINE_ uint32_t hash(const double p_double) { return hash_murmur3_one_double(p_double); }
	static _FORCE_INLINE_ uint32_t hash(const uint32_t p_int) { return hash_fmix32(p_int); }
	static _FORCE_INLINE_ uint32_t hash(const int32_t p_int) { return hash_fmix32(p_int); }
	static _FORCE_INLINE_ uint32_t hash(const uint16_t p_int) { return hash_fmix32(p_int); }
	static _FORCE_INLINE_ uint32_t hash(const int16_t p_int) { return hash_fmix32(p_int); }
	static _FORCE_INLINE_ uint32_t hash(const uint8_t p_int) { return hash_fmix32(p_int); }
	static _FORCE_INLINE_ uint32_t hash(const int8_t p_int) { return hash_fmix32(p_int); }
	static _FORCE_INLINE_ uint32_t hash(const Vector2i &p_vec) {
		uint32_t h = hash_murmur3_one_32(p_vec.x);
		h = hash_murmur3_one_32(p_vec.y, h);
		return hash_fmix32(h);
	}
	static _FORCE_INLINE_ uint32_t hash(const Vector3i &p_vec) {
		uint32_t h = hash_murmur3_one_32(p_vec.x);
		h = hash_murmur3_one_32(p_vec.y, h);
		h = hash_murmur3_one_32(p_vec.z, h);
		return hash_fmix32(h);
	}
	static _FORCE_INLINE_ uint32_t hash(const Vector4i &p_vec) {
		uint32_t h = hash_murmur3_one_32(p_vec.x);
		h = hash_murmur3_one_32(p_vec.y, h);
		h = hash_murmur3_one_32(p_vec.z, h);
		h = hash_murmur3_one_32(p_vec.w, h);
		return hash_fmix32(h);
	}
	static _FORCE_INLINE_ uint32_t hash(const Vector2 &p_vec) {
		uint32_t h = hash_murmur3_one_real(p_vec.x);
		h = hash_murmur3_one_real(p_vec.y, h);
		return hash_fmix32(h);
	}
	static _FORCE_INLINE_ uint32_t hash(const Vector3 &p_vec) {
		uint32_t h = hash_murmur3_one_real(p_vec.x);
		h = hash_murmur3_one_real(p_vec.y, h);
		h = hash_murmur3_one_real(p_vec.z, h);
		return hash_fmix32(h);
	}
	static _FORCE_INLINE_ uint32_t hash(const Vector4 &p_vec) {
		uint32_t h = hash_murmur3_one_real(p_vec.x);
		h = hash_murmur3_one_real(p_vec.y, h);
		h = hash_murmur3_one_real(p_vec.z, h);
		h = hash_murmur3_one_real(p_vec.w, h);
		return hash_fmix32(h);
	}
	static _FORCE_INLINE_ uint32_t hash(const Rect2i &p_rect) {
		uint32_t h = hash_murmur3_one_32(p_rect.position.x);
		h = hash_murmur3_one_32(p_rect.position.y, h);
		h = hash_murmur3_one_32(p_rect.size.x, h);
		h = hash_murmur3_one_32(p_rect.size.y, h);
		return hash_fmix32(h);
	}
	static _FORCE_INLINE_ uint32_t hash(const Rect2 &p_rect) {
		uint32_t h = hash_murmur3_one_real(p_rect.position.x);
		h = hash_murmur3_one_real(p_rect.position.y, h);
		h = hash_murmur3_one_real(p_rect.size.x, h);
		h = hash_murmur3_one_real(p_rect.size.y, h);
		return hash_fmix32(h);
	}
	static _FORCE_INLINE_ uint32_t hash(const AABB &p_aabb) {
		uint32_t h = hash_murmur3_one_real(p_aabb.position.x);
		h = hash_murmur3_one_real(p_aabb.position.y, h);
		h = hash_murmur3_one_real(p_aabb.position.z, h);
		h = hash_murmur3_one_real(p_aabb.size.x, h);
		h = hash_murmur3_one_real(p_aabb.size.y, h);
		h = hash_murmur3_one_real(p_aabb.size.z, h);
		return hash_fmix32(h);
	}

Geometror avatar May 01 '24 10:05 Geometror

there was a PR which might be helpful for testing: https://github.com/godotengine/godot/pull/72561

I used a similar code, but changed it a bit:

if (distance > 10) {
	WARN_PRINT("Excessive collision count (" + itos(distance) + "), is the right hash function being used? Class Name:" + __PRETTY_FUNCTION__);
}

Tested on godot-benchmarks. First of all, I found this warning, which appears with both HashMap implementations:

WARNING: Excessive collision count (82), is the right hash function being used? Class Name:void HashMap<TKey, TValue, Hasher, Comparator, Allocator>::_insert_with_hash(uint32_t, HashMapElement<TKey, TValue>*) [with TKey = StringName; TValue = Node*; Hasher = HashMapHasherDefault; Comparator = HashMapComparatorDefault<StringName>; Allocator = DefaultTypedAllocator<HashMapElement<StringName, Node*> >; uint32_t = unsigned int]

I found a place where this happens: https://github.com/godotengine/godot/blob/f91db3dc58f1d6a0cb63d591515183b5a45cd3ba/scene/main/node.h#L167 But in my implementation there are many more collisions at this spot, reaching as high as 200, while with fastmod there are only 87. Still, I think the problem here is not the map but StringName. I looked at its hashing method and saw it uses djb2. After I simply added hashv = hash_fmix32(hashv); at the end, it fixed the problem: in both cases the number of collisions decreased by dozens. I don't know how to use murmur here because it requires a length, and some hashing methods don't have one.
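A sketch of the change being described (djb2 and the fmix32 finalizer are reimplemented locally so the snippet stands on its own; the real StringName code differs):

#include <cstdint>
#include <cstdio>

// Standard MurmurHash3 finalizer; the extra mixing step being proposed.
static uint32_t fmix32(uint32_t h) {
	h ^= h >> 16;
	h *= 0x85ebca6b;
	h ^= h >> 13;
	h *= 0xc2b2ae35;
	h ^= h >> 16;
	return h;
}

// djb2-style accumulation, similar to what StringName-like hashing uses.
static uint32_t djb2(const char *p_cstr) {
	uint32_t hashv = 5381;
	while (*p_cstr) {
		hashv = ((hashv << 5) + hashv) + (uint8_t)*p_cstr++; // hashv * 33 + c
	}
	return hashv;
}

static uint32_t djb2_mixed(const char *p_cstr) {
	return fmix32(djb2(p_cstr)); // the proposed `hashv = hash_fmix32(hashv);` step
}

int main() {
	char buf[16];
	for (int i = 0; i < 4; i++) {
		snprintf(buf, sizeof(buf), "%d", i);
		printf("\"%s\": djb2=%u mixed=%u\n", buf, djb2(buf), djb2_mixed(buf));
	}
	return 0;
}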

The rendering/hlod/cull_fast benchmark also generates a warning but only if fastmod is used:

WARNING: Excessive collision count (11), is the right hash function being used? Class Name:void HashMap<TKey, TValue, Hasher, Comparator, Allocator>::_insert_with_hash(uint32_t, HashMapElement<TKey, TValue>*) [with TKey = Callable; TValue = Object::SignalData::Slot; Hasher = HashableHasher<Callable>; Comparator = HashMapComparatorDefault<Callable>; Allocator = DefaultTypedAllocator<HashMapElement<Callable, Object::SignalData::Slot> >; uint32_t = unsigned int]

IMO, it's better not to adapt to rare cases, but rather to adapt rare cases to us.

Could you commit your benchmarks as well? So that we can test this on different platforms?

Here https://github.com/Nazarwadim/godot-fork/tree/hash_map_benchmarks.

Nazarwadim avatar May 02 '24 09:05 Nazarwadim

Surely there are good references out there on stress testing hash containers, for creating realistic test cases with realistic distributions.

AThousandShips avatar May 02 '24 13:05 AThousandShips

Also, though the CI will catch it if it happens, we need to make sure that any changes to the hash functions won't break the API checks, as those rely on hashes and might break if the hashing of strings changes. So if you find the Linux editor CI complaining massively about GDExtension compatibility, you'll know what's broken, and we'll have to investigate how to make that not break everything while potentially reducing collisions

AThousandShips avatar May 02 '24 13:05 AThousandShips

During my work today on benchmarks under real conditions, I noticed that when using 2^n together with the hash_djb2 function there are fewer collisions. I don't know by how much, but definitely fewer. Perhaps this explains such a large 10-20% difference in performance when & is used.

As for strings, as I described above, it completely removed the collision warning. You can check in the godot-demo-projects project or use simple GDScript code:

	var map = {}
	var start = Time.get_ticks_usec()
	for i in range(10000):
		map[str(i)] = 4
	for i in range(10000):
		var a = map[str(i)]
	print(Time.get_ticks_usec() - start)

Here the difference in performance on my computer is 30%, and that's despite running in a virtual machine.

make sure that any changes to the hash functions

I did not change the hash functions themselves, only the wrapper that the HashMap uses, so I don't think it should break anything. But if it does break, I'll just wrap the hash function in the _hash method, like this:

	_FORCE_INLINE_ uint32_t _hash(const TKey &p_key) const {
		uint32_t hash = Hasher::hash(p_key);
		hash = hash_fmix32(hash); // Prevents collisions if poor hash function is provided.
		if (unlikely(hash == EMPTY_HASH)) {
			hash = EMPTY_HASH + 1;
		}

		return hash;
	}

Nazarwadim avatar May 02 '24 14:05 Nazarwadim