c-hashmap icon indicating copy to clipboard operation
c-hashmap copied to clipboard

add a function "hashmap_full"

Open u-an-i opened this issue 1 year ago • 7 comments

check whether hashmap is about to be resized, useful in multithreaded code

u-an-i avatar Oct 03 '23 03:10 u-an-i

I think this is a good idea, but there are a few changes needed before I will consider merging.

I haven't messed with this code in a while, so I could be missing something, but why are you doing m->count + 2 instead of m->count + 1? Also, since the same condition (m->count + 1 > HASHMAP_MAX_LOAD * m->capacity) is repeated several times in map.c, and you wrote your function definition right above them in the same source file, why not replace the identical conditions with calls to hashmap_full(), which would definitely be inlined in those cases? I think that could help avoid bugs in the future and also make the code a bit cleaner.

I would also appreciate it if you made the function definition match the style of the rest of the code by putting the opening curly bracket on the line below.

If you address all of these issues, I will probably merge your pr. Thanks for taking the time to work on this project!

Mashpoe avatar Oct 03 '23 04:10 Mashpoe

hash value updates when count+1 is full trigger the resize, I outlined this in the header file. So + 2 allows to be able to still update existing values.

I modify the formatting!

u-an-i avatar Oct 03 '23 04:10 u-an-i

Thank you for updating the formatting! I still don't think the +2 should be there. If you look at hashmap_set(), the capacity check (m->count + 1 > HASHMAP_MAX_LOAD * m->capacity) is done before m->count is incremented. The +1 is included in the condition to account for this, which already accomplishes what your +2 is supposed to do.

In other words, the +1 is there to check if the hashmap will exceed the maximum load for its capacity after another element is added. When you add 2 instead of 1, you are checking if the hashmap will exceed the max load after 2 additional elements are added, instead of just 1.

If you agree with what I'm saying, I would also appreciate it if you replaced the identical conditions with calls to your new function.

Mashpoe avatar Oct 04 '23 05:10 Mashpoe

After reading your comment, I think I understand now. I didn't really understand the wording at first. You seem to want hashmap_full() to return true even if there is room for one more element to be added before a resize, and I don't understand why this would be a good idea. If this were the case, the function would still return true after you added the next element, so there would be no way for you to know if there is room to add another element or not. Why not just return the same value when there is room for any number of additional elements? With the function in its current state, if it returns true, you have no way of knowing if there is room for one more element or zero more elements.

Assuming this is a good idea, I think you should rename the function. If the function is called "hashmap_full", I think it should only return true if the hashmap is actually full. You could also rename the function and make it return an integer value for how many elements could be added before a resize occurs.

Mashpoe avatar Oct 04 '23 06:10 Mashpoe

The issue I was trying to solve is if the next "new addition" to the hashmap does a resize I wanted to do something in my multithreaded code (basically wait so that 1 thread does not look up a value while the current one reallocates memory -- this was a real issue where my program crashed: entry in find_entry became invalid) BUT I wanted to allow to still be able to "update a value" (whose key is) already in the hashmap.

The way "set" works is the check is done at the beginning: even when entry->key != NULL, so also no ++m->count, an "update" triggers a resize.

This is what I wanted to prevent, I wanted to be able to update but be warned when a resize is "about" to occur. Hence the +2.
The check in set at the beginning could be moved into the "if == null" if find_entry does no for(;;) but a "while(<count){} return null;" but I didn't want to dive in deeper without making myself aware of the whole picture.

I agree all the +1 checks can be moved to a function (so supporting compiler to optimise for size or allowing them inlining for speed) and returning the amount left is a good idea.

Let me know your thoughts again and I would

  • move +1s to a function
  • return amount left.

u-an-i avatar Oct 04 '23 06:10 u-an-i

I went ahead and added a hashmap_sets_left_before_resize function replacing hashmap_full

u-an-i avatar Oct 04 '23 14:10 u-an-i

@Mashpoe i implemented the changes as suggested by you in the current PR

u-an-i avatar Oct 22 '23 13:10 u-an-i