hashmap.c icon indicating copy to clipboard operation
hashmap.c copied to clipboard

How to implement hashmap_filter

Open Adamcake opened this issue 1 year ago • 7 comments

Thanks for making this software, it's very useful.

Is it safe to call hashmap_delete while using hashmap_iter (or hashmap_scan)? Would this ever result in invalid pointers, or in some items being skipped by the iterator?

If it's not safe, how would you suggest implementing a "filter" function? Could it be added as a feature?

Adamcake avatar Sep 05 '24 15:09 Adamcake

Would this ever result in invalid pointers, or in some items being skipped by the iterator?

Yes: https://github.com/tidwall/hashmap.c/blob/master/hashmap.c#L459-L461

If it's not safe, how would you suggest implementing a "filter" function? Could it be added as a feature?

If by "filter" you mean extracting a subset of a hashmap, you could iterate the hashmap and build another hashmap (or any other data structure) with the items matching your criteria instead of altering the source hashmap.

scossu avatar Sep 05 '24 17:09 scossu

Here I am using the counter reset method. This is less efficient because it restarts the iteration every time after a delete, until the whole iteration runs without matches. But it's an option.

scossu avatar Sep 05 '24 17:09 scossu

By "filter" I mean deleting every item that matches some condition. I don't need to extract them.

Resetting the loop each time might be the best method for now (and yours is a nice tidy way of doing it compared to what I would've written.) It would be nice to have this as a feature though.

Adamcake avatar Sep 05 '24 17:09 Adamcake

The problem with deleting during iteration is that when the current item is deleted, that deletion operation may cause the items in the buckets to the right to move over by one. This would mean that iterator index may be off by one, thus skipping the next item. It won't always be the case, but it will happen often enough to disappoint.

One idea is having a specialized delete function that is only intended for iteration. Perhaps.... it could take the iteration index as a parameter instead of a key. Using that index it could look at the last item provided to the user, delete it and then move the index appropriately.

size_t iter = 0;
void *item;
while (hashmap_iter(map, &iter, &item)) {
    const struct user *user = item;
    if (!user->activated) {
        hashmap_iter_delete(map, &iter);
    }
}

tidwall avatar Sep 06 '24 01:09 tidwall

Yeah, that'd work for me. The only other option I could think of would be if, internally, you could do multiple deletions at once without moving anything, them do all the moving-around at the end. That would potentially be more efficient as you'd never need to move any item more than once.

Adamcake avatar Sep 06 '24 14:09 Adamcake

Have you considered the option of building a data structure additively from matching items instead of deleting items from the original? To me it seems more straightforward and less error-prone, but I don't know your use case.

The only case in which it wouldn't work is if you have a very large hashmap that you can't keep in memory along with its (partial) copy.

scossu avatar Sep 06 '24 14:09 scossu

That would incur at least one or two new heap allocations, which doesn't seem necessary here. Your loop-resetting solution would be more efficient than that.

Adamcake avatar Sep 06 '24 14:09 Adamcake