hashtables icon indicating copy to clipboard operation
hashtables copied to clipboard

Add support for checking the size of the table

Open nikita-volkov opened this issue 11 years ago • 10 comments

nikita-volkov avatar Nov 20 '13 11:11 nikita-volkov

This ticket's description is... lacking. I assume it means the following problem:

If you call newSized 187787582217851179 (e.g. a QuickCheck test or programming error) a useful exception should be thrown. Currently you get *** Failed! Exception: 'Data.Vector.Mutable: uninitialised element'. In other words, newSized should check its input parameter.

nominolo avatar Mar 31 '14 15:03 nominolo

@nominolo I think the underlying vector code should be responsible for throwing an exception or otherwise indicating OOM in this case.

gregorycollins avatar Mar 31 '14 15:03 gregorycollins

@nominolo I meant counting the amount of elements in table.

nikita-volkov avatar Mar 31 '14 17:03 nikita-volkov

In general the reason there's no size function for the typeclass is that it's O(n) for all but one of them. That said, I should probably add a size function to BasicHashTable, since O(1) size is possible there.

gregorycollins avatar Mar 31 '14 21:03 gregorycollins

What about adding an extra STRef Int field to the other implementations, which will keep the current size? To this day I had to make wrapper APIs just because of that.

nikita-volkov avatar Apr 12 '14 14:04 nikita-volkov

Because hashtables are about speed, and if you don't need to do this bookkeeping (and clearly it's not necessary at least in the case of cuckoo hash), the program can go faster. As you've noted, the wrapper API is trivial.

One thing we could do is include the wrapper into the package -- but doing it right is a matter of design. We do have this information for the "basic" (linear probing) table, just not for the others -- so we wouldn't want to do extra work there. Associated type synonyms might work there.

gregorycollins avatar Apr 13 '14 10:04 gregorycollins

I've implemented a dome library over "hashtables", which approaches this issue amongst others. In the coming days I'll be test-driving it in another project. If no major issues arise, I'll release it right after. I'll post back here with updates.

nikita-volkov avatar Apr 13 '14 13:04 nikita-volkov

I've just released the library I was talking about: http://hackage.haskell.org/package/hashtables-plus

nikita-volkov avatar Apr 16 '14 22:04 nikita-volkov

As you've noted, the wrapper API is trivial.

The "trivial" API would be inefficient since it would involving doing a lookup for each insert to check whether we should increment the counter by 0 or 1. It does need to be added specifically to the implementation. Alternatively we add a single class method maybeMutate (Maybe k -> Maybe v) where the second function takes the result of lookup and then re-implement insert/delete/lookup in terms of maybeMutate. Then and only then could we efficienlty and generically implement a wrapper.

Actually nvmd, it seems the existing mutateST :: (Eq k, Hashable k) => h s k v -> k -> (Maybe v -> ST s (Maybe v, a)) -> ST s a should do what I need. But then a generic wrapper needs to also overwrite mutate/insert/delete/lookup to pass it through mutateST.

infinity0 avatar Dec 14 '18 23:12 infinity0

I second this. It will be useful to have O(1) size function, just like what vector, ByteString etc. provide. For now, I can do O(n) using toList but that also likely creates temporaries to serialize the contents. Keeping track of size is very important in production applications. Too bad that hashtables-plus was deprecated by @nikita-volkov.

sanketr avatar Jun 11 '21 15:06 sanketr