hashtables
hashtables copied to clipboard
Add support for checking the size of the table
This ticket's description is... lacking. I assume it means the following problem:
If you call newSized 187787582217851179
(e.g. a QuickCheck test or programming error) a useful exception should be thrown. Currently you get *** Failed! Exception: 'Data.Vector.Mutable: uninitialised element'
. In other words, newSized
should check its input parameter.
@nominolo I think the underlying vector code should be responsible for throwing an exception or otherwise indicating OOM in this case.
@nominolo I meant counting the amount of elements in table.
In general the reason there's no size function for the typeclass is that it's O(n) for all but one of them. That said, I should probably add a size function to BasicHashTable, since O(1) size is possible there.
What about adding an extra STRef Int
field to the other implementations, which will keep the current size? To this day I had to make wrapper APIs just because of that.
Because hashtables are about speed, and if you don't need to do this bookkeeping (and clearly it's not necessary at least in the case of cuckoo hash), the program can go faster. As you've noted, the wrapper API is trivial.
One thing we could do is include the wrapper into the package -- but doing it right is a matter of design. We do have this information for the "basic" (linear probing) table, just not for the others -- so we wouldn't want to do extra work there. Associated type synonyms might work there.
I've implemented a dome library over "hashtables", which approaches this issue amongst others. In the coming days I'll be test-driving it in another project. If no major issues arise, I'll release it right after. I'll post back here with updates.
I've just released the library I was talking about: http://hackage.haskell.org/package/hashtables-plus
As you've noted, the wrapper API is trivial.
The "trivial" API would be inefficient since it would involving doing a lookup
for each insert
to check whether we should increment the counter by 0 or 1. It does need to be added specifically to the implementation. Alternatively we add a single class method maybeMutate (Maybe k -> Maybe v)
where the second function takes the result of lookup
and then re-implement insert/delete/lookup
in terms of maybeMutate
. Then and only then could we efficienlty and generically implement a wrapper.
Actually nvmd, it seems the existing mutateST :: (Eq k, Hashable k) => h s k v -> k -> (Maybe v -> ST s (Maybe v, a)) -> ST s a
should do what I need. But then a generic wrapper needs to also overwrite mutate/insert/delete/lookup
to pass it through mutateST
.
I second this. It will be useful to have O(1)
size function, just like what vector
, ByteString
etc. provide. For now, I can do O(n)
using toList
but that also likely creates temporaries to serialize the contents. Keeping track of size
is very important in production applications. Too bad that hashtables-plus
was deprecated by @nikita-volkov.