hyperloglog.rs icon indicating copy to clipboard operation
hyperloglog.rs copied to clipboard

Example code leads to serde error and deprecation warnings:

Open rbtcollins opened this issue 3 years ago • 4 comments

from https://docs.rs/crate/hyperloglogplus/latest

use std::collections::hash_map::RandomState;
use hyperloglogplus::{HyperLogLog, HyperLogLogPlus};

let mut hllp = HyperLogLogPlus::new(16, RandomState::new()).unwrap();

hllp.add(&12345);
hllp.add(&23456);

assert_eq!(hllp.count().trunc() as u32, 2);

But try to serialise that:

error[E0277]: the trait bound `RandomState: serde::ser::Serialize` is not satisfied
    --> main.rs:40:31
     |
40   |         serde_json::to_string(&hllp).unwrap()
     |         --------------------- ^^^^^^^^^^^^^ the trait `serde::ser::Serialize` is not implemented for `RandomState`
     |         |
     |         required by a bound introduced by this call
     |
     = note: required because of the requirements on the impl of `serde::ser::Serialize` for `HyperLogLogPlus<std::string::String, RandomState>`
note: required by a bound in `serde_json::to_string`

rbtcollins avatar May 30 '22 15:05 rbtcollins

Hi, thanks for the comment. You're right about the deprecation warnings I pushed a fix to the README but didn't publish a new crate version.

Regarding the serialization error you have to use a BuildHasher implementation that implements serde::{Serialize, Deserialize}, I have a test for that here.

tabac avatar Jun 04 '22 18:06 tabac

So the challenge for a user - particularly one where being able to pass HLL structs around (e.g. do a map-reduce count on PB scale data sets) - is to go from the requirement you have to a functioning implementation.

Have you considered perhaps instead using https://crates.io/crates/fxhash and having a much simpler interface, where Serialize/Deserialize are fully implemented when the serde feature is enabled?

rbtcollins avatar Jun 09 '22 12:06 rbtcollins

The BuildHasher is a generic parameter because I didn't want to restrict users to a specific hashing function. If you have a serializable BuildHasher you can wrap the sketch in a struct of your own and pass instances around.

Something like this should work:

#[derive(Serialize, Deserialize)
struct Hllp(Hyperloglogplus<u32, SerializableBuildHasher>);

tabac avatar Jun 13 '22 21:06 tabac

Oh sure, I'm not suggesting that you can't have a generic parameter there.

I'm suggesting perhaps 3 related things.

  1. Whatever the generic parameter is, having a batteries included version that can be used for both the 'write HLL in process 1, serialise, transport, deserialise, continue using in process 2' use case, and the 'write HLLs in many processes, transport to a muxer process and merge them together there' use case, would be a massive advantage from a usability perspective. Whether thats batteries included as a struct in the codebase, or example code in the docs doesn't matter to me.

  2. BuildHasher is perhaps a poor interface because the std library only defines BuildHasher's from RandomState (not serialisable, though it could be), and Default Hashable types.

Making a marker interface when HLL is built with Serde support, e.g. B: BuilderHasher + Serializable + Deserializable or maybe both HLL and HLLSerializable with the latter having that variable of buildhasher parameter would improve the situation with 2, by moving the constraint closer to the point that matters - the HLL definition, not the place serialisation is attempted.

For 1, providing a example, e.g. one built around ahash, would be very useful. For 'extremely useful', providing one compatible with the redis HLL format would be omg useful.

rbtcollins avatar Jun 24 '22 18:06 rbtcollins

As with https://github.com/tabac/hyperloglog.rs/issues/16 I don't have much time for improvements, :(

tabac avatar Aug 26 '24 20:08 tabac