cardinality-estimator

Added mem-dbg across all structs of the crate as optional feature

Open LucaCappelletti94 opened this issue 1 year ago • 4 comments

mem-dbg is a crate that computes the memory size of a struct. I have added its derives throughout the crate as an optional feature, so that this implementation can easily be compared with others.

Cheers!

LucaCappelletti94, Aug 08 '24 14:08

@LucaCappelletti94 thanks for adding the optional mem-dbg feature.

We'd love to have this change in; however, it seems a few size calculation inconsistencies should be addressed first.

For example, if I apply the following diff:

diff --git a/src/estimator.rs b/src/estimator.rs
index 0c7c7fb..cca6bc0 100644
--- a/src/estimator.rs
+++ b/src/estimator.rs
@@ -181,7 +181,9 @@ where
     H: Hasher + Default,
 {
     fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
-        write!(f, "{:?}", self.representation())
+        use mem_dbg::{MemSize, SizeFlags};
+
+        write!(f, "{:?} mem_dbg = {:?}", self.representation(), self.mem_size(SizeFlags::default()))
     }
 }

and then run tests with:

cargo test --features mem_dbg test_estimator_p12_w6

I can see quite a few unexplained discrepancies between mem_dbg and the actual size:

---- estimator::tests::test_estimator_p12_w6::_0_expects_representation_small_estimate_0_size_8_avg_err_0_0000_ stdout ----
thread 'estimator::tests::test_estimator_p12_w6::_0_expects_representation_small_estimate_0_size_8_avg_err_0_0000_' panicked at src/estimator.rs:237:5:
assertion `left == right` failed
  left: "representation: Small(estimate: 0, size: 8), avg_err: 0.0000"
 right: "representation: Small(estimate: 0, size: 8) mem_dbg = 40, avg_err: 0.0000"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- estimator::tests::test_estimator_p12_w6::_128_expects_representation_array_estimate_128_size_520_avg_err_0_0000_ stdout ----
thread 'estimator::tests::test_estimator_p12_w6::_128_expects_representation_array_estimate_128_size_520_avg_err_0_0000_' panicked at src/estimator.rs:237:5:
assertion `left == right` failed
  left: "representation: Array(estimate: 128, size: 520), avg_err: 0.0000"
 right: "representation: Array(estimate: 128, size: 520) mem_dbg = 40, avg_err: 0.0000"

---- estimator::tests::test_estimator_p12_w6::_1_expects_representation_small_estimate_1_size_8_avg_err_0_0000_ stdout ----
thread 'estimator::tests::test_estimator_p12_w6::_1_expects_representation_small_estimate_1_size_8_avg_err_0_0000_' panicked at src/estimator.rs:237:5:
assertion `left == right` failed
  left: "representation: Small(estimate: 1, size: 8), avg_err: 0.0000"
 right: "representation: Small(estimate: 1, size: 8) mem_dbg = 40, avg_err: 0.0000"

---- estimator::tests::test_estimator_p12_w6::_16_expects_representation_array_estimate_16_size_72_avg_err_0_0000_ stdout ----
thread 'estimator::tests::test_estimator_p12_w6::_16_expects_representation_array_estimate_16_size_72_avg_err_0_0000_' panicked at src/estimator.rs:237:5:
assertion `left == right` failed
  left: "representation: Array(estimate: 16, size: 72), avg_err: 0.0000"
 right: "representation: Array(estimate: 16, size: 72) mem_dbg = 40, avg_err: 0.0000"

---- estimator::tests::test_estimator_p12_w6::_1024_expects_representation_hll_estimate_1012_size_3092_avg_err_0_0130_ stdout ----
thread 'estimator::tests::test_estimator_p12_w6::_1024_expects_representation_hll_estimate_1012_size_3092_avg_err_0_0130_' panicked at src/estimator.rs:237:5:
assertion `left == right` failed
  left: "representation: Hll(estimate: 1012, size: 3092), avg_err: 0.0130"
 right: "representation: Hll(estimate: 1012, size: 3092) mem_dbg = 40, avg_err: 0.0130"

---- estimator::tests::test_estimator_p12_w6::_129_expects_representation_hll_estimate_130_size_3092_avg_err_0_0001_ stdout ----
thread 'estimator::tests::test_estimator_p12_w6::_129_expects_representation_hll_estimate_130_size_3092_avg_err_0_0001_' panicked at src/estimator.rs:237:5:
assertion `left == right` failed
  left: "representation: Hll(estimate: 130, size: 3092), avg_err: 0.0001"
 right: "representation: Hll(estimate: 130, size: 3092) mem_dbg = 40, avg_err: 0.0001"

---- estimator::tests::test_estimator_p12_w6::_256_expects_representation_hll_estimate_254_size_3092_avg_err_0_0029_ stdout ----
thread 'estimator::tests::test_estimator_p12_w6::_256_expects_representation_hll_estimate_254_size_3092_avg_err_0_0029_' panicked at src/estimator.rs:237:5:
assertion `left == right` failed
  left: "representation: Hll(estimate: 254, size: 3092), avg_err: 0.0029"
 right: "representation: Hll(estimate: 254, size: 3092) mem_dbg = 40, avg_err: 0.0029"

---- estimator::tests::test_estimator_p12_w6::_2_expects_representation_small_estimate_2_size_8_avg_err_0_0000_ stdout ----
thread 'estimator::tests::test_estimator_p12_w6::_2_expects_representation_small_estimate_2_size_8_avg_err_0_0000_' panicked at src/estimator.rs:237:5:
assertion `left == right` failed
  left: "representation: Small(estimate: 2, size: 8), avg_err: 0.0000"
 right: "representation: Small(estimate: 2, size: 8) mem_dbg = 40, avg_err: 0.0000"

---- estimator::tests::test_estimator_p12_w6::_3_expects_representation_array_estimate_3_size_24_avg_err_0_0000_ stdout ----
thread 'estimator::tests::test_estimator_p12_w6::_3_expects_representation_array_estimate_3_size_24_avg_err_0_0000_' panicked at src/estimator.rs:237:5:
assertion `left == right` failed
  left: "representation: Array(estimate: 3, size: 24), avg_err: 0.0000"
 right: "representation: Array(estimate: 3, size: 24) mem_dbg = 40, avg_err: 0.0000"

---- estimator::tests::test_estimator_p12_w6::_32_expects_representation_array_estimate_32_size_136_avg_err_0_0000_ stdout ----
thread 'estimator::tests::test_estimator_p12_w6::_32_expects_representation_array_estimate_32_size_136_avg_err_0_0000_' panicked at src/estimator.rs:237:5:
assertion `left == right` failed
  left: "representation: Array(estimate: 32, size: 136), avg_err: 0.0000"
 right: "representation: Array(estimate: 32, size: 136) mem_dbg = 40, avg_err: 0.0000"

---- estimator::tests::test_estimator_p12_w6::_4_expects_representation_array_estimate_4_size_24_avg_err_0_0000_ stdout ----
thread 'estimator::tests::test_estimator_p12_w6::_4_expects_representation_array_estimate_4_size_24_avg_err_0_0000_' panicked at src/estimator.rs:237:5:
assertion `left == right` failed
  left: "representation: Array(estimate: 4, size: 24), avg_err: 0.0000"
 right: "representation: Array(estimate: 4, size: 24) mem_dbg = 40, avg_err: 0.0000"

---- estimator::tests::test_estimator_p12_w6::_8_expects_representation_array_estimate_8_size_40_avg_err_0_0000_ stdout ----
thread 'estimator::tests::test_estimator_p12_w6::_8_expects_representation_array_estimate_8_size_40_avg_err_0_0000_' panicked at src/estimator.rs:237:5:
assertion `left == right` failed
  left: "representation: Array(estimate: 8, size: 40), avg_err: 0.0000"
 right: "representation: Array(estimate: 8, size: 40) mem_dbg = 40, avg_err: 0.0000"

---- estimator::tests::test_estimator_p12_w6::_64_expects_representation_array_estimate_64_size_264_avg_err_0_0000_ stdout ----
thread 'estimator::tests::test_estimator_p12_w6::_64_expects_representation_array_estimate_64_size_264_avg_err_0_0000_' panicked at src/estimator.rs:237:5:
assertion `left == right` failed
  left: "representation: Array(estimate: 64, size: 264), avg_err: 0.0000"
 right: "representation: Array(estimate: 64, size: 264) mem_dbg = 40, avg_err: 0.0000"

---- estimator::tests::test_estimator_p12_w6::_512_expects_representation_hll_estimate_498_size_3092_avg_err_0_0068_ stdout ----
thread 'estimator::tests::test_estimator_p12_w6::_512_expects_representation_hll_estimate_498_size_3092_avg_err_0_0068_' panicked at src/estimator.rs:237:5:
assertion `left == right` failed
  left: "representation: Hll(estimate: 498, size: 3092), avg_err: 0.0068"
 right: "representation: Hll(estimate: 498, size: 3092) mem_dbg = 40, avg_err: 0.0068"

---- estimator::tests::test_estimator_p12_w6::_4096_expects_representation_hll_estimate_4105_size_3092_avg_err_0_0089_ stdout ----
thread 'estimator::tests::test_estimator_p12_w6::_4096_expects_representation_hll_estimate_4105_size_3092_avg_err_0_0089_' panicked at src/estimator.rs:237:5:
assertion `left == right` failed
  left: "representation: Hll(estimate: 4105, size: 3092), avg_err: 0.0089"
 right: "representation: Hll(estimate: 4105, size: 3092) mem_dbg = 40, avg_err: 0.0089"

---- estimator::tests::test_estimator_p12_w6::_10_000_expects_representation_hll_estimate_10068_size_3092_avg_err_0_0087_ stdout ----
thread 'estimator::tests::test_estimator_p12_w6::_10_000_expects_representation_hll_estimate_10068_size_3092_avg_err_0_0087_' panicked at src/estimator.rs:237:5:
assertion `left == right` failed
  left: "representation: Hll(estimate: 10068, size: 3092), avg_err: 0.0087"
 right: "representation: Hll(estimate: 10068, size: 3092) mem_dbg = 40, avg_err: 0.0087"

---- estimator::tests::test_estimator_p12_w6::_100_000_expects_representation_hll_estimate_95628_size_3092_avg_err_0_0182_ stdout ----
thread 'estimator::tests::test_estimator_p12_w6::_100_000_expects_representation_hll_estimate_95628_size_3092_avg_err_0_0182_' panicked at src/estimator.rs:237:5:
assertion `left == right` failed
  left: "representation: Hll(estimate: 95628, size: 3092), avg_err: 0.0182"
 right: "representation: Hll(estimate: 95628, size: 3092) mem_dbg = 40, avg_err: 0.0182"

Perhaps the way the cardinality-estimator crate computes mem_size should be adjusted, or something inside the mem-dbg crate should be changed.

bocharov, Aug 08 '24 20:08

Basically, in most cases the derive is enough to cover everything, and honestly, when I opened the PR I had done just that, thinking it would be sufficient. Afterwards, as I started benchmarking, I realized that because of the pointer trick used to handle dynamic size, the derive reported the struct as only 8 bytes or so. Therefore, I quickly wrote the trait implementations by hand, but I only covered the inner representation. I will try to fix it shortly.
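As a rough illustration of why a field-by-field size computation can under-report in this situation, here is a minimal sketch using only the standard library. The `Packed` type, its `Box<Vec<u32>>` payload, and the `shallow_size` / `deep_size` methods are hypothetical and are not the crate's real layout or the mem-dbg API; they only show the general effect of hiding a heap pointer inside a `usize`.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a type that hides a heap pointer inside a `usize`.
/// This is an illustration only, not the layout used by cardinality-estimator.
struct Packed {
    data: usize,
}

impl Packed {
    fn new(values: Vec<u32>) -> Self {
        // Move the vector to the heap and keep only its address as an integer.
        let raw: *mut Vec<u32> = Box::into_raw(Box::new(values));
        Packed { data: raw as usize }
    }

    fn values(&self) -> &Vec<u32> {
        // SAFETY: `data` always holds a pointer produced by `Box::into_raw`
        // in `new`, and it is only freed in `drop`.
        unsafe { &*(self.data as *const Vec<u32>) }
    }

    /// What a field-by-field computation can see: a single `usize`.
    fn shallow_size(&self) -> usize {
        size_of::<Self>()
    }

    /// A hand-written size that also follows the hidden pointer.
    fn deep_size(&self) -> usize {
        let v = self.values();
        size_of::<Self>() + size_of::<Vec<u32>>() + v.capacity() * size_of::<u32>()
    }
}

impl Drop for Packed {
    fn drop(&mut self) {
        // SAFETY: the pointer came from `Box::into_raw` and is reclaimed exactly once.
        unsafe { drop(Box::from_raw(self.data as *mut Vec<u32>)) }
    }
}

fn main() {
    let p = Packed::new(vec![0u32; 128]);
    println!("shallow = {}, deep = {}", p.shallow_size(), p.deep_size());
}
```

Because the field is typed as a plain integer, nothing in the type tells a derive that heap memory hangs off it, so the extra bytes only show up once the size computation is written by hand.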

LucaCappelletti94, Aug 08 '24 21:08

Small update: I fixed the size estimation errors on your crate's side, but also identified an error on the mem-dbg side. Working on that now.

LucaCappelletti94, Aug 09 '24 00:08

Now, if you rerun the same test command as above, you will find that all estimates match.

LucaCappelletti94, Aug 09 '24 12:08