guava icon indicating copy to clipboard operation
guava copied to clipboard

A suggestion of code improvement for BloomFilter.

Open longlong354 opened this issue 1 year ago • 10 comments

API(s)

com.google.common.hash.BloomFilter::optimalNumOfBits(long n, double p)
&
com.google.common.hash.BloomFilter::optimalNumOfHashFunctions(long n, long m)

How do you want it to be improved?

1. use static caculated value of log(2) and squared log(2) :

        private static final double LOG_TWO = Math.log(2);
	private static final double SQUARED_LOG_TWO = Math.pow(LOG_TWO,2);

2. calculate optimalNumOfBits by static values:

	static long optimalNumOfBits(long n, double p) {
	    if (p == 0) {
			p = Double.MIN_VALUE;
	    }
	    return (long) (-n * Math.log(p) / SQUARED_LOG_TWO);
	}

3. caculate optimalNumOfHashFunction by false positive rate(p) directly and using LOG_TWO :

     	static int optimalNumOfHashFunctions(double p){
		return  Math.max(1, (int) Math.round( - Math.log(p) / LOG_TWO));
	}

Why do we need it to be improved?

  1. Slightly reduce the method's execution time.
  2. Refine the calculation of the optimalNumOfHashFunctions method to be more concise and efficient. The current code may lead people to mistakenly believe that the result is related to the number of samples(n) and the number of bits(m), when in fact it is only related to the false positive rate(p).

Example

as the given codes in  “How do you want it to be improved”

Current Behavior

as the source codes in “How do you want it to be improved”

Desired Behavior

as the given codes in com.google.common.hash.BloomFilter::optimalNumOfBits(long n, double p) & com.google.common.hash.BloomFilter::optimalNumOfHashFunctions(long n, long m)

Concrete Use Cases

as "How do you want it to be improved"

Checklist

longlong354 avatar Aug 05 '24 08:08 longlong354