libbloom
About thread safety in multi-threaded bloom creation
Hi, this is just an observation, not a bug.
I'm working with this bloom library, but with several changes.
One of the problems I noticed with the current master and development branches is that neither of them is thread safe.
Let me explain my case a little. I'm building a really large bloom filter in my program, and the process is usually slow with a single thread. To speed it up I decided to use multiple threads, but when I first tested it I noticed that the program sometimes died for no apparent reason. After debugging for a while I realized that it needs a mutex to protect reads and writes of the bloom->bf variable.
After changing the code a little I managed to make the bloom filter pthread safe, like this...
Add a pthread_mutex_t variable to the internal bloom structure bloom.c:
pthread_mutex_t mutex;
Change the function bloom_check_add to protect the call to the test_bit_set_bit function:
    pthread_mutex_lock(&bloom->mutex);
    r = test_bit_set_bit(bloom->bf, x, add);
    pthread_mutex_unlock(&bloom->mutex);
    if (r) {
Obviously I added the r variable so that the result is kept outside of the if condition.
Add the thread header:
#include <pthread.h>
I know this makes libbloom OS dependent, but there are similar calls on Windows.
And yes, this was only to let you know about the multi-thread problem. I don't know if some other user reported it before; I searched the closed issues and there is nothing related to it.
Best regards!
Correct, using the same bloom filter (same struct bloom) from multiple threads will require suitable mutexes.
I recommend adding the locks in the client code around accesses to bloom functions, instead of embedding them within bloom.c. That way the same client code can run with a prepackaged (unmodified) libbloom (as available on many distros), although it does give up some performance, since the hash computations then also run while the lock is held.
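As a rough illustration of client-side locking, here is a minimal sketch. It does not use libbloom itself: toy_bloom, toy_hash, toy_check_add and the locked_bloom_* wrappers are hypothetical names made up so the example compiles on its own. With the real library, the two wrapped calls would be bloom_add() and bloom_check().

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a bloom filter (an assumption, NOT libbloom's code). */
#define TOY_BITS   1024u
#define TOY_HASHES 3u

struct toy_bloom {
    unsigned char bf[TOY_BITS / 8];
};

/* Seeded FNV-1a, used only to keep the sketch self-contained. */
static uint32_t toy_hash(const void *buf, size_t len, uint32_t seed)
{
    const unsigned char *p = buf;
    uint32_t h = 2166136261u ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Returns 1 if every bit was already set, 0 otherwise.
 * When add is non-zero, missing bits are set along the way. */
static int toy_check_add(struct toy_bloom *b, const void *buf, size_t len, int add)
{
    int hits = 0;
    for (uint32_t i = 0; i < TOY_HASHES; i++) {
        uint32_t x = toy_hash(buf, len, i) % TOY_BITS;
        unsigned char mask = (unsigned char)(1u << (x % 8));
        if (b->bf[x / 8] & mask)
            hits++;
        else if (add)
            b->bf[x / 8] |= mask;
        else
            return 0;
    }
    return hits == (int)TOY_HASHES;
}

/* Client-side wrapper: the filter together with the mutex that guards it. */
struct locked_bloom {
    struct toy_bloom bloom;
    pthread_mutex_t lock;
};

static void locked_bloom_init(struct locked_bloom *lb)
{
    memset(&lb->bloom, 0, sizeof lb->bloom);
    pthread_mutex_init(&lb->lock, NULL);
}

/* The lock covers the whole add, hashing included. */
static int locked_bloom_add(struct locked_bloom *lb, const void *buf, size_t len)
{
    pthread_mutex_lock(&lb->lock);
    int r = toy_check_add(&lb->bloom, buf, len, 1);
    pthread_mutex_unlock(&lb->lock);
    return r;
}

/* The lock covers the whole check, so no add can interleave with it. */
static int locked_bloom_check(struct locked_bloom *lb, const void *buf, size_t len)
{
    pthread_mutex_lock(&lb->lock);
    int r = toy_check_add(&lb->bloom, buf, len, 0);
    pthread_mutex_unlock(&lb->lock);
    return r;
}

static void locked_bloom_destroy(struct locked_bloom *lb)
{
    pthread_mutex_destroy(&lb->lock);
}
```

Because each wrapper holds the lock for the full operation, an add and a check on the same filter never interleave. The price is that threads are serialized for the hashing as well as for the bit accesses.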
Enclosing only the test_bit_set_bit() call in the mutex may not be strictly speaking quite thread safe either. If two threads are adding and testing elements with overlapping bits, the adder may set bits that the checker already checked as unset, later returning a miss. Although depending on your use case, this might be a race condition that doesn't matter.
Are you both setting and checking values from concurrent threads, or using multiple threads to construct the bloom filter, so all operations are bloom_add()?
Anyway, I'm very hesitant to add a dependency on pthreads since most consumers, AFAIK, don't need it, so I prefer to keep the library as simple and dependency-free as possible.
First I fill the bloom filter using multiple threads and then I only check it, also from multiple threads.
But you are right, other people don't need that. I will handle it outside of the library, thanks.
Btw I'm also handling a checksum outside of the library. Have you thought about adding a function to verify some kind of checksum in a future version?
I had been thinking of adding a merge function. Since you create the filter up front before using it, that would be an option. Divide the incoming data into as many chunks as you have CPU cores. Then have a separate thread processing each chunk, creating and populating separate bloom filters. No locking needed, each thread can work independently. When finished, merge them into one bloom filter for the consumer.
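The chunk-and-merge idea can be sketched roughly as follows. This is a self-contained toy, not libbloom code: toy_bloom, toy_add, toy_check, toy_merge, fill_chunk, build_parallel and the fixed NTHREADS are all hypothetical names chosen for illustration. It assumes every per-thread filter has the same size and hash functions, so that merging is a plain bitwise OR.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BITS   4096u
#define TOY_HASHES 3u
#define NTHREADS   4   /* assumption: one worker per core on a 4-core machine */

/* Toy stand-in for a bloom filter (NOT libbloom's implementation). */
struct toy_bloom {
    unsigned char bf[TOY_BITS / 8];
};

/* Seeded FNV-1a, used only to keep the sketch self-contained. */
static uint32_t toy_hash(const void *buf, size_t len, uint32_t seed)
{
    const unsigned char *p = buf;
    uint32_t h = 2166136261u ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

static void toy_add(struct toy_bloom *b, const void *buf, size_t len)
{
    for (uint32_t i = 0; i < TOY_HASHES; i++) {
        uint32_t x = toy_hash(buf, len, i) % TOY_BITS;
        b->bf[x / 8] |= (unsigned char)(1u << (x % 8));
    }
}

/* Returns 1 if all bits for the element are set, 0 otherwise. */
static int toy_check(const struct toy_bloom *b, const void *buf, size_t len)
{
    for (uint32_t i = 0; i < TOY_HASHES; i++) {
        uint32_t x = toy_hash(buf, len, i) % TOY_BITS;
        if (!(b->bf[x / 8] & (1u << (x % 8))))
            return 0;
    }
    return 1;
}

/* Merge by OR-ing src into dst; valid only for identically configured filters. */
static void toy_merge(struct toy_bloom *dst, const struct toy_bloom *src)
{
    for (size_t i = 0; i < sizeof dst->bf; i++)
        dst->bf[i] |= src->bf[i];
}

/* One unit of work: a slice of the input plus a private filter. */
struct chunk {
    const uint32_t *keys;
    size_t count;
    struct toy_bloom bloom;
};

/* Worker: touches only its own chunk, so no locking is needed. */
static void *fill_chunk(void *arg)
{
    struct chunk *c = arg;
    memset(&c->bloom, 0, sizeof c->bloom);
    for (size_t i = 0; i < c->count; i++)
        toy_add(&c->bloom, &c->keys[i], sizeof c->keys[i]);
    return NULL;
}

/* Builds `out` from `keys` with NTHREADS workers; returns 0 on success. */
static int build_parallel(struct toy_bloom *out, const uint32_t *keys, size_t n)
{
    pthread_t tid[NTHREADS];
    struct chunk chunks[NTHREADS];
    size_t per = n / NTHREADS;
    int started = 0, rc = 0;

    for (int t = 0; t < NTHREADS; t++) {
        chunks[t].keys = keys + (size_t)t * per;
        /* The last worker also takes the remainder. */
        chunks[t].count = (t == NTHREADS - 1) ? n - (size_t)t * per : per;
        if (pthread_create(&tid[t], NULL, fill_chunk, &chunks[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    memset(out, 0, sizeof *out);
    for (int t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);
        if (rc == 0)
            toy_merge(out, &chunks[t].bloom);
    }
    return rc;
}
```

Since setting bits is commutative, the merged filter ends up with the same bits as one built sequentially from the same keys. Once it is complete and no thread writes to it any more, concurrent checks only read the bit array.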
Not sure what you mean by the checksum outside library, but if it is something that might be generally useful please file a separate issue with the details about it.
I've added the merge function mentioned above (in the 2.0 development branch, not yet in a release).