bloom-filter
Loading from disk fails
Here's the code:
filter = BloomFilter.new size: 100_000, error_rate: 0.00001
100_000.times do
  filter.insert "#{rand(1..1_000_000)}"
end
filter.dump "hii"
Now when I do:
filter = BloomFilter.load "hii"
I get:
in `load': unable to load BloomFilter, expected 299534 but got 420 bytes (StandardError)
I suspect this is an operating-system issue. I'm on Windows 7, which has two modes for reading and writing files (text and binary), so that's probably the cause.
I actually had the same problem with bloomfilter-rb and I fixed it by forcing File.open to write and read in binary mode (it was as easy as appending "b" to the "r" and "w" modes). But here I see the code for serialization is written in C, so that wouldn't be possible.
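For reference, the binary-mode fix described above for pure-Ruby file I/O can be sketched roughly as follows. This is only an illustration, not code from bloomfilter-rb or from this gem: `save_bytes` and `load_bytes` are hypothetical helper names, and the payload is made-up data containing bytes (newline, carriage return, 0x1A) that text mode on Windows may translate or treat as end-of-file.

```ruby
require "tmpdir"

# Hypothetical helpers for illustration; not part of bloom-filter or bloomfilter-rb.
def save_bytes(path, bytes)
  # "wb" = write in binary mode, so no newline translation happens on Windows.
  File.open(path, "wb") { |f| f.write(bytes) }
end

def load_bytes(path)
  # "rb" = read in binary mode, so 0x1A is not treated as end-of-file.
  File.open(path, "rb") { |f| f.read }
end

# Made-up payload with bytes that text mode can mangle: NUL, LF, CR, Ctrl-Z, 0xFF.
payload = [0x00, 0x0A, 0x0D, 0x1A, 0xFF].pack("C*") * 1000
path = File.join(Dir.tmpdir, "binary_mode_demo.bin")

save_bytes(path, payload)
restored = load_bytes(path)

puts "wrote #{payload.bytesize} bytes, read back #{restored.bytesize} bytes"
puts "identical: #{restored == payload}"
```

If the two byte counts differ when the "b" is dropped from either mode, text-mode translation is the likely culprit. Since this gem does its I/O in C, the equivalent check would have to happen in the extension itself.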
The reads and writes are done in the C extension (defaults to binary mode). You're dumping to a file named test and reading from a file named hii, which seems incorrect.
@deepfryed Sorry, it was a typo, changed it. So basically, I'm dumping to and loading from the same file. Same error, just different "got" numbers: it always expects 299534 bytes, but it gets 600 bytes one time, then 224, 35, 15, 170, and so on.
roger, I'll check it out.
I was wrong about bloomfilter-rb, by the way. Even after I switched it to binary mode, the filter loaded from the saved file always reports 'true', no matter how large a number I test (I feed it the same input as the code above, same everything). So the cause may not be binary mode but something else entirely. Once you fix the size mismatch, please also test for correctness: query the loaded filter with a huge number that was never inserted and see what happens.
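To make the kind of correctness test I mean concrete, here is a rough sketch. It does not use this gem or bloomfilter-rb: `ToyBloom` is a hypothetical pure-Ruby stand-in with an assumed file layout (two 32-bit header fields followed by the bit array), written only so the round-trip check can run on its own. The same two checks (no inserted key goes missing, and huge never-inserted keys are not all reported as present) could be pointed at the real filter after dump and load.

```ruby
require "digest"
require "tmpdir"

# Toy stand-in, NOT the bloom-filter gem: a fixed-size bit array with k hash probes.
class ToyBloom
  attr_reader :bits, :hashes, :bytes

  def initialize(bits: 1 << 20, hashes: 7, bytes: nil)
    @bits = bits
    @hashes = hashes
    @bytes = bytes || ("\x00".b * (bits / 8))
  end

  def insert(key)
    positions(key).each do |pos|
      @bytes.setbyte(pos / 8, @bytes.getbyte(pos / 8) | (1 << (pos % 8)))
    end
  end

  def include?(key)
    positions(key).all? { |pos| (@bytes.getbyte(pos / 8) & (1 << (pos % 8))) != 0 }
  end

  # Assumed layout: bit count and hash count as 32-bit big-endian, then the raw bits.
  def dump(path)
    File.binwrite(path, [@bits, @hashes].pack("NN") + @bytes)
  end

  def self.load(path)
    data = File.binread(path)
    bits, hashes = data.unpack("NN")
    body = data.byteslice(8, data.bytesize - 8)
    expected = bits / 8
    raise "expected #{expected} but got #{body.bytesize} bytes" unless body.bytesize == expected
    new(bits: bits, hashes: hashes, bytes: body)
  end

  private

  # Derive k bit positions from one SHA-256 digest via double hashing.
  def positions(key)
    h1, h2 = Digest::SHA256.digest(key.to_s).unpack("Q>Q>")
    (0...@hashes).map { |i| (h1 + i * h2) % @bits }
  end
end

inserted = (1..10_000).map(&:to_s)
filter = ToyBloom.new
inserted.each { |key| filter.insert(key) }

path = File.join(Dir.tmpdir, "toy_bloom_demo.bin")
filter.dump(path)
loaded = ToyBloom.load(path)

# Check 1: every inserted key must still be found after reloading.
missing = inserted.reject { |key| loaded.include?(key) }

# Check 2: huge keys that were never inserted should almost never be found.
absent = (10**12...(10**12 + 1_000)).map(&:to_s)
false_hits = absent.count { |key| loaded.include?(key) }

puts "missing after reload: #{missing.size}"
puts "false positives out of #{absent.size}: #{false_hits}"
```

A loaded filter that answers 'true' for every absent key, which is what I saw with bloomfilter-rb, would fail the second check even though it passes the first, so both are worth running.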