Loading from disk fails

Open dare05 opened this issue 8 years ago • 4 comments

Here's the code:

filter = BloomFilter.new size: 100_000, error_rate: 0.00001
100_000.times do
  filter.insert "#{rand(1..1_000_000)}"
end
filter.dump "hii"

Now when I do:

filter = BloomFilter.load "hii"

I get:

in `load': unable to load BloomFilter, expected 299534 but got 420 bytes (StandardError)

I suspect this is OS-related. I'm using Windows 7, which distinguishes between text mode and binary mode when reading and writing files, and that is probably the cause.

I actually had the same problem with bloomfilter-rb and fixed it by forcing File.open to read and write in binary mode (it was as easy as appending "b" to the "r" and "w" modes). But here the serialization code is written in C, so that fix isn't available from the Ruby side.
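For reference, the binary-mode change described above looks roughly like this in plain Ruby. This is a minimal sketch, not code from either gem: the payload is made-up stand-in bytes, and `binary_round_trip` and the temp file name are hypothetical. On Windows, text mode can translate line endings and may treat a 0x1A byte as end of file, which would be consistent with a short read, though that is only a guess for this issue.

```ruby
require "tmpdir"

# Stand-in payload (NOT real filter output): includes bytes that text mode
# on Windows can mangle, such as LF, CR and 0x1A (Ctrl-Z).
payload = [0x00, 0x0A, 0x0D, 0x1A, 0xFF].pack("C*") * 100

# Hypothetical helper: write and read back with the "b" flag so that no
# newline translation or EOF-character handling is applied.
def binary_round_trip(path, bytes)
  File.open(path, "wb") { |f| f.write(bytes) }
  File.open(path, "rb") { |f| f.read }
end

path = File.join(Dir.tmpdir, "bloom_binary_mode_demo.bin")
restored = binary_round_trip(path, payload)

puts "wrote #{payload.bytesize} bytes, read back #{restored.bytesize}"
puts "identical: #{restored == payload}"

File.delete(path)
```

Comparing the byte count written with the byte count read back is a quick way to tell whether the file was truncated on write or cut short on read.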

dare05, Jan 16 '17 20:01

The reads and writes are done in the C extension (defaults to binary mode). You're dumping to a file named test and reading from a file named hii -- seems incorrect.

deepfryed, Jan 16 '17 21:01

@deepfryed Sorry, that was a typo; I've fixed it. I am dumping to and loading from the same file. The error is the same, only the "got" number differs: it always expects 299534 bytes, but sometimes it gets 600 bytes, sometimes 224, 35, 15, 170, and so on.

dare05, Jan 16 '17 21:01

roger, I'll check it out.

deepfryed, Jan 16 '17 21:01

I was wrong about bloomfilter-rb, by the way. Even after I switched it to binary mode, the filter loaded from the saved file always reports 'true', no matter how large a number I query (I feed it the same input as in the code above, same everything). So the cause may not be binary mode at all, but something else entirely. Once you fix the size mismatch, please also test for correctness: load the saved file, query it with a huge number, and see what happens.
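A correctness check along those lines could be sketched like this. It assumes the loaded filter answers membership queries through an `include?` method, which is an assumption and not something confirmed in this thread; `reported_rate` is a hypothetical helper, and a plain Set stands in for the filter so the snippet runs on its own.

```ruby
require "set"

# Hypothetical helper: fraction of probes the filter claims to contain.
# Works with any object that responds to include? (assumed query method).
def reported_rate(filter, probes)
  hits = probes.count { |probe| filter.include?(probe) }
  hits.to_f / probes.size
end

inserted = (1..1_000).map(&:to_s)

# Stand-in for the loaded filter; swap in the object returned by
# BloomFilter.load to run the same check against a real dump.
stand_in = Set.new(inserted)

# Values far outside the inserted range, like the "huge number" test.
absent = (2_000_000...2_001_000).map(&:to_s)

puts "inserted keys reported present: #{reported_rate(stand_in, inserted)}"
puts "absent keys reported present:   #{reported_rate(stand_in, absent)}"
```

A healthy filter should report every inserted key as present and only a small fraction of absent keys, roughly in line with the configured error rate. A loaded filter that reports nearly all absent keys as present would match the "always true" symptom described above.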

dare05, Jan 16 '17 21:01