pyxorfilter

Serialization?

Open MattCarothers opened this issue 3 years ago • 1 comments

Is there a way to serialize these filters so they can be stored? Pickle doesn't work, unfortunately.

>>> pickle.dump(filter, open('/tmp/test', 'wb'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot pickle '_cffi_backend.__CDataOwn' object

MattCarothers avatar Oct 08 '21 15:10 MattCarothers

I think the problem is that pickle can't serialize the underlying C structs from the xorfilter library. Can you try using protobuf?

glitzflitz avatar Oct 19 '21 17:10 glitzflitz

Currently, the underlying library does not directly support serialization (we expect users to do it). We could add it upstream to xor_singleheader if it is convenient. It is basically just a struct, but there is a pointer to a fingerprint array.

For illustration, here is some silly C code where we write the struct to disk. (It could be better.)

    // suppose that you have a binary_fuse8_t filter and you want to write it to a file in C
    FILE *write_ptr;
    write_ptr = fopen(outputfilename, "wb");
    if (write_ptr == NULL) {
      printf("Cannot write to the output file %s.", outputfilename);
      abort();
    }
    bool isok = true;
    size_t total_bytes = sizeof(filter.Seed) + sizeof(filter.SegmentLength) +
        sizeof(filter.SegmentLengthMask) + sizeof(filter.SegmentCount) +
        sizeof(filter.SegmentCountLength) + sizeof(filter.ArrayLength) +
        sizeof(uint8_t) * filter.ArrayLength;

    isok &= fwrite(&filter.Seed, sizeof(filter.Seed), 1, write_ptr);
    isok &= fwrite(&filter.SegmentLength, sizeof(filter.SegmentLength), 1,
                   write_ptr);
    isok &= fwrite(&filter.SegmentLengthMask, sizeof(filter.SegmentLengthMask),
                   1, write_ptr);
    isok &=
        fwrite(&filter.SegmentCount, sizeof(filter.SegmentCount), 1, write_ptr);
    isok &= fwrite(&filter.SegmentCountLength,
                   sizeof(filter.SegmentCountLength), 1, write_ptr);
    isok &=
        fwrite(&filter.ArrayLength, sizeof(filter.ArrayLength), 1, write_ptr);
    isok &= fwrite(filter.Fingerprints, sizeof(uint8_t) * filter.ArrayLength, 1,
                   write_ptr);
    isok &= (fclose(write_ptr) == 0);
    if (isok) {
      printf("filter data saved to %s. Total bytes = %zu. \n", outputfilename,
             total_bytes);
    } else {
      printf("failed to write filter data to %s.\n", outputfilename);
    }
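For readers on the Python side, the same byte layout can be sketched with the standard `struct` module. This is only an illustration, not pyxorfilter's API: the helper names `pack_filter` and `unpack_filter` are hypothetical, and the field widths (a 64-bit `Seed` followed by five 32-bit fields) are an assumption about `binary_fuse8_t` that should be checked against the header in use. The C code above writes in native byte order, while this sketch fixes little-endian, so the two agree only on little-endian machines.

```python
import struct

# Assumed layout (not confirmed by the thread): Seed is uint64, and
# SegmentLength, SegmentLengthMask, SegmentCount, SegmentCountLength and
# ArrayLength are uint32. "<" means little-endian with no padding.
HEADER = struct.Struct("<QIIIII")


def pack_filter(seed, segment_length, segment_length_mask,
                segment_count, segment_count_length, fingerprints):
    """Return the header followed by the fingerprint bytes.

    ArrayLength is taken from len(fingerprints), one byte per fingerprint.
    """
    header = HEADER.pack(seed, segment_length, segment_length_mask,
                         segment_count, segment_count_length,
                         len(fingerprints))
    return header + bytes(fingerprints)


def unpack_filter(blob):
    """Inverse of pack_filter; raises ValueError if the data is truncated."""
    if len(blob) < HEADER.size:
        raise ValueError("blob is shorter than the header")
    (seed, segment_length, segment_length_mask,
     segment_count, segment_count_length,
     array_length) = HEADER.unpack_from(blob)
    fingerprints = blob[HEADER.size:HEADER.size + array_length]
    if len(fingerprints) != array_length:
        raise ValueError("fingerprint array is truncated")
    return {
        "seed": seed,
        "segment_length": segment_length,
        "segment_length_mask": segment_length_mask,
        "segment_count": segment_count,
        "segment_count_length": segment_count_length,
        "fingerprints": fingerprints,
    }


# Made-up values for the round trip; they do not describe a real filter.
blob = pack_filter(0x1234, 8, 7, 2, 16, bytes(range(10)))
restored = unpack_filter(blob)
```

The point of the sketch is that the pointer itself is never written: the fingerprint array is copied out as plain bytes after the fixed-size fields, which is what makes the result storable.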

lemire avatar Jan 05 '23 16:01 lemire

I ran into the same problem. I found a module called dill, which I'm going to try, though I don't expect it to work.

opus-x avatar May 01 '23 09:05 opus-x

Any updates on this?

vtsouval avatar Sep 13 '23 15:09 vtsouval

I think the easiest way to serialise/deserialise is Cap'n Proto or something like protobuf, as mentioned above. pickle is for Python types.

vshuraeff avatar Oct 11 '23 18:10 vshuraeff

I have also tried dill, pickle, and joblib, but none of them work: "TypeError: cannot pickle '_cffi_backend.__CDataOwn' object". I'm looking for a solution to this issue.

ghost avatar Nov 04 '23 00:11 ghost

I have implemented a solution

https://github.com/glitzflitz/pyxorfilter/pull/9

lemire avatar Nov 22 '23 01:11 lemire

Woohoo! It is fixed.

lemire avatar Nov 30 '23 00:11 lemire