pyxorfilter
Serialization?
Is there a way to serialize these filters so they can be stored? Pickle doesn't work, unfortunately.
>>> pickle.dump(filter, open('/tmp/test', 'wb'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot pickle '_cffi_backend.__CDataOwn' object
I think this is an issue with pickle: it can't serialize the underlying C structs from the xorfilter library. Can you try using protobuf?
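To illustrate the general workaround for this kind of error, here is a minimal sketch of how a wrapper class can tell pickle to store plain bytes instead of a native handle. It is not pyxorfilter's API: `NativeBuffer` is a hypothetical class, and `ctypes` from the standard library stands in for the cffi-owned memory.

```python
import ctypes
import pickle


class NativeBuffer:
    """Hypothetical wrapper around a C-style byte array (not pyxorfilter's API)."""

    def __init__(self, data: bytes):
        # The ctypes array plays the role of memory owned by a C library.
        self._buf = (ctypes.c_uint8 * len(data)).from_buffer_copy(data)
        # A raw pointer attribute like this is what pickle cannot handle.
        self._ptr = ctypes.cast(self._buf, ctypes.POINTER(ctypes.c_uint8))

    def to_bytes(self) -> bytes:
        return bytes(self._buf)

    def __getstate__(self):
        # Give pickle a plain-Python snapshot instead of the native objects.
        return {"data": self.to_bytes()}

    def __setstate__(self, state):
        # Rebuild the native buffer from the snapshot on load.
        self.__init__(state["data"])


if __name__ == "__main__":
    original = NativeBuffer(b"\x01\x02\x03")
    restored = pickle.loads(pickle.dumps(original))
    print(restored.to_bytes() == original.to_bytes())
```

The same idea would apply to a cffi-backed filter only if the wrapper can copy its fields and fingerprint array out to bytes and back, which is the part the library would have to provide.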
Currently, the underlying library does not directly support serialization (we expect users to do it). We could add it upstream to xor_singleheader if it is convenient. It is basically just a struct, but there is a pointer to a fingerprint array.
For illustration, here is some silly C code where we write the struct to disk. (It could be better.)
// Suppose that you have a binary_fuse8_t filter and you want to write it to a file in C.
FILE *write_ptr;
write_ptr = fopen(outputfilename, "wb");
if (write_ptr == NULL) {
  printf("Cannot write to the output file %s.", outputfilename);
  abort();
}
bool isok = true;
size_t total_bytes = sizeof(filter.Seed) + sizeof(filter.SegmentLength) +
                     sizeof(filter.SegmentLengthMask) + sizeof(filter.SegmentCount) +
                     sizeof(filter.SegmentCountLength) + sizeof(filter.ArrayLength) +
                     sizeof(uint8_t) * filter.ArrayLength;
// fwrite returns the number of items written; each call below writes one item.
isok &= (fwrite(&filter.Seed, sizeof(filter.Seed), 1, write_ptr) == 1);
isok &= (fwrite(&filter.SegmentLength, sizeof(filter.SegmentLength), 1,
                write_ptr) == 1);
isok &= (fwrite(&filter.SegmentLengthMask, sizeof(filter.SegmentLengthMask),
                1, write_ptr) == 1);
isok &= (fwrite(&filter.SegmentCount, sizeof(filter.SegmentCount), 1,
                write_ptr) == 1);
isok &= (fwrite(&filter.SegmentCountLength,
                sizeof(filter.SegmentCountLength), 1, write_ptr) == 1);
isok &= (fwrite(&filter.ArrayLength, sizeof(filter.ArrayLength), 1,
                write_ptr) == 1);
// The fingerprint array lives behind a pointer, so it is written separately.
isok &= (fwrite(filter.Fingerprints, sizeof(uint8_t) * filter.ArrayLength, 1,
                write_ptr) == 1);
isok &= (fclose(write_ptr) == 0);
if (isok) {
  printf("filter data saved to %s. Total bytes = %zu. \n", outputfilename,
         total_bytes);
} else {
  printf("failed to write filter data to %s.\n", outputfilename);
}
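For anyone who wants to handle the same byte layout from Python, here is a rough sketch using the standard `struct` module. It assumes the field widths are a 64-bit `Seed` followed by five 32-bit values, written in the order used by the C code above, in little-endian order with no padding. Those widths are an assumption and should be checked against the header you build with. `pack_filter` and `unpack_filter` are hypothetical helper names, not part of pyxorfilter.

```python
import struct

# Assumed layout: Seed (uint64), then SegmentLength, SegmentLengthMask,
# SegmentCount, SegmentCountLength, ArrayLength (uint32 each).
# "<" means little-endian with no padding between fields.
HEADER = struct.Struct("<Q5I")


def pack_filter(seed, segment_length, segment_length_mask, segment_count,
                segment_count_length, fingerprints):
    """Return the header fields followed by the raw fingerprint bytes."""
    fingerprints = bytes(fingerprints)
    header = HEADER.pack(seed, segment_length, segment_length_mask,
                         segment_count, segment_count_length,
                         len(fingerprints))
    return header + fingerprints


def unpack_filter(blob):
    """Split a blob into (five header fields, fingerprint bytes)."""
    fields = HEADER.unpack_from(blob, 0)
    array_length = fields[5]
    fingerprints = blob[HEADER.size:HEADER.size + array_length]
    if len(fingerprints) != array_length:
        raise ValueError("truncated filter data")
    return fields[:5], fingerprints
```

Storing `ArrayLength` before the fingerprints is what lets a reader know how many bytes to expect, so a truncated file can be rejected instead of silently producing a broken filter.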
I ran into the same problem. I found the dill module, which I'm going to try, though I don't think it will work.
Any updates on this?
I think the easiest way to serialise/deserialise is Cap'n Proto or something like protobuf, mentioned above. pickle is for Python types.
I have also tried dill, pickle, and joblib, but none of them work: "TypeError: cannot pickle '_cffi_backend.__CDataOwn' object". I'm looking for a solution to this issue.
I have implemented a solution
https://github.com/glitzflitz/pyxorfilter/pull/9
Woohoo! It is fixed.