heed
Support duplicate data
Hi, thanks for the library, it's great! I needed to efficiently store array types as values, so I've modified your library to support the DUP flags on databases, which can efficiently emulate this structure. Please note that I am a very inexperienced Rust developer, so the way this is done is probably not ideal, but it should be a good starting point for the library. I also removed the mdbx binding because I ran into alignment bugs when I tried it, so I think it's broken (in general, not just for duplicate data; the bindings themselves seem broken).
Currently, the only way to store a `Vec` of data is to serialize the whole vector as a single value:
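```rust
use heed::types::{SerdeBincode, Str};
use heed::Database;
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct VecStruct {
    itemarray: Vec<String>,
}
// A serde codec: the whole Vec is (de)serialized as one value per key.
let db: Database<Str, SerdeBincode<VecStruct>>;
```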
Updating the array this way requires first retrieving the vector from LMDB, inserting the new elements, and then writing the whole vector back. This is very slow, since every update performs several allocations and a full deserialize/serialize round trip of the entire vector.
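To make that cost concrete, here is a minimal std-only model of the read-modify-write pattern. It is not heed code: a `HashMap` stands in for the database, and a hypothetical newline-joined encoding stands in for serde/bincode. Each append decodes the whole stored vector and re-encodes it, so the bytes written per append grow with the vector's length.

```rust
use std::collections::HashMap;

/// Hypothetical encoding standing in for serde/bincode: items joined by '\n'.
fn encode(items: &[String]) -> Vec<u8> {
    items.join("\n").into_bytes()
}

/// Inverse of `encode` (an empty buffer decodes to an empty Vec).
fn decode(bytes: &[u8]) -> Vec<String> {
    if bytes.is_empty() {
        return Vec::new();
    }
    String::from_utf8_lossy(bytes)
        .split('\n')
        .map(str::to_owned)
        .collect()
}

/// Appends one item the "one serialized Vec per key" way and returns
/// how many bytes had to be written back for this single append.
fn append_rmw(store: &mut HashMap<String, Vec<u8>>, key: &str, item: &str) -> usize {
    // 1. Read and deserialize the whole vector.
    let mut items = store.get(key).map(|b| decode(b)).unwrap_or_default();
    // 2. Modify it in memory.
    items.push(item.to_owned());
    // 3. Serialize and write the whole vector back.
    let encoded = encode(&items);
    let written = encoded.len();
    store.insert(key.to_owned(), encoded);
    written
}
```

Appending `"a"`, `"b"`, `"c"` under one key writes 1, 3 and then 5 bytes, even though each append adds only one byte of new data; a real serde codec shows the same linear growth, plus the allocation of the decoded `Vec` on every update.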
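Instead, LMDB supports the DUPSORT and DUPFIXED flags! DUPSORT lets a single key hold multiple values, kept in sorted order, and DUPFIXED (only valid together with DUPSORT) additionally requires all values in the database to have the same size. This lets the structure be simply `Database<Str, Str>`, with the following initialization code: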
```rust
pub fn create_tables(&self, env: heed::Env) -> Result<IndexHandle> {
    let db: InvSamplesDB = env.create_database(Some("InvSamples"), Some(DBFlags::MdbDupFixed as u32 | DBFlags::MdbDupSort as u32))?;
    // ... build and return the IndexHandle ...
}
```
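The `DBFlags::MdbDupFixed as u32 | DBFlags::MdbDupSort as u32` expression is a plain bitwise OR of LMDB's flag constants. The std-only sketch below assumes a hypothetical `DbFlag` enum mirroring the PR's `DBFlags`; its discriminants are LMDB's `MDB_DUPSORT` (0x04) and `MDB_DUPFIXED` (0x10) from `lmdb.h`. It also checks LMDB's documented rule that `MDB_DUPFIXED` may only be used in combination with `MDB_DUPSORT`.

```rust
/// Hypothetical mirror of the PR's `DBFlags` (not heed's API);
/// discriminants are the MDB_* values defined in lmdb.h.
#[derive(Clone, Copy, Debug)]
#[repr(u32)]
enum DbFlag {
    DupSort = 0x04,  // MDB_DUPSORT: a key may hold several values, kept sorted
    DupFixed = 0x10, // MDB_DUPFIXED: every value has the same size
}

/// ORs the flags together into the raw `u32` mask LMDB takes.
fn combine(flags: &[DbFlag]) -> u32 {
    flags.iter().fold(0, |mask, &f| mask | f as u32)
}

/// LMDB documents MDB_DUPFIXED as usable only together with MDB_DUPSORT.
fn is_valid(mask: u32) -> bool {
    let dup_fixed = (mask & DbFlag::DupFixed as u32) != 0;
    let dup_sort = (mask & DbFlag::DupSort as u32) != 0;
    !dup_fixed || dup_sort
}
```

Because the flags are independent bits, the order of the operands in the OR does not matter: both orders produce the mask `0x14`.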
For efficient iteration, I've added the iterator type `IterDup`, which iterates over all duplicate values of a single key (i.e. it yields every element of the 'vector' stored under that key):
```rust
let samples_iter = handle.inv_samples.iter_dup_of(rtxn, &key)?;
for sample_entry in samples_iter {
    let (key, val) = sample_entry.unwrap();
    // ... do stuff ...
}
```
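Semantically, a DUPSORT database behaves like a map from each key to a sorted set of values, and `iter_dup_of` walks that set for one key. The sketch below models this with std's `BTreeMap<String, BTreeSet<String>>`; it illustrates the semantics only, and `DupSortModel`, `put` and its `iter_dup_of` are hypothetical names, not the heed API from this PR. As in LMDB, values come back sorted and an identical key/value pair is stored only once.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory stand-in for an LMDB database opened with
/// MDB_DUPSORT (illustration only, not the heed API).
#[derive(Default)]
struct DupSortModel {
    entries: BTreeMap<String, BTreeSet<String>>,
}

impl DupSortModel {
    /// Adds one value under `key` without rewriting the other values,
    /// unlike the read-modify-write of a serialized Vec.
    fn put(&mut self, key: &str, val: &str) {
        self.entries
            .entry(key.to_owned())
            .or_default()
            .insert(val.to_owned());
    }

    /// Yields every (key, value) pair stored under `key`, values in sorted order.
    fn iter_dup_of<'a>(&'a self, key: &'a str) -> impl Iterator<Item = (&'a str, &'a str)> + 'a {
        self.entries
            .get(key)
            .into_iter()
            .flatten()
            .map(move |val| (key, val.as_str()))
    }
}
```

Each `put` costs one ordered-set insertion instead of a full decode/encode of the vector, which is the saving this PR is after; a key with no values simply yields an empty iterator.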