libdict
Publish on crates.io?
This crate doesn't seem to be published on crates.io?
This crate never got past the implementation draft stage, mostly due to lack of time. If development resumed, publishing it to crates.io would be an option. Are you using it?
Are you using it?
I was planning to.
Ok. There are still a few rough edges, and the memory-mapped access still needs sorting out. Ideally you could start using it from Git and we would fix issues in the API as you encounter them. We would publish a release afterwards.
@baskerville You've worked on the crate quite a bit. It looks fine to me. I'm not sure whether the user should be able to choose memory-mapped files via the Dictionary struct or whether this should be transparent. But otherwise, we could make a release.
Once #15 and #16 are merged, I think it would be fine to publish from the master branch. I haven't looked thoroughly into the mmap branch yet.
I have rebased the mmap branch, but have not pushed it yet. I could not decide whether the user of the library should be able to choose between memory-mapped I/O and normal file access. I started working on a generic version that would allow both, but this feels like overdoing it. What do you think?
I don't think it is worth trying to encapsulate both approaches.
What are the disadvantages of the mmap approach?
There are two disadvantages when using memory mapped files:
- It requires virtual address space, potentially a problem for dict servers with many databases on a 32-bit system. My quick calculations, however, suggest that this is a non-issue :).
- If the file is deleted or changed (e.g. an upgrade of the database), the corresponding signals need to be caught. I couldn't be bothered to implement this. But it shouldn't be too hard.
It might be worth writing a benchmark (maybe a randomized lookup of all the terms in the example dictionary?). I'm worried that the performance improvement from mmap will be imperceptible.
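A rough sketch of what such a benchmark could look like, using only the standard library. Everything here is hypothetical and not part of libdict: `Lcg`, `random_spans`, `bench_file` and `bench_slice` are illustration names, the pseudo-random spans stand in for real index entries, and the in-memory slice stands in for a mapped file (a real comparison would use an actual mapping and the example dictionary's index).

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::time::{Duration, Instant};

/// Tiny linear congruential generator so the sketch needs no external crate.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// Pseudo-random (offset, length) pairs standing in for index entries.
/// Assumes `file_len >= span_len`.
fn random_spans(file_len: u64, count: usize, span_len: u64, seed: u64) -> Vec<(u64, u64)> {
    let mut rng = Lcg(seed);
    (0..count)
        .map(|_| (rng.next_u64() % (file_len - span_len + 1), span_len))
        .collect()
}

/// Seek + read for every span; returns a byte checksum and the elapsed time.
fn bench_file(file: &mut File, spans: &[(u64, u64)]) -> io::Result<(u64, Duration)> {
    let start = Instant::now();
    let mut sum = 0u64;
    let mut buf = Vec::new();
    for &(off, len) in spans {
        buf.resize(len as usize, 0);
        file.seek(SeekFrom::Start(off))?;
        file.read_exact(&mut buf)?;
        sum += buf.iter().map(|&b| b as u64).sum::<u64>();
    }
    Ok((sum, start.elapsed()))
}

/// Same access pattern against bytes that are already in memory.
fn bench_slice(data: &[u8], spans: &[(u64, u64)]) -> (u64, Duration) {
    let start = Instant::now();
    let mut sum = 0u64;
    for &(off, len) in spans {
        let bytes = &data[off as usize..(off + len) as usize];
        sum += bytes.iter().map(|&b| b as u64).sum::<u64>();
    }
    (sum, start.elapsed())
}
```

The checksum keeps the reads from being optimized away and lets the two paths be checked against each other. The timings would depend heavily on whether the file is already in the page cache, so both a cold and a warm run are probably worth recording.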
Won't the memory usage increase when mmap is used?
It might be worth writing a benchmark (maybe
Sorry, I lack the time for this. mmap is generally considered faster because seeking in a file is a system call and hence a privilege switch. In contrast, mapped files get mapped by the kernel transparently, and in more than 80 % of all cases seeking and reading is not a system call and hence dramatically faster.
Won't the memory usage increase when mmap is used?
This depends on your application and your point of view. On constrained systems this might indeed be the case. However, most modern operating systems cache frequently used files in RAM anyway, so this wouldn't be an issue for those. That said, when optimising a dict server for the common case, it might be desirable not to memory-map infrequently used files. So thanks for the point, the library user should be able to decide which strategy to pick.
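One way to let the library user pick could be a small trait over the byte source, sketched below with the standard library only. None of these names exist in libdict: `ByteSource`, `FileSource`, `SliceSource`, `Strategy` and `open_source` are hypothetical. The `InMemory` variant simply reads the whole file and is only a stand-in for a mapping; the assumption is that a memory-map type from an external crate could be wrapped in `SliceSource` as long as it exposes its bytes via `AsRef<[u8]>`.

```rust
use std::fs::{self, File};
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Hypothetical abstraction over how dictionary bytes are fetched.
trait ByteSource {
    fn read_at(&mut self, offset: u64, len: usize) -> io::Result<Vec<u8>>;
}

/// Plain file access: one seek plus one read per lookup.
struct FileSource(File);

impl ByteSource for FileSource {
    fn read_at(&mut self, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        self.0.seek(SeekFrom::Start(offset))?;
        let mut buf = vec![0u8; len];
        self.0.read_exact(&mut buf)?;
        Ok(buf)
    }
}

/// Anything that exposes its contents as a byte slice.
/// A memory map would slot in here (assumption, not shown).
struct SliceSource<B: AsRef<[u8]>>(B);

impl<B: AsRef<[u8]>> ByteSource for SliceSource<B> {
    fn read_at(&mut self, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        let data = self.0.as_ref();
        // Reject ranges that overflow or run past the end of the data.
        let end = offset
            .checked_add(len as u64)
            .filter(|&e| e <= data.len() as u64)
            .ok_or_else(|| io::Error::from(io::ErrorKind::UnexpectedEof))?;
        Ok(data[offset as usize..end as usize].to_vec())
    }
}

/// The choice handed to the library user.
enum Strategy {
    File,
    InMemory,
}

fn open_source(path: &Path, strategy: Strategy) -> io::Result<Box<dyn ByteSource>> {
    match strategy {
        Strategy::File => Ok(Box::new(FileSource(File::open(path)?))),
        Strategy::InMemory => Ok(Box::new(SliceSource(fs::read(path)?))),
    }
}
```

Returning owned `Vec<u8>` keeps the trait simple but gives up the zero-copy benefit of a mapping; a version that lends out slices would need lifetimes on the trait, which is part of why a generic design can feel like overdoing it.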
Are you willing to pick this up? I've got some code that I could tidy up and upload to the mmap branch or you could start from scratch on your own.
Thanks
If the file is deleted or changed (e.g. an upgrade of the database), the corresponding signals need to be caught. I couldn't be bothered to implement this. But it shouldn't be too hard.
I don't think this is possible via signals. The only option that I see is using advisory file locks, but that requires both parties to try to lock the file before doing anything to it. I'm currently doing it with advisory file locks on my end.
Could you please publish it to crates.io if you accept #18 and #19? @humenda
Sure.