tokyo vs kyoto db backend
# Duc compiled with tokyo db backend.
$ duc info -d /tmp/duc.tokyo.db
Date Time Files Dirs Size Path
2019-07-02 19:22:34 18.6M 1.2M 47.8T /data
# Duc compiled with kyoto db backend.
$ ./duc info -d /tmp/duc.kyoto.db
Date Time Files Dirs Size Path
2019-07-02 19:22:31 18.6M 1.2M 47.8T /data
Duc compiled with kyoto db backend uses https://fallabs.com/kyotocabinet/pkg/kyotocabinet-1.2.77.tar.gz (2018-10-30).
# Sizes of the database
$ ls -lh /tmp/duc.*
-rw-r--r-- 1 ghuls ghuls 233M Jul 2 23:19 /tmp/duc.tokyo.db
-rw-r--r-- 1 ghuls ghuls 219M Jul 2 23:19 /tmp/duc.kyoto.db.kct
According to INSTALL, the tokyo database should be smaller than the kyoto one, which is why it is set as the default db backend. At least in this case that does not hold: the kyoto database is slightly smaller (219M vs. 233M).
If tokyo and kyoto dbs are also similar in size in the general case, wouldn't it make more sense to make kyoto the default db backend, to avoid the database corruption that can happen with Tokyocabinet?
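To put a number on the difference, here is a minimal sketch that computes how much smaller one database is than the other. The helper name `pct_smaller` is made up for illustration, and the inputs are the rounded sizes from the `ls -lh` output above, so the result is only approximate.

```shell
# Percentage by which size $2 is smaller than size $1 (both in the same unit).
# Hypothetical helper, not part of duc.
pct_smaller() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

# Sizes in MB as reported by ls -lh: tokyo first, kyoto second.
pct_smaller 233 219
```

For this one index of /data the kyoto database comes out about 6% smaller than the tokyo one, which is a single data point rather than a general result.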
When picking a backend you probably need to choose between speed, size and
robustness. Some measurements on my system of a 372G directory with 1.6M files:
----------------------------------------
Database           Run time    Db size
                      (s)        (kB)
----------------------------------------
tokyocabinet [*]      8.4        19.2
leveldb               7.1        31.5
sqlite3              13.5        71.1
lmdb                  5.9        78.7
kyotocabinet          8.3        26.7
----------------------------------------
[*] Tokyocabinet currently is the default used by Duc because of the good
compression and reasonable performance. A problem is that Tokyocabinet is not
very stable and can create corrupt databases when interrupting the indexing. If
this is a problem for you, choose a different db backend.
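The backend is chosen at build time. As a rough sketch, assuming the `--with-db-backend` option of duc's configure script, the helper below (a hypothetical name, not part of duc) checks a backend name against the ones listed in the table and prints the matching configure command without running it.

```shell
# Print the configure command for a given db backend; nothing is executed.
# Assumption: duc's configure script accepts --with-db-backend=<name>.
duc_configure_cmd() {
    case "$1" in
        tokyocabinet|leveldb|sqlite3|lmdb|kyotocabinet)
            printf './configure --with-db-backend=%s\n' "$1"
            ;;
        *)
            printf 'unknown db backend: %s\n' "$1" >&2
            return 1
            ;;
    esac
}

duc_configure_cmd kyotocabinet
```

Run the printed command from the duc source directory, then rebuild. A database written by one backend is presumably not readable by a binary built with another, so plan on re-indexing after switching.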
A small annoyance with the kyoto db code: it appends .kct to the database name in src/libduc/db-kyoto.c, while none of the other database backends do this. I have to specify /tmp/duc.kyoto.db as the database name on the duc command line, while the actual db file is called /tmp/duc.kyoto.db.kct (so TAB completion for the db name does not work properly).
Pull request to fix this: https://github.com/zevv/duc/pull/213
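Until that is merged, a small wrapper can map the name given to `duc -d` to the file that actually exists on disk. This is only a sketch: `duc_db_file` is a hypothetical helper, and it assumes the only difference is the `.kct` suffix described above.

```shell
# Print the on-disk file for a database name as passed to `duc -d`.
# Assumption: the kyoto backend stores "<name>.kct"; other backends store "<name>".
duc_db_file() {
    if [ -e "$1" ]; then
        printf '%s\n' "$1"
    elif [ -e "$1.kct" ]; then
        printf '%s\n' "$1.kct"
    else
        printf 'no database file found for %s\n' "$1" >&2
        return 1
    fi
}
```

For example, `ls -lh "$(duc_db_file /tmp/duc.kyoto.db)"` would list /tmp/duc.kyoto.db.kct when only the suffixed file exists.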
Thanks for the thorough testing.
I planned on changing the default as well, but I'm not sure if kyotodb is available on all the platforms tokyodb is. I was hoping one of the more mainstream db implementations would one day be on par, but these two are still on top. I don't have a lot of time these days for testing, but I will keep this issue open until a decision has been made!
Kyoto Cabinet is created by the same company as Tokyo Cabinet, so I assume they support the same platforms.
I also fixed some parts of the INSTALL documentation: https://github.com/zevv/duc/pull/215