pkgfile icon indicating copy to clipboard operation
pkgfile copied to clipboard

`pkgfile -b` is slow

Open wpyoga opened this issue 3 years ago • 1 comments

A single invocation of pkgfile -b takes around 230 ms on my laptop (Ryzen 5 4650U, 40GB DDR4, 512GB NVMe SSD).

After playing around with different binary names to search, with binary names at the start of the data files, and at the end of the data files, the invocation time doesn't differ by much. Therefore I'm guessing that most of the time is spent on loading the cache files.

I would like to propose a faster storage format, maybe using a hash index, to bring down the search time.

$ time pkgfile -b abiword
extra/abiword

real	0m0.237s
user	0m0.335s
sys	0m0.036s
$ time for i in {1..10}; do pkgfile -b abiword >/dev/null; done
extra/abiword
extra/abiword
extra/abiword
extra/abiword
extra/abiword
extra/abiword
extra/abiword
extra/abiword
extra/abiword
extra/abiword

real	0m2.280s
user	0m3.185s
sys	0m0.294s

This is my installation:

NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
LOGO=archlinux-logo

These are the pkgfile cache files:

$ ls -lh
total 361M
-rw-r--r-- 1 root root 258M Sep 13 19:45 community.files
-rw-r--r-- 1 root root 8.9M Sep 13 19:45 core.files
-rw-r--r-- 1 root root  95M Sep 13 19:45 extra.files

wpyoga avatar Sep 13 '22 13:09 wpyoga

I should do some proper profiling here, but what you're asking for only benefits a small fraction of potential queries. Yes, a hash table could benefit queries like 'pacman' or '/usr/bin/pacman', but you're back to a full scan for anything involving globbing or regex. It does, admittedly, also help with things like #27.

An embedded DB like SQLite might work here to provide a more flexible query language with indexing capabilities. That would be generally useful, but also presents a pretty substantial rewrite of the existing code. I guess that's sort of true of any storage format change...

falconindy avatar Jul 31 '24 10:07 falconindy