
Local full-text search

Open probonopd opened this issue 3 years ago • 27 comments

Using the following components:

See https://wiki.samba.org/index.php/Spotlight_with_Elasticsearch_Backend for instructions. Also see https://wiki.freebsd.org/Elastic.

Then integrate it into the https://github.com/helloSystem/Menu.

Tracker is not an option since it is GNOME-, XDG-, and D-Bus-infested: too many dependencies on unwelcome technologies.

probonopd, Oct 29 '20

How can we get it to create (and use) an index on each writable removable disk, like Macs do with /.Spotlight-V100?

probonopd, Oct 30 '20

I recently came across this project: https://github.com/typesense/typesense
That might be what you are looking for to use in the global menu search?

shilch, Nov 16 '20

Is it suitable for searching the contents of arbitrary TXT, DOCX, PDF, C++, ... files?

Or can it deal only with structured data as https://typesense.org/docs/0.16.1/guide/#create-collection seems to suggest at a quick glance?

probonopd, Nov 16 '20

I didn't use it yet. As far as I'm aware, it's for structured data only (e.g. entries in the global menu).

shilch, Nov 16 '20

entries in the global menu

We can already search those :-)

probonopd, Nov 16 '20

Investigate KDE baloo for file indexing and possibly metadata retrieval

https://community.kde.org/Baloo

Baloo is not an application, but a daemon to index files. Applications can use the Baloo framework to provide file search results.

Baloo focuses on providing a very small memory footprint along with extremely fast searching. It also supports storing additional file-based metadata via extended attributes.

FreeBSD:/home/user% balooctl status
Baloo File Indexer is not running
Total files indexed: 0
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 12.00 KiB

Trying to enable it prints errors:

FreeBSD:/home/user% balooctl enable
Enabling and starting the File Indexer
FreeBSD:/home/user% QKqueueFileSystemWatcherEngine::addPaths: open: No such file or directory
virtual QStringList Solid::Backends::Hal::HalManager::allDevices()  error:  "org.freedesktop.DBus.Error.ServiceUnknown" 

org.kde.solid.udisks2: Failed enumerating UDisks2 objects: "org.freedesktop.DBus.Error.ServiceUnknown" 
 "The name org.freedesktop.UDisks2 was not provided by any .service files"
org.kde.solid.udisks2: Failed enumerating UDisks2 objects: "org.freedesktop.DBus.Error.ServiceUnknown" 
 "The name org.freedesktop.UDisks2 was not provided by any .service files"

There seems to be a dependency on UDisks2, something we'd like to avoid?

Can this be ignored/avoided?

After that, it says Baloo File Indexer is running and a process /usr/local/bin/baloo_file is running:

FreeBSD:/home/user% balooctl status 
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 0
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 12.00 KiB

Is it doing something? The system seems to become less responsive even though CPU usage is not high.

https://community.kde.org/Baloo/Configuration

probonopd, Nov 24 '20

For what it's worth, I never had much joy/luck with Baloo on FreeBSD-CURRENT.

Long ago I disabled file search entirely:

[Screenshot: file search disabled]

This morning I re-enabled both search and indexing of content.

grahamperrin, Dec 26 '20

Right now, baloo would be my best bet. Imagine it nicely integrated into the helloSystem global menu search box.

What kind of issues did you experience?

probonopd, Dec 26 '20

This morning for example, within minutes or moments of me enabling the feature:

Dec 26 09:27:57 mowa219-gjp4-8570p kernel: pid 8762 (baloo_file), jid 0, uid 1002: exited on signal 6 (core dumped)

Confession: I have observed the crashes for years but never bothered to investigate or report them properly. I might begin to do so (in the FreeBSD area) over the Christmas break.

Postscript(s):

  • https://community.kde.org/Baloo/Configuration#Reindexing offers a use case for balooctl, but https://www.freebsd.org/cgi/man.cgi?query=balooctl finds nothing, so I assume that things are significantly different on FreeBSD.

grahamperrin, Dec 26 '20

Consider using Drill

Blocked by: https://github.com/yatima1460/Drill/issues/71

It is not full-text search and does not use an index, though, and it is currently written in D, although a version in C++ may be on the roadmap.

A PyQt GUI around its CLI gives:

[Screenshot: PyQt GUI around the Drill CLI]

probonopd, Jan 11 '21

Consider using albert

https://albertlauncher.github.io/

It is written in Qt, has a plugin architecture, and does support indexing.

For me it crashes when I try to invoke it, possibly related to:

11:40:41 [WARN:default] DBus: Name is either invalid, null or not instanceof string
11:40:41 [WARN:default] DBus: CanRaise is either invalid, null or not instanceof bool

One would have to teach it about .app bundles and .AppDir directories, and integrate it with the global menu bar.

probonopd, Jan 23 '21

Looks like baloo is working better nowadays, perhaps due to FreeBSD 12.2 (rather than 12.1) and newer packages.

In any case, it looks promising!

sudo pkg install kf5-baloo
balooctl enable
balooctl status
# Wait until everything is indexed; does it index only $HOME by default?
baloosearch "FreeBSD Foundation"
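The "wait until everything is indexed" step could be scripted. Below is a minimal POSIX sh sketch, assuming `balooctl status` keeps printing the `Indexer state:` and `Files waiting for content indexing:` lines in the format quoted in this thread; `baloo_field` and `wait_for_baloo` are hypothetical helpers, not part of Baloo.

```shell
#!/bin/sh
# Hypothetical helpers; they assume the `balooctl status` line format quoted
# in this thread ("Indexer state: Idle", "Files waiting for content indexing: 0").

# baloo_field LABEL: read `balooctl status` output on stdin and print the value
# after "LABEL: ", with thousands separators such as "555,743" removed.
baloo_field() {
    sed -n "s/^$1: *//p" | tr -d ','
}

# wait_for_baloo [SECONDS]: poll until the indexer reports Idle with nothing
# waiting for content indexing. Returns 1 if baloo_file is not running.
wait_for_baloo() {
    interval=${1:-30}
    while :; do
        status=$(LANG=C balooctl status)
        case $status in
            *"is not running"*) echo "baloo_file is not running" >&2; return 1 ;;
        esac
        state=$(printf '%s\n' "$status" | baloo_field 'Indexer state')
        pending=$(printf '%s\n' "$status" | baloo_field 'Files waiting for content indexing')
        if [ "$state" = "Idle" ] && [ "${pending:-1}" = "0" ]; then
            return 0
        fi
        echo "indexer state: ${state:-unknown}, waiting: ${pending:-unknown}"
        sleep "$interval"
    done
}
```

After `balooctl enable`, run `wait_for_baloo 30` before `baloosearch`. As the status output quoted earlier shows, the indexer can report `Idle` with `Total files indexed: 0`, so calling it immediately after enabling may return too early.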

probonopd, Apr 19 '21

Albert seems to have its own indexing (but it seems to be "only" file-name indexing, not full-text indexing; a sensible performance tradeoff?), and it's Qt-based:

22:02:55 [DEBG:default] Serializing files…
22:02:58 [DEBG:default] Building inverted file index…
22:03:03 [INFO:default] Indexed 171954 files in 67196 directories.
22:11:02 [INFO:default] Start indexing files.
22:11:17 [DEBG:default] Serializing files…
22:11:20 [DEBG:default] Building inverted file index…
22:11:26 [INFO:default] Indexed 171954 files in 67196 directories.

Maybe we can take the code responsible for the file indexing and searching and put it into the existing search box in the Menu.
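To illustrate what a file-name-only "inverted file index" like the one in Albert's log means, here is a toy POSIX sh/awk sketch: each file name is split into lowercase words, and the index maps each word to the paths whose names contain it. This is not Albert's code or index format; `build_name_index` and `query_name_index` are hypothetical, and paths containing tabs or newlines are not handled.

```shell
#!/bin/sh
# Toy file-name index (hypothetical helpers, not Albert's implementation).
# Index format: one "word<TAB>path" line per (word, path) pair, sorted by word,
# i.e. an inverted index from lowercase name words to the files containing them.

# build_name_index: read one path per line on stdin and write the index to stdout.
# Only the last path component (the file name) is tokenized.
build_name_index() {
    awk '{
        path = $0
        n = split(path, parts, "/")
        m = split(tolower(parts[n]), words, /[^a-z0-9]+/)
        for (i = 1; i <= m; i++)
            if (words[i] != "" && !seen[words[i] SUBSEP path]++)
                printf "%s\t%s\n", words[i], path
    }' | LC_ALL=C sort -t "$(printf '\t')" -k1,1
}

# query_name_index WORD INDEXFILE: print every path whose file name contains
# WORD as a whole word (case-insensitive).
query_name_index() {
    word=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    awk -F '\t' -v w="$word" '$1 == w { print $2 }' "$2"
}
```

For example, `find "$HOME" -type f | build_name_index > /tmp/names.idx` followed by `query_name_index freebsd /tmp/names.idx`. Because only names are tokenized, building such an index is far cheaper than content indexing, which is the tradeoff discussed above.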


If we wanted to use Albert (rather than porting its Files plugin into our already-existing search in Menu), we would have to write a plugin for Application Bundles, taking code from

https://github.com/helloSystem/Menu/blob/aa2518c4b597b3fa77952ff55a4f638a5df13a60/src/appmenuwidget.cpp#L156-L263

and putting it into

https://github.com/albertlauncher/plugins/blob/ee55048e138028b4889d71e0574e85b2c4d69541/templateExtension/src/extension.cpp#L74


A neat touch is that Albert also finds SSH connections and has text snippets.

probonopd, Nov 03 '21

Deepin Linux also comes with Global Search. Need to check it out.

probonopd, Nov 23 '21

Recoll

Hi @grahamperrin thanks for the hint.

Additional dependencies look reasonable:

New packages to be INSTALLED:
        antiword: 0.37_4 [FreeBSD]
        aspell: 0.60.8_1,1 [FreeBSD]
        catdoc: 0.95 [FreeBSD]
        chmlib: 0.40_1 [FreeBSD]
        gsfonts: 8.11_8 [FreeBSD]
        librevenge: 0.0.4_13 [FreeBSD]
        libwpd010: 0.10.3_4 [FreeBSD]
        p5-Image-ExifTool: 12.00 [FreeBSD]
        pstotext: 1.9_6 [FreeBSD]
        py38-mutagen: 1.42.0_2 [FreeBSD]
        recoll: 1.27.3_15 [FreeBSD]
        unrar: 6.02,6 [FreeBSD]
        unrtf: 0.21.10 [FreeBSD]
        xapian-core: 1.4.18,1 [FreeBSD]

Number of packages to be installed: 14

The process will require 52 MiB more space.
13 MiB to be downloaded.

I will try it out.

Indexing is running:

[Screenshot: Recoll indexing in progress]

Pros

  • Nice Qt5 GUI
  • Very powerful search. This thing looks awesome
  • System stays usable while indexing is in progress
  • The documentation mentions indexes being stored on network shares and external drives. It would be awesome if we could configure it so that, by default, each partition's index is stored on and picked up from that partition. Then, when I plug an external hard disk into another computer, it would already be indexed because the index always travels with the partition. Is that doable?

Cons?

  • Some parts, such as the filter /usr/local/share/recoll/filters/rclimg (run via /usr/local/bin/perl), seem to depend on Perl, something we wanted to get rid of in helloSystem.

probonopd, Nov 28 '21

For real-time indexing: FreeBSD bug 260093 – deskutils/recoll remake X11MON an OPTIONS_DEFAULT

grahamperrin, Nov 28 '21

What does the X11MON option do, and do you think we'd need it for Recoll to be suitable for helloSystem?

probonopd, Nov 28 '21

A recent comment, bugs.freebsd.org/bugzilla/show_bug.cgi?id=259679#c3, verified real-time indexing.

If Recoll is not built with X11MON, you will not get real-time indexing.

Compare what you have, with https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=259679#c2.

grahamperrin, Dec 01 '21

If we could get it working properly, it looks like baloo would be ideal.

System performance slows to a crawl after

balooctl enable

but CPU usage is minimal. Is this I/O bound? Can we throttle its I/O usage?

System performance becomes normal again only after

balooctl disable
balooctl suspend
killall baloo_file # Why is this needed?

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230726#c14 has the answer:

The system is not freezing, it runs into the vnode limit and there obtaining new vnodes is rate limited to 1 per second, which is arguably rather buggy and should be fixed.

In the meantime you can bump sysctl kern.maxvnodes

https://people.freebsd.org/~amdmi3/handbook/configtuning-kernel-limits.html

To see the current number of vnodes in use:

# sysctl vfs.numvnodes
vfs.numvnodes: 91349

To see the maximum vnodes:

# sysctl kern.maxvnodes
kern.maxvnodes: ...
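Combining the two values shows how close the system is to the limit. A minimal POSIX sh sketch, assuming FreeBSD's `sysctl -n` (which prints only the value); `vnode_usage_pct` and `show_vnode_usage` are hypothetical helpers.

```shell
#!/bin/sh
# vnode_usage_pct USED MAX: print USED as an integer percentage of MAX.
vnode_usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

# show_vnode_usage: print the live vnode counters (FreeBSD only).
show_vnode_usage() {
    used=$(sysctl -n vfs.numvnodes)
    max=$(sysctl -n kern.maxvnodes)
    echo "vnodes: $used of $max ($(vnode_usage_pct "$used" "$max")%)"
}
```

For example, 91349 vnodes against a limit of 1000000 comes to 9%. A value that stays near 100% while baloo_file runs would be consistent with the vnode-limit explanation quoted above. To keep a raised limit across reboots, the `kern.maxvnodes=1000000` setting can also go into `/etc/sysctl.conf`.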

In my tests,

sudo sysctl kern.maxvnodes=1000000

removed the baloo performance issue. Does this have negative side effects?

I also had to increase the maximum number of open files; I did so by a factor of 10:

sudo sysctl kern.maxfiles=3221930

Without this, I ran into

FreeBSD% balooctl resume
File Indexer resumed

(process:3305): GLib-ERROR **: 09:20:45.190: Creating pipes for GWakeup: Too many open files

Does this have negative side effects?

Now baloo_file is taking up one CPU core while indexing but the system stays operational.

I wonder if we should throttle baloo_file to take at most 50% CPU...

It runs smoothly for a while, but then I get

FreeBSD% QProcessPrivate::createPipe: Cannot create pipe 0x480a2ca100 (Too many open files)

and it does not index anything anymore.

Is there a bug which causes baloo to open but never close files?

Am I hitting https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256269?

Hopefully the answer is not "disable baloo" or "index fewer files" but to get it fixed, so that it can index an arbitrary number of files.
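One way to test the open-but-never-close suspicion is to sample baloo_file's descriptor count over time. A minimal POSIX sh sketch, assuming the layout of FreeBSD's `procstat -f` output (a header line, then one row per descriptor, with the FD number in the third column and non-numeric entries such as `cwd` or `text` for other vnodes); `count_fds` and `watch_fds` are hypothetical helpers.

```shell
#!/bin/sh
# count_fds: read `procstat -f PID` output on stdin and print how many rows
# are numbered file descriptors (skipping the header and cwd/root/text rows).
count_fds() {
    awk 'NR > 1 && $3 ~ /^[0-9]+$/ { n++ } END { print n + 0 }'
}

# watch_fds [SECONDS]: every SECONDS, print a timestamp and the descriptor
# count of each running baloo_file process (FreeBSD only; stop with Ctrl-C).
watch_fds() {
    interval=${1:-60}
    while :; do
        for pid in $(pgrep -x baloo_file); do
            echo "$(date '+%H:%M:%S') pid $pid: $(procstat -f "$pid" | count_fds) fds"
        done
        sleep "$interval"
    done
}
```

If the count keeps climbing during content indexing, that would support a leak; if it plateaus, the limit may simply be too low. Note that `kern.maxfiles` is system-wide; the per-process limit (`ulimit -n`, capped by `kern.maxfilesperproc`) may be the one actually being hit.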

probonopd, Oct 22 '22

Stopped indexing, then removed old database with rm -rf ~/.local/share/baloo.

Trying with

only basic indexing=true

in ~/.config/baloofilerc, to index only file names for now; will this improve things? Yes, this succeeds:

FreeBSD% env LANG=C balooctl status
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 555,743
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 214.31 MiB

So why does indexing the contents of files lead to too many open files, which in turn causes indexing to fail?

probonopd, Oct 22 '22

Seems like baloo doesn't have the best reputation among FreeBSD users... perhaps because FreeBSD and baloo are not yet properly "tuned for each other"?

https://forums.freebsd.org/threads/problems-with-baloo.80107/

probonopd, Oct 22 '22

Integrating the results of

baloosearch -l 100 helloSystem

into Menu:

[Screenshot: baloosearch results shown in the Menu search box]

Not too shabby...

probonopd, Oct 22 '22

To start the indexing, it must be enabled with balooctl enable. Possibly we will do this at ISO installation time in the future.

probonopd, Oct 22 '22

Seems like baloo doesn't have the best reputation among FreeBSD users

Relatively few issues are specific to FreeBSD. Via sysutils/kf5-baloo, in Bugzilla for FreeBSD:

  • two open bugs, one of which blocks the other and has an upstream bug report.

In Bugzilla for KDE, for Baloo, Baloo file daemon, and balooctl:

grahamperrin, Oct 23 '22

Still seeing

FreeBSD% baloo_file
QKqueueFileSystemWatcherEngine::addPaths: open: No such file or directory

which might mean that we don't get new/changed files indexed immediately. Why?

probonopd, Oct 29 '22

https://www.freebsd.org/cgi/man.cgi?rtprio

To make depend while not disturbing other machine usage: idprio 31 make depend

So we might use idprio 31 baloo_file to make it run while not disturbing other machine usage?
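Following that idea, here is a minimal POSIX sh sketch that moves an already-running baloo_file (as started by `balooctl enable`) to idle priority 31, using the `idprio [prio] -pid` form from the same man page. Assumptions: FreeBSD's `pgrep`; setting idle priority may require root unless `security.bsd.unprivileged_idprio` is enabled; the helper name `idle_all` and the `IDPRIO` override variable are hypothetical (the override only exists to allow a dry run). Idle priority changes CPU scheduling only, so it would address the one-core-busy case, not the vnode or open-file limits discussed above.

```shell
#!/bin/sh
# idle_all NAME [PRIO]: give every running process named exactly NAME the
# idle-time priority PRIO (default 31, the lowest) via `idprio PRIO -PID`.
# IDPRIO is a hypothetical override, e.g. IDPRIO=echo for a dry run.
idle_all() {
    name=$1
    prio=${2:-31}
    pids=$(pgrep -x "$name") || { echo "no $name process found" >&2; return 1; }
    for pid in $pids; do
        ${IDPRIO:-idprio} "$prio" -"$pid"
    done
}
```

Usage: `idle_all baloo_file` after `balooctl enable`, or a dry run with `IDPRIO=echo idle_all baloo_file`.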

probonopd, Nov 20 '22