[RFE] Use a real database for storing data
Was there any discussion about whether to use a real database for storing data? I am not familiar with elfeed internals but I believe it should be doable? I don't propose it to be the default way to store data, the current implementation is much better for that because it doesn't have any external dependencies. However, in some (not-so-special) cases it would be useful to have an access to big guns.
From a performance perspective:
- I use elfeed only for a few weeks (and I love it), so my
~/.elfeed/indexis just fine, but from what I've seen in the issue tracker, it can grow a lot and cause some performance issues. - I am afraid that text file database is prune to failures when writing (maybe this is well-handled in the code and it is not possible to introduce inconsistencies or unparseable index file)
Additional values:
- It would allow us to easily query data from outside of Emacs. A simple, yet very useful use-case for this is a possibility to implement an indicator for new articles in your window manager's panel.
- It would allow us to run an unlimited number of complex SQL queries for fetching whatever statistics (I know there is
elfeed-csv-export, so this may not be so attractive benefit) - Sharing elfeed data between multiple computers would be easier than ever
Personally, I would like to use elfeed with SQLite but for some use-cases, it would be useful to run it with PostgreSQL or MariaDB, so the implementation should be abstract enough.
Here's overview of the database:
The Elfeed Database https://nullprogram.com/blog/2013/09/09/
Only a few minor details have changed, so that's essentially still how it works. There are a few things I'd do differently now that I have more experience: 1) do not depend on cl-struct because it turned out to be unreliable, 2) be more flexible about indexing, 3) less denormalization in the file representation by doing more work at load time (which is actually faster), 4) rely more on print-circle (too immature prior to Emacs 24). Basing the index around the avl-tree package, and the content database both turned out to be solid ideas.
The database is written durably if write-region-inhibit-fsync is nil (the default) and Emacs implements it properly (not possible on all platforms), so users need not worry about losing the entire index — the actual important part of the database — due to a power failure. The database is written in its entirety each time it's saved, which is the main factor limiting its scalability. The content database specifically disables fsync for writes for performance reasons, but it's also considered volatile and unimportant.
One of my early plans was to eventually switch the database over to SQLite. I wrote an entire package for that purpose, EmacSQL:
Introducing EmacSQL https://nullprogram.com/blog/2014/02/06/
Ultimately I wasn't satisfied enough with the result, so I didn't use it. I had originally intended to leverage the widely-available "sqlite3" command line program — EmacSQL's design still reflects this — and so the whole thing would be basically transparent to users. It would Just Work. SQLite itself is an incredibly robust and well-tested piece of software, but unfortunately that doesn't apply to the "sqlite3" command line tool, which is just too buggy and unreliable. So I had to write a more robust glue program, but the need to compile this would make it unusable for most users. Not worth it.
Emacs native modules came along later, offering a better interface, but they have exactly the same problem: Nobody's actually figured how to reliably distribute them. That's why there are no native module package with significant use.
pg.el implements the PostgreSQL wire protocol and has no native components. It's well-written and I've only had good experiences with it. But asking users to set up and run a PostgreSQL database for their feed reader is unrealistic. This requires special system privileges, so there's no way to automate it.
So unless some future version of Emacs gains a well-designed, well-written, non-optional, built-in database interface, a "real database" is a practical option.
You make a great point about external queries, something SQLite would do really well. The best we have is elfeed-protocol and elfeed-web which both require going through Emacs. Shared databases is also something I've long wanted to resolve, and, in my own use, I've settled on having independent, per-machine databases, which isn't ideal.
Fortunately the s-exp database has otherwise worked out very well. My oldest database still gets daily use. It's nearly 7 years old, has 86k entries, the index weighs 41MB, and still works great. It's growing slower than Moore's law, so it should be fine for the foreseeable future.
Thank you for the comprehensive answer. I had no idea how complex the topic really is.
First of all, I am adding your blog to my elfeed list, those articles are really useful.
SQLite itself is an incredibly robust and well-tested piece of software, but unfortunately that doesn't apply to the "sqlite3" command line tool, which is just too buggy and unreliable. So I had to write a more robust glue program,
I see. I never needed to programmatically use the sqlite3 command but always used a python library instead, so if you say the tool is unreliable, I believe you. I guess there are no alternatives to that command, right? Do you know if some other programming language support sqlite flawlessly? Maybe we could have a small script in python (or another language) providing an adapter between elfeed and sqlite.
Obviously, this could never replace the current implementation of elfeed database and be the default but it might be an experimental step somewhere.
But asking users to set up and run a PostgreSQL database for their feed reader is unrealistic. This requires special system privileges, so there's no way to automate it.
That's true. But I would guess that there might be some people with their own servers somewhere and already have PostgreSQL running, so it might find some users. We also have containers and docker today, so theoretically running your own PostgreSQL instance locally should be a matter of one or two commands and should not require any special privileges. The question is whether is it worth it to implement - I always imagined that there will be some powerful abstraction already done, that will grant us support for PostgreSQL for free (as a byproduct of implementing support for SQLite).
Maybe we could have a small script in python (or another language) providing an adapter between elfeed and sqlite.
Users will still have to setup the adapter.
We also have containers and docker today, so theoretically running your own PostgreSQL instance locally [...]
You're suggesting users run pg on docker for a feed reader? That will make it unusable for most users.
Maybe I failed to express myself correctly, @andinus. Sorry about that. When I created this RFE, I was hoping to have the possibility to run elfeed with SQLite. Not even having it as a default backend, just a possibility to configure elfeed to use it. Although, I was thinking, that SQLite could be potentially used as a default. Unfortunately, as @skeeto explained, SQLite implementations for Emacs are not reliable, so at this moment, it is just a dream.
Anything other than SQLite that I suggested here was meant to be merely optional or even experimental features, not even close to ever being a default that would affect most users.
Jakub Kadlčík @ 2020-09-02 16:18 IST:
[...] Although, I was thinking, that SQLite could be potentially used as a default. Unfortunately, as @skeeto explained, SQLite implementations for Emacs are not reliable, so at this moment, it is just a dream.
I see, yeah SQLite DB would've been nice.
Anything other than SQLite that I suggested here was meant to be merely optional or even experimental features, not even close to ever being a default that would affect most users.
I see, makes sense.
Hej Hej,
Just to warm up the conversation: With the upcoming emacs 29 and the built in sqlite functionality, are there any thoughts about using SQLite as DB?
As far as I looked at it, it seems the new built in option uses the sqlite3 library and I don't now if the problems mentioned already in this thread are still present or not.
Cheers and thanks for elfeed!
I'm keen to learn whether the emacs 29 sqlite integration can serve this purpose too!!
Now that 29.1 is out, any thoughts about this topic?
AFAIK, skeeto doesn't use Emacs much anymore--at least, not as his editor. I don't know if he still uses it for Elfeed.
Anyway, Someone(TM) "just" needs to start working on a branch that uses the built-in SQLite support now.