The Big Rewrite

Open Changaco opened this issue 3 months ago • 5 comments

This issue is for the rewrite of a significant portion of Liberapay's source code that I first mentioned in https://github.com/liberapay/salon/issues/541#issuecomment-1420578433. The basic idea is to switch from SQL to a new database technology built from scratch in Python. The plan is to close all the technical issues related to the database (namely #245, #762, #980, #1113, #1312, #1595, #1701, #1727, #1736, #1962 and #2010) and take advantage of the migration to fix inaccuracies and inconsistencies in the data.

Sadly the work is still very far from complete. It's a big change and was always going to take a while, but there's also the problem that too much of my time and energy is consumed by other things.

Changaco avatar Mar 23 '24 21:03 Changaco

> The basic idea is to switch from SQL to a new database technology built from scratch in Python

Why? I can only see this ending poorly. Databases are immensely difficult to program, and people spend entire PhDs working on them.

boehs avatar Mar 28 '24 14:03 boehs

> Why?

Because it's painfully slow and difficult to build something correctly when you don't have the right tools. I've been working on this Python+PostgreSQL code base for a decade now, and I've come to the conclusion that SQL isn't the right tool for this job. The work I've done so far on this issue has confirmed that the Liberapay source code can be significantly simplified and improved by replacing SQL queries with Python code.

The new Python module needed to make this work will be significantly simpler than an SQL database, because its job will be simpler. It won't need a query parser and planner, for example. It will fulfill two basic needs: consensus and storage. The consensus part is an implementation of Raft that I've already partly written from scratch, based on the Raft paper. The storage part will be an implementation of well-known data structures. Efficiency will be difficult to achieve with Python, but it might not be necessary to fully optimize the storage layer right from the start.
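To give a rough idea of what the consensus part involves, here is a minimal sketch of one small rule from the Raft paper: how a leader decides which log entries are committed. It is an illustration only, not code from the Liberapay rewrite; the function names and the shape of the arguments (`match_index`, `log_terms`) are assumptions made up for this example.

```python
def majority_match_index(match_index: dict[str, int], leader_last_index: int) -> int:
    """Return the highest log index stored on a majority of the cluster.

    `match_index` maps each follower's (hypothetical) name to the highest log
    index known to be replicated on it. The leader counts as one more node.
    """
    indexes = sorted([leader_last_index, *match_index.values()], reverse=True)
    majority = len(indexes) // 2 + 1
    # The value in position `majority - 1` is held by at least `majority` nodes.
    return indexes[majority - 1]


def advance_commit_index(
    commit_index: int,
    current_term: int,
    log_terms: list[int],
    match_index: dict[str, int],
) -> int:
    """Return the leader's new commit index.

    `log_terms[i]` is the term of log entry `i + 1` (Raft indexes start at 1).
    """
    candidate = majority_match_index(match_index, len(log_terms))
    # Raft only lets a leader commit an entry directly if it belongs to the
    # leader's current term; older entries are committed indirectly.
    if candidate > commit_index and log_terms[candidate - 1] == current_term:
        return candidate
    return commit_index


# Five-node cluster: the leader holds 5 entries, followers lag by varying amounts.
followers = {"a": 5, "b": 4, "c": 2, "d": 1}
new_commit = advance_commit_index(
    commit_index=2, current_term=3, log_terms=[1, 1, 2, 3, 3], match_index=followers
)
```

In this example, entry 4 is stored on three of the five nodes and belongs to the current term, so the commit index moves from 2 to 4. A real implementation also needs elections, log replication, and persistence, none of which is shown here.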

A distinctive feature of the new API is its intuitive and automatic way of handling the fact that data changes over time. An SQL database can of course store multiple revisions of the same information, but it doesn't know that it's doing so, which means that it can't, for example, make it easy to manage how many revisions should be kept or whether to delta-compress them to save space.
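As a purely hypothetical sketch of what revision-aware storage could look like (the real API isn't shown in this thread, so the class and method names below are invented for illustration), a value can keep a bounded history of its own revisions and answer "what was the value at time T?":

```python
from collections import deque
from typing import Any


class VersionedValue:
    """A value that remembers its most recent revisions.

    Hypothetical illustration: timestamps are assumed to be passed in
    non-decreasing order, and no delta compression is attempted.
    """

    def __init__(self, max_revisions: int = 3) -> None:
        # A bounded deque silently drops the oldest revision when full.
        self._revisions: deque[tuple[int, Any]] = deque(maxlen=max_revisions)

    def set(self, value: Any, timestamp: int) -> None:
        """Record a new revision of the value."""
        self._revisions.append((timestamp, value))

    @property
    def current(self) -> Any:
        """Return the latest revision."""
        return self._revisions[-1][1]

    def as_of(self, timestamp: int) -> Any:
        """Return the value as it was at `timestamp`."""
        for ts, value in reversed(self._revisions):
            if ts <= timestamp:
                return value
        raise LookupError(f"no revision kept for timestamp {timestamp}")

    def __len__(self) -> int:
        return len(self._revisions)


email = VersionedValue(max_revisions=2)
email.set("old@example.org", timestamp=1)
email.set("mid@example.org", timestamp=2)
email.set("new@example.org", timestamp=3)
```

Here the retention policy is a property of the value itself: after the third `set`, only the two most recent revisions remain, so `email.as_of(2)` still works while `email.as_of(1)` raises `LookupError`. With plain SQL tables, that policy would have to be implemented separately for each table that stores history.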

Changaco avatar Apr 01 '24 21:04 Changaco

This feels domain specific, which is fair. Efficiency and time for development were my main concerns. Would this replace everything, even stuff like user accounts?

boehs avatar Apr 02 '24 01:04 boehs

I wouldn't say that there's anything “domain specific” in this.

SQL will be completely replaced. Liberapay will no longer require or use a PostgreSQL database.

Changaco avatar Apr 02 '24 09:04 Changaco

Well, I wish you luck in this endeavor.

boehs avatar Apr 04 '24 02:04 boehs