dqlite icon indicating copy to clipboard operation
dqlite copied to clipboard

Implement a C client library

Open stgraber opened this issue 4 years ago • 51 comments

Currently our main client is go-dqlite, but it'd be good to have a standard C client library which would allow for porting of traditional sqlite3 codebases over to dqlite. This should somewhat closely follow what's present in sqlite3 but with added functions to setup and communicate with the cluster.

stgraber avatar Apr 21 '21 15:04 stgraber

I'm interested in using dqlite in a project implemented in C on Linux. Now I have a server process that handles RPC from clients, executing various SQL commands on a local database. The clients do not send raw SQL statements, only function numbers with arguments (MsgPack based). This ensures that all accesses can be properly authenticated and traced.

I thought that I could simply link the server to dqlite, add an initialization call, open my database with this special vfs, start three instead of one server, and voilá, replication and scalability. Unfortunately, it seems to be far more complicated. I don't quite grok the documentation, and there seems no simple C based example of this obvious use case.

If I see to it that all clients connect to the leader for write transactions, and to one of the others for read transactions, reacting properly to changes in leadership, wouldn't that render redirecting writes to the leader unnecessary? But how to determine what node is the current leader?

This idea of a C client library sounds like what I've been looking for in vain. I've eyed rqlite, but it's even more Go and must be used via an HTTP connection, so it's even further from what I'm looking for.

So please enlighten me - is this use case possible at all? Would I have to completely revamp my project to use rqlite? Thanks!

w-oertl avatar May 07 '21 21:05 w-oertl

I thought that I could simply link the server to dqlite, add an initialization call, open my database with this special vfs, start three instead of one server, and voilá, replication and scalability.

That's not possible, because as you point below the writes need to go to the leader, so you need some sort of connection/protocol from one of your server to the leader and issue SQL statements via that connection protocol.

Unfortunately, it seems to be far more complicated. I don't quite grok the documentation, and there seems no simple C based example of this obvious use case.

There's such an example in Go (look at the demo program). There reason there's not an example in C is because there's no C client library (i.e. a client that speaks the protocol mention above, that sends SQL statements to the leader).

If I see to it that all clients connect to the leader for write transactions, and to one of the others for read transactions,

At the moment they also need to connect to the leader for read transactions. Support for read transactions on non-leader node would be possible but is not yet implemented.

reacting properly to changes in leadership, wouldn't that render redirecting writes to the leader unnecessary?

I'm not sure what you mean here.

But how to determine what node is the current leader?

Basically a client ask to a node "are you the leader?". If yes, it starts using it to make queries, if not it tries with another node.

This idea of a C client library sounds like what I've been looking for in vain. I've eyed rqlite, but it's even more Go and must be used via an HTTP connection, so it's even further from what I'm looking for.

Right, rqlite is fully in Go.

So please enlighten me - is this use case possible at all? Would I have to completely revamp my project to use rqlite? Thanks!

It seems that you'd need a dqlite C client library indeed, which unfortunately is not implemented at the moment. The client wire protocol is reasonably document, and there is a simplistic C client written to run integration test, but I imagine it would still require a bit of effort to write a more complete client library.

freeekanayaka avatar May 08 '21 18:05 freeekanayaka

Thanks for your insightful explanations. From the existing documentation it's not so easy for me to quickly grasp what the library does, and what it does not. I blindly assumed that all servers could answer SELECT queries. dqlite is really about fault tolerance, and not about performance (yet).

What I mean with "redirection of writes to the leader not being necessary" is that clients communicate with servers via the RPC methods I have described, and can send data modifying requests (currently, all requests) only to the current leader. Therefore a server either answers requests (being the leader), or lets the client know what the leader is, so that the request is resent there by the client. dqlite's protocol wouldn't need to pass SQL statements and results around in this case, I think, and concentrate on raft consensus and WAL frames.

w-oertl avatar May 10 '21 20:05 w-oertl

Thanks for your insightful explanations. From the existing documentation it's not so easy for me to quickly grasp what the library does, and what it does not. I blindly assumed that all servers could answer SELECT queries. dqlite is really about fault tolerance, and not about performance (yet).

That's about right I'd say. Although performance hasn't been a big issue so far for the use cases we had.

What I mean with "redirection of writes to the leader not being necessary" is that clients communicate with servers via the RPC methods I have described, and can send data modifying requests (currently, all requests) only to the current leader. Therefore a server either answers requests (being the leader), or lets the client know what the leader is, so that the request is resent there by the client. dqlite's protocol wouldn't need to pass SQL statements and results around in this case, I think, and concentrate on raft consensus and WAL frames.

I see. That's an interesting point and approach: essentially you'd push the discovery of the leader higher in the stack to the final client. It would be possible to modify dqlite to support that, essentially by exposing some APIs that are now internal, and are the ones currently triggered by the wire protocol. An application would use those directly instead of the wire protocol, with the constraint that they must be executed on the current raft leader, so the application needs to check that first.

freeekanayaka avatar May 11 '21 08:05 freeekanayaka

I'm interested in developing a C library, but i'm not really sure how to start. How can i instantiate some code only with the server side of the dqlite to test my C wire protocol?

joseims avatar Aug 02 '21 20:08 joseims

DRAFT - Work In Progress

This document describes the C API of the dqlite client. The design goal of this API is to stay as close as possible to the design of the SQLite API, to provide a straight-forward and simple path to converting an existing application using SQLite to a, distributed, application using dqlite. The client library will by default run a dqlite server node that takes part in the dqlite cluster.

Remarks

As dqlite makes network calls, every request to the server has the potential to hang an indefinite amount of time. Propose to configure the dqlite object with a default timeout for all networked calls, user configurable.

Alternatives to the networked calls that also take a timeout parameter could also be provided.

TODO

Still need to think about a configurable connect func to connect to dqlite nodes on another machine.

1 Objects

1.1 Database Connection Handle

SQLite

typedef struct sqlite3 sqlite3; Returned by the calls to sqlite3_openXXX.

dqlite

typedef struct dqlite dqlite Represents an open dqlite database object, returned by a call to dqlite_open.

1.2 Prepared Statement Object

SQLite

typedef struct sqlite3_stmt sqlite3_stmt; Created using sqlite3_prepare_v2(), used in sqlite3_bind_* to bind values to parameters and subsequently used by sqlite3_step, sqlite3_reset & sqlite3_finalize.

dqlite

typedef struct dqlite_stmt dqlite_stmt; Created using dqlite_prepare(), used in dqlite_bind_* to bind values to parameters and subsequently used by dqlite_step, dqlite_reset & dqlite_finalize.

2. Functions

2.1 Open

SQLite

int sqlite3_open(
  const char *filename,   /* Database filename (UTF-8) */
  sqlite3 **ppDb          /* OUT: SQLite db handle */
);

int sqlite3_open16(
  const void *filename,   /* Database filename (UTF-16) */
  sqlite3 **ppDb          /* OUT: SQLite db handle */
);

int sqlite3_open_v2(
  const char *filename,   /* Database filename (UTF-8) */
  sqlite3 **ppDb,         /* OUT: SQLite db handle */
  int flags,              /* Flags */
  const char *zVfs        /* Name of VFS module to use */
);

dqlite

/* Extensible struct containing various dqlite settings
 * see https://www.linuxplumbersconf.org/event/7/contributions/657/attachments/639/1159/extensible_syscalls.pdf
 * for explanations around extensible structs.
 * This opts object should configure all there is to configure for the dqlite client & server, before starting it.
 * Some of the possible struct members are mentioned below.
 */
struct dqlite_opts {
    uint64_t size                    /* The size of this struct */
    uint64_t address                 /* pointer to node address, used in `dqlite_node_create` */
    uint64_t bootstrap		     /* 0 or 1 if this node will attempt to bootstrap a new cluster */
    uint64_t cluster                 /* 0 or pointer to array of `dqlite_node_info_ext` structs, a bootstrapping node will use this as initial configuration */
    uint64_t bind_address            /* Where the dqlite node will listen for incoming client requests */
    uint64_t connect_func            /* 0 or pointer to custom function for communication between dqlite nodes, see `dqlite_node_set_connect_func` */
    uint64_t connect_timeout_ms      /* 0 (default used e.g. 5000) or value in ms.
    uint64_t network_latency_ms      /* 0 or estimated latency between nodes */
    uint64_t client_only             /* Whether this node will replicate data or not */
    ...
}

/* Starts a dqlite server node and blocks until the database is ready to accept client request.
 * Returns DQLITE_OK on success.
 */
int dqlite_open(
        const char *path                   /* where dqlite will store it's data and db name to use */
	dqlite **ppDb,                     /* OUT: dqlite db handle */
	const struct dqlite_opts *opts,    /* dqlite options object containing settings, cluster info, ... */
);

2.2 Close

SQLite

int sqlite3_close(sqlite3*);
int sqlite3_close_v2(sqlite3*);

dqlite

/* Closes the database and destroys a dqlite object obtained through a call to dqlite_open.
 * Returns DQLITE_OK on success.
 */
int dqlite_close(dqlite*);

Rationale

2.3 Prepare

SQLite

6 constructors

int sqlite3_prepare(
  sqlite3 *db,            /* Database handle */
  const char *zSql,       /* SQL statement, UTF-8 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const char **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare_v2(
  sqlite3 *db,            /* Database handle */
  const char *zSql,       /* SQL statement, UTF-8 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const char **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare_v3(
  sqlite3 *db,            /* Database handle */
  const char *zSql,       /* SQL statement, UTF-8 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  unsigned int prepFlags, /* Zero or more SQLITE_PREPARE_ flags */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const char **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare16(
  sqlite3 *db,            /* Database handle */
  const void *zSql,       /* SQL statement, UTF-16 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const void **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare16_v2(
  sqlite3 *db,            /* Database handle */
  const void *zSql,       /* SQL statement, UTF-16 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const void **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare16_v3(
  sqlite3 *db,            /* Database handle */
  const void *zSql,       /* SQL statement, UTF-16 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  unsigned int prepFlags, /* Zero or more SQLITE_PREPARE_ flags */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const void **pzTail     /* OUT: Pointer to unused portion of zSql */
);

dqlite

int dqlite_prepare(
  dqlite *db,             /* Database handle */
  const char *zSql,       /* SQL statement, UTF-8 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  dqlite_stmt **ppStmt,   /* OUT: Statement handle */
  const char **pzTail     /* OUT: Pointer to unused portion of zSql */
);

Rationale

The SQLite documentation states that dqlite_prepare is legacy, we won't attempt to duplicate it. The prepare16 calls could be added in the future, but will be skipped in the first iteration of dqlite C api. An equivalent of sqlite3_prepare_v3 can be added in the future if there is a user need.

2.4 Finalize

SQLite

int sqlite3_finalize(sqlite3_stmt *pStmt);

dqlite

int dqlite_finalize(dqlite_stmt *pStmt);

2.5 Exec

SQLite

int sqlite3_exec(
  sqlite3*,                                  /* An open database */
  const char *sql,                           /* SQL to be evaluated */
  int (*callback)(void*,int,char**,char**),  /* Callback function */
  void *,                                    /* 1st argument to callback */
  char **errmsg                              /* Error msg written here */
);

dqlite

int dqlite_exec(
  dqlite*,                                   /* An open database */
  const char *sql,                           /* SQL to be evaluated */
  int (*callback)(void*,int,char**,char**),  /* Callback function */
  void *,                                    /* 1st argument to callback */
  char **errmsg                              /* Error msg written here */
);

int dqlite_exec_stmt(
  dqlite*,                                   /* An open database */
  dqlite_stmt *pStmt,                        /* A prepared dqlite statement */
  int (*callback)(void*,int,char**,char**),  /* Callback function */
  void *,                                    /* 1st argument to callback */
  char **errmsg                              /* Error msg written here */
);

Comment

This will require a new dqlite request type, the current request types are roughly Query (client message 6 and 9), which is read-only, doesn't go through raft (if no barrier is required) and returns result rows and Exec (client message 5 and 8), which is not read-only, goes through raft replication and returns a last row id and num rows affected result. The new request type will determine using sqlite3_stmt_readonly if the stmt resulting from the *sql string can modify the DB or not and dispatch the query to the existing Exec and Query handlers, the client can then expect a server message 6 or 7 back depending on the effect of the query.

2.6 Bind

SQLite

int sqlite3_bind_blob(sqlite3_stmt*, int, const void*, int n, void(*)(void*));
int sqlite3_bind_blob64(sqlite3_stmt*, int, const void*, sqlite3_uint64,
                        void(*)(void*));
int sqlite3_bind_double(sqlite3_stmt*, int, double);
int sqlite3_bind_int(sqlite3_stmt*, int, int);
int sqlite3_bind_int64(sqlite3_stmt*, int, sqlite3_int64);
int sqlite3_bind_null(sqlite3_stmt*, int);
int sqlite3_bind_text(sqlite3_stmt*,int,const char*,int,void(*)(void*));
int sqlite3_bind_text16(sqlite3_stmt*, int, const void*, int, void(*)(void*));
int sqlite3_bind_text64(sqlite3_stmt*, int, const char*, sqlite3_uint64,
                         void(*)(void*), unsigned char encoding);
int sqlite3_bind_value(sqlite3_stmt*, int, const sqlite3_value*);
int sqlite3_bind_pointer(sqlite3_stmt*, int, void*, const char*,void(*)(void*));
int sqlite3_bind_zeroblob(sqlite3_stmt*, int, int n);
int sqlite3_bind_zeroblob64(sqlite3_stmt*, int, sqlite3_uint64);

dqlite

int dqlite_bind_blob(dqlite_stmt*, int, const void*, int n, void(*)(void*));
int dqlite_bind_double(dqlite_stmt*, int, double);
int dqlite_bind_int(dqlite_stmt*, int, int);
int dqlite_bind_int64(dqlite_stmt*, int, dqlite_int64);
int dqlite_bind_null(dqlite_stmt*, int);
int dqlite_bind_text(dqlite_stmt*,int,const char*,int,void(*)(void*));

Took the most obvious types here, can add the more exotic ones later on. These calls will construct a list of arguments to build a tuple from wire protocol to be evaluated in context of a prepared stmt.

2.7 Step

SQLite

int sqlite3_step(sqlite3_stmt*);

dqlite

No equivalent in the current implementation of dqlite. Propose to implement dqlite_exec and dqlite_exec_stmt in a first iteration and add equivalents to sqlite3_step, sqlite3_column in the next iteration as dqlite_step will require a new dqlite message.

int dqlite_step(dqlite_stmt*);

2.8 Column

SQLite

const void *sqlite3_column_blob(sqlite3_stmt*, int iCol);
double sqlite3_column_double(sqlite3_stmt*, int iCol);
int sqlite3_column_int(sqlite3_stmt*, int iCol);
sqlite3_int64 sqlite3_column_int64(sqlite3_stmt*, int iCol);
const unsigned char *sqlite3_column_text(sqlite3_stmt*, int iCol);
const void *sqlite3_column_text16(sqlite3_stmt*, int iCol);
sqlite3_value *sqlite3_column_value(sqlite3_stmt*, int iCol);
int sqlite3_column_bytes(sqlite3_stmt*, int iCol);
int sqlite3_column_bytes16(sqlite3_stmt*, int iCol);
int sqlite3_column_type(sqlite3_stmt*, int iCol);

dqlite

No equivalent in the current implementation of dqlite.

const void *dqlite_column_blob(dqlite_stmt*, int iCol);
double dqlite_column_double(dqlite_stmt*, int iCol);
int dqlite_column_int(dqlite_stmt*, int iCol);
dqlite_int64 dqlite_column_int64(dqlite_stmt*, int iCol);
const unsigned char *dqlite_column_text(dqlite_stmt*, int iCol);
int dqlite_column_bytes(dqlite_stmt*, int iCol);
int dqlite_column_type(dqlite_stmt*, int iCol);

2.9 Leader

SQLite

No equivalent

dqlite

/* It's possible that this node does not know who the leader is for
 * various reasons, othewise return id & address of the leader.
 */
int dqlite_leader(
    dqlite*,                /* An open database */
    dqlite_node_info*,      /* leader info written here */
)

2.10 Cluster

SQLite

No equivalent

dqlite

/*  Information about the cluster as perceived by the node with id */
int dqlite_cluster(
    dqlite*,                 /* An open database */ //TODO No open db needed
    node_id, (or address?)   /* Node to send request to */
    dqlite_node_info_ext[],  /* cluster info written here */
    dqlite_node_info_n,      /* size of dqlite_node_info_array */
)

2.11 Add Node

SQLite

No equivalent

dqlite

int dqlite_node_add(
    dqlite*,                 /* An open database */
    dqlite_node_info,        /* Information about the node to be added */
)

2.12 Remove Node

SQLite

No equivalent

dqlite

int dqlite_node_remove(
    dqlite*,                 /* An open database */
    dqlite_node_id,          /* id of the node to be removed */
)

2.13 Assign Role

SQLite

No equivalent

dqlite

int dqlite_node_assign_role(
    dqlite*,                 /* An open database */
    dqlite_node_id,          /* id of the node's role that needs to be changed */
    dqlite_role,             /* desired role of the node */
)

2.13 Describe

SQLite

No equivalent

dqlite

typedef struct dqlite_node_description dqlite_node_description;
/* Description currently contains address|weight|failuredomain*/
int dqlite_node_describe(
    dqlite*,                      /* An open database */ //TODO No open db needed
    dqlite_node_id, (or address?) /* id of the node to be described */
    dqlite_node_description*,     /* OUT: desired role of the node */
)

2.14 Weight

SQLite

No equivalent

dqlite

int dqlite_node_weight(
    dqlite*,                     /* An open database */ //TODO No open db needed
    dqlite_node_id,(or address?) /* id of the node to assign a weight to */
    node_weight ,                /* desired weight of the node */
)

2.15 Weight

SQLite

No equivalent

dqlite

int dqlite_node_failure_domain(
    dqlite*,                  /* An open database */ //TODO No open db needed
    dqlite_node_id (address?) /* id of the node to assign a failure domain to */
    failure_domain,           /* desired failure domain of the node */
)

2.16 Dump

SQLite

No equivalent

dqlite

typedef struct dqlite_db_dump dqlite_db_dump;
/* Dumps the backing sqlite database in the node directory */
int dqlite_dump(
    dqlite*,                  /* An open database */ //TODO No open db needed
    dqlite_db_dump*,          /* OUT: Contains info on the db dump */
)

2.16 Reconfigure

SQLite

No equivalent

dqlite

/* Reconfigures a cluster with the desired configuration */
int dqlite_reconfigure(
    path,                     /* A directory containing the dqlite db */
    dqlite_node_info_ext[]*,  /* Desired configuration of the cluster */
)

3 Example usage

SQLite

#include <sqlite3.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    sqlite3 *db;
    char *err_msg = 0;

    int rv = sqlite3_open("test.db", &db);
    if (rv != SQLITE_OK) {
        fprintf(stderr, "open database: %s\n", sqlite3_errmsg(db));
        sqlite3_close(db);
        return -1;
    }

    char *sql = "DROP TABLE IF EXISTS Ubuntu;"
                "CREATE TABLE Ubuntu(Id INT, Name TEXT);"
                "INSERT INTO Ubuntu VALUES(1, 'Warty Warthog');"
                "INSERT INTO Ubuntu VALUES(2, 'Hoary Hedgehog');"
                "INSERT INTO Ubuntu VALUES(3, 'Breezy Badger');"
                "INSERT INTO Ubuntu VALUES(4, 'Dapper Drake');";

    rv = sqlite3_exec(db, sql, 0, 0, &err_msg);
    if (rv != SQLITE_OK ) {
        fprintf(stderr, "SQLite exec failure: %s\n", err_msg);
        sqlite3_free(err_msg);
        sqlite3_close(db);
        return -1;
    }

    sqlite3_close(db);
    return 0;
}

dqlite

#include <dqlite.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    int rv = -1;
    dqlite *db = NULL;
    const char *address = "127.0.0.1:9001"
    char *err_msg = 0;

    struct dqlite_opts opts = {
	.size = sizeof(opts),
	.address = POINTER_TO_UINT64(address),
	.bootstrap = (uint64_t) 1,
	.cluster = (uint64_t)0,
	.bind_address = POINTER_TO_UINT64(address),
	.connect_func = 0,
	.connect_timeout_ms = 0,
	.network_latency_ms = 0,
	.client_only = 0,
    }

    rv = dqlite_open("/tmp/test.db", &db, &opts);
    if (rv != DQLITE_OK) {
        fprintf(stderr, "open database: %s\n", dqlite_errmsg(db));
        dqlite_close(db);
        return -1;
    }

    char *sql = "DROP TABLE IF EXISTS Ubuntu;"
                "CREATE TABLE Ubuntu(Id INT, Name TEXT);"
                "INSERT INTO Ubuntu VALUES(1, 'Warty Warthog');"
                "INSERT INTO Ubuntu VALUES(2, 'Hoary Hedgehog');"
                "INSERT INTO Ubuntu VALUES(3, 'Breezy Badger');"
                "INSERT INTO Ubuntu VALUES(4, 'Dapper Drake');";

    rv = dqlite_exec(db, sql, 0, 0, &err_msg);
    if (rv != DQLITE_OK ) {
        fprintf(stderr, "dqlite exec failure: %s\n", err_msg);
        free(err_msg);
        dqlite_close(db);
        return -1;
    }

    dqlite_close(db);
    return 0;
}

MathieuBordere avatar Oct 21 '21 21:10 MathieuBordere

@tomponline @freeekanayaka @brauner would be great to have your thoughts on the above ^

stgraber avatar Oct 22 '21 03:10 stgraber

Here's a first round of questions/remark:

  1. Why do we need to use uint64_t to hold pointers in the dqlite_opts struct? Can't we use the actual pointer type?
  2. AFAIR sqlite3_exec is just a convenience wrapper around sqlite3_prepare/step/finalize, perhaps it could be just convenience wrapper in dqlite as well, at least initially? (that would mean never using message types 8 and 9, that could be changed later down the road if it becomes a performance bottleneck). See also points 3 and 4.
  3. I find it a bit odd to introduce a new wire protocol message type that conditionally returns a message with type 6 or 7. Instead a I propose to give that responsibility to the client. The implementation of dqlite_prepare could set a flag on the dqlite_stmt object indicating if it's read-only, and the subsequent dqlite_step call would either send a message of type 5 (Exec) or 6 (Query) accordingly.
  4. Is it intentional to leave out the implementation of dqlite_step for the first iteration? It seems there would not be a way to get data out of the database without that? (e.g. with a SELECT). Maybe I'm confused. BTW, the way I'd expect dqlite_step to work would be like described above: if it's a write statement, you would send a message of type 5 (Exec) and wait to get back a message of type 6 (Exec result), if it's a read-only statement things are slightly more involved, you'd send a message of type 6 (Query) the very first time and wait for a message of type 7 (Batch of rows), then subsequent calls to dqlite_step would continue to yield a row at a time from the batch without needing to send any extra message, only when the batch is exhausted and the batch flag indicates there is more data, another message of type 7 would be sent and the process repeated (note that this means we'd have to introduce some state in the dqlite_stmt struct to hold the current batch and also a field to store the index of the last row yielded by dqlite_step).

Hope it's more or less clear what I mean, otherwise just ask me to rephrase :)

freeekanayaka avatar Oct 22 '21 08:10 freeekanayaka

Here's a first round of questions/remark:

1. Why do we need to use `uint64_t` to hold pointers in the `dqlite_opts` struct? Can't we use the actual pointer type?

The extensible struct requires that all fields are 8 bytes wide, I just wanted to have a foolproof way to ensure they would indeed always be 8 bytes wide, maybe @brauner could chime in on this? You think it makes sense to store pointers as uint64_t instead of their native types? On 32 bit systems this would already break, so I think it makes sense to keep uint64_t.

2. AFAIR `sqlite3_exec` is just a convenience wrapper around `sqlite3_prepare/step/finalize`, perhaps it could be just convenience wrapper in dqlite as well, at least initially? (that would mean never using message types `8` and `9`, that could be changed later down the road if it becomes a performance bottleneck). See also points 3 and 4.

It should indeed be a convenience wrapper, I think it makes sense to not use messages 8 & 9 initially.

3. I find it a bit odd to introduce a new wire protocol message type that conditionally returns a message with type `6` or `7`. Instead a I propose to give that responsibility to the client. The implementation of `dqlite_prepare` could set a flag on the `dqlite_stmt` object indicating if it's read-only, and the subsequent `dqlite_step` call would either send a message of type `5` (Exec) or `6` (Query) accordingly.

I reasoned that because sqlite3_stmt_readonly(sqlite3_stmt *pStmt) needs a prepared sqlite statement, it could only be evaluated on the server node that actually knows the sqlite stmt. Otherwise I would need some way to pass the sqlite prepared stmt to the client, which is difficult to do. Another option would be to alter the response message of dqlite_prepare (server message 5) where the server has indicated if the stmt is read-only or not. Altering an existing message might not be possible, so this might require a new server response message type. But maybe I'm missing something here?

4. Is it intentional to leave out the implementation of `dqlite_step` for the first iteration? It seems there would not be a way to get data out of the database without that? (e.g. with a `SELECT`). Maybe I'm confused. 

dqlite_exec would make use of the way sqlite_exec reads its rows, through the callback argument. If the callback function of the 3rd argument to sqlite3_exec() is not NULL, then it is invoked for each result row coming out of the evaluated SQL statements. The 4th argument to sqlite3_exec() is relayed through to the 1st argument of each callback invocation. If the callback pointer to sqlite3_exec() is NULL, then no callback is ever invoked and result rows are ignored. sqlite3_exec

BTW, the way I'd expect dqlite_step to work would be like described above: if it's a write statement, you would send a message of type 5 (Exec) and wait to get back a message of type 6 (Exec result), if it's a read-only statement things are slightly more involved, you'd send a message of type 6 (Query) the very first time and wait for a message of type 7 (Batch of rows), then subsequent calls to dqlite_step would continue to yield a row at a time from the batch without needing to send any extra message, only when the batch is exhausted and the batch flag indicates there is more data, another message of type 7 would be sent and the process repeated (note that this means we'd have to introduce some state in the dqlite_stmt struct to hold the current batch and also a field to store the index of the last row yielded by dqlite_step).

Ok that makes sense, the problem (like in 3) is finding out if the statement is read or write.

Hope it's more or less clear what I mean, otherwise just ask me to rephrase :)

MathieuBordere avatar Oct 22 '21 09:10 MathieuBordere

1. Why do we need to use `uint64_t` to hold pointers in the `dqlite_opts` struct? Can't we use the actual pointer type?

The extensible struct requires that all fields are 8 bytes wide, I just wanted to have a foolproof way to ensure they would indeed always be 8 bytes wide, maybe @brauner could chime in on this? You think it makes sense to store pointers as uint64_t instead of their native types? On 32 bit systems this would already break, so I think it makes sense to keep uint64_t.

I would think that using native types always make things more portable (imagine a 128 bit system, the proposed approach would likely break no?). On 32 bit systems it should work just fine, as long as both the user AND the dqlite library use sizeof() in order to fill/check the .size field.

2. AFAIR `sqlite3_exec` is just a convenience wrapper around `sqlite3_prepare/step/finalize`, perhaps it could be just convenience wrapper in dqlite as well, at least initially? (that would mean never using message types `8` and `9`, that could be changed later down the road if it becomes a performance bottleneck). See also points 3 and 4.

It should indeed be a convenience wrapper, I think it makes sense to not use messages 8 & 9 initially.

Good.

3. I find it a bit odd to introduce a new wire protocol message type that conditionally returns a message with type `6` or `7`. Instead a I propose to give that responsibility to the client. The implementation of `dqlite_prepare` could set a flag on the `dqlite_stmt` object indicating if it's read-only, and the subsequent `dqlite_step` call would either send a message of type `5` (Exec) or `6` (Query) accordingly.

I reasoned that because sqlite3_stmt_readonly(sqlite3_stmt *pStmt) needs a prepared sqlite statement, it could only be evaluated on the server node that actually knows the sqlite stmt. Otherwise I would need some way to pass the sqlite prepared stmt to the client, which is difficult to do. Another option would be to alter the response message of dqlite_prepare (server message 5) where the server has indicated if the stmt is read-only or not. Altering an existing message might not be possible, so this might require a new server response message type. But maybe I'm missing something here?

I understand the concern, but it sounds more an implementation detail rather than something fundamental that should affect the design. I'd suggest having a look at the implementation of the sqlite3_stmt_readonly function in the SQLite code base. I'd expect it to just look at SQL text, perhaps feeding it to the parser first. It shouldn't be hard to implement a "good enough" version of sqlite3_stmt_readonly in the dqlite client. Initially it could be as simple as just checking if the SQL text begins with "SELECT" (case insensitive, stripping leading white spaces), which will cover a lot of ground already. Incidentally, that's kind of what the dqlite shell does as well, see here.

4. Is it intentional to leave out the implementation of `dqlite_step` for the first iteration? It seems there would not be a way to get data out of the database without that? (e.g. with a `SELECT`). Maybe I'm confused. 

dqlite_exec would make use of the way sqlite_exec reads its rows, through the callback argument. If the callback function of the 3rd argument to sqlite3_exec() is not NULL, then it is invoked for each result row coming out of the evaluated SQL statements. The 4th argument to sqlite3_exec() is relayed through to the 1st argument of each callback invocation. If the callback pointer to sqlite3_exec() is NULL, then no callback is ever invoked and result rows are ignored. sqlite3_exec

Ah ok, I didn't remember this part indeed. Fair enough, dqlite_exec would support reading data too. However I'd still recommend implementing it in terms of dqlite_prepare/step/finalize, because 1) eventually you'll want them anyways 2) it feels the callback approach is in many cases more cumbersome to use than a plain prepare/step/finalize dance.

BTW, the way I'd expect dqlite_step to work would be like described above: if it's a write statement, you would send a message of type 5 (Exec) and wait to get back a message of type 6 (Exec result), if it's a read-only statement things are slightly more involved, you'd send a message of type 6 (Query) the very first time and wait for a message of type 7 (Batch of rows), then subsequent calls to dqlite_step would continue to yield a row at a time from the batch without needing to send any extra message, only when the batch is exhausted and the batch flag indicates there is more data, another message of type 7 would be sent and the process repeated (note that this means we'd have to introduce some state in the dqlite_stmt struct to hold the current batch and also a field to store the index of the last row yielded by dqlite_step).

Ok that makes sense, the problem (like in 3) is finding out if the statement is read or write.

Yeah, as I mentioned, I wouldn't consider this as a deal breaker. I described one option above, even augmenting the response message type for the Prepare command is a good option (perhaps even better than client-side one). If you just add a field there are easy ways to do that without breaking compatibility, for example using the revision field of the request, quoting from the docs, in the "Definitions" section at the beginning:

byte 5: Revision number of the message body schema. Within the same protocol version, a new schema revision can only add fields to the previous revision. So a client capable of understanding up to revision N of a certain message schema can still handle messages with revision >N by simply ignoring the extra bytes.

For example you could bump the revision of the Prepare message and if the server detects a high enough revision it will also include the read-only flag in the response message (marking it with a proper revision as well), otherwise it would just return the old format.

freeekanayaka avatar Oct 22 '21 11:10 freeekanayaka

I would think that using native types always make things more portable (imagine a 128 bit system, the proposed approach would likely break no?). On 32 bit systems it should work just fine, as long as both the user AND the dqlite library use sizeof() in order to fill/check the .size field.

Let me think on this.

Ah ok, I didn't remember this part indeed. Fair enough, dqlite_exec would support reading data too. However I'd still recommend implementing it in terms of dqlite_prepare/step/finalize, because 1) eventually you'll want them anyways 2) it feels the callback approach is in many cases more cumbersome to use than a plain prepare/step/finalize dance.

Sounds good.

Yeah, as I mentioned, I wouldn't consider this as a deal breaker. I described one option above, even augmenting the response message type for the Prepare command is a good option (perhaps even better than client-side one). If you just add a field there are easy ways to do that without breaking compatibility, for example using the revision field of the request, quoting from the docs, in the "Definitions" section at the beginning:

byte 5: Revision number of the message body schema. Within the same protocol version, a new schema revision can only add fields to the previous revision. So a client capable of understanding up to revision N of a certain message schema can still handle messages with revision >N by simply ignoring the extra bytes.

For example you could bump the revision of the Prepare message and if the server detects a high enough revision it will also include the read-only flag in the response message (marking it with a proper revision as well), otherwise it would just return the old format.

Great, didn't notice yet the messages were versioned, sounds like the right approach for this.

MathieuBordere avatar Oct 22 '21 11:10 MathieuBordere

I would think that using native types always make things more portable (imagine a 128 bit system, the proposed approach would likely break no?). On 32 bit systems it should work just fine, as long as both the user AND the dqlite library use sizeof() in order to fill/check the .size field.

Let me think on this.

You're right, I think it makes sense to use the native types, should be safe.

MathieuBordere avatar Oct 25 '21 11:10 MathieuBordere

FWIW the other fields could use more specific types as well, e.g.:

struct dqlite_opts {
    size_t size                    /* The size of this struct */
    const char* address                 /* pointer to node address, used in `dqlite_node_create` */
    bool bootstrap		     /* 0 or 1 if this node will attempt to bootstrap a new cluster */
    struct dqlite_node_info_ext* cluster                 /* 0 or pointer to array of `dqlite_node_info_ext` structs, a bootstrapping node will use this as initial configuration */
    const char* bind_address            /* Where the dqlite node will listen for incoming client requests */
    dqlite_connect_func* connect_func            /* 0 or pointer to custom function for communication between dqlite nodes, see `dqlite_node_set_connect_func` */
    unsigned connect_timeout_ms      /* 0 (default used e.g. 5000) or value in ms.
    unsigned network_latency_ms      /* 0 or estimated latency between nodes */
    bool client_only             /* Whether this node will replicate data or not */
    ...
}

(just a sketch)

freeekanayaka avatar Oct 25 '21 14:10 freeekanayaka

Update after comments - DRAFT - Work In Progress

Changes

  • updated struct dqlite_opts to not contain solely uint64_t members
  • added dqlite_connect_opts structs that sets a timeout and connect func for connecting to a dqlite node
  • instead of skipping dqlite_step in a first iteration, add it in the first iteration of the library

This document describes the C API of the dqlite client. The design goal of this API is to stay as close as possible to the design of the SQLite API, to provide a straight-forward and simple path to converting an existing application using SQLite to a, distributed, application using dqlite. The client library will by default run a dqlite server node that takes part in the dqlite cluster.

Remarks

As dqlite makes network calls, every request to the server has the potential to hang an indefinite amount of time. Propose to configure the dqlite database object with a default timeout for all networked calls, user configurable.

Networked calls also take an optional dqlite_connect_opts (extensible?) struct that contains the timeout for the call, an (optional) node_connect_func to connect to the dqlite node (same one as used in dqlite_opts & therefore same as used in dqlite_node_set_connect_func)

struct dqlite_connect_opts {
   size_t size;                             /* The size of this struct */
   uint32_t connect_timeout_ms;             /* Overrides default timeout */
   dqlite_node_connect_func connect_func;   /* NULL or function to connect to a dqlite node, same one as used in `dqlite_opts` */
}

1 Objects

1.1 Database Connection Handle

SQLite

typedef struct sqlite3 sqlite3; Returned by the calls to sqlite3_openXXX.

dqlite

typedef struct dqlite dqlite Represents an open dqlite database object, returned by a call to dqlite_open.

1.2 Prepared Statement Object

SQLite

typedef struct sqlite3_stmt sqlite3_stmt; Created using sqlite3_prepare_v2(), used in sqlite3_bind_* to bind values to parameters and subsequently used by sqlite3_step, sqlite3_reset & sqlite3_finalize.

dqlite

typedef struct dqlite_stmt dqlite_stmt; Created using dqlite_prepare(), used in dqlite_bind_* to bind values to parameters and subsequently used by dqlite_step, dqlite_reset & dqlite_finalize.

2. Functions

2.1 Open

SQLite

int sqlite3_open(
  const char *filename,   /* Database filename (UTF-8) */
  sqlite3 **ppDb          /* OUT: SQLite db handle */
);

int sqlite3_open16(
  const void *filename,   /* Database filename (UTF-16) */
  sqlite3 **ppDb          /* OUT: SQLite db handle */
);

int sqlite3_open_v2(
  const char *filename,   /* Database filename (UTF-8) */
  sqlite3 **ppDb,         /* OUT: SQLite db handle */
  int flags,              /* Flags */
  const char *zVfs        /* Name of VFS module to use */
);

dqlite

/* Extensible struct containing various dqlite settings
 * see https://www.linuxplumbersconf.org/event/7/contributions/657/attachments/639/1159/extensible_syscalls.pdf
 * for explanations around extensible structs.
 * This opts object should configure all there is to configure for the dqlite client & server, before starting it.
 * Some of the possible struct members are mentioned below.
 */
struct dqlite_opts {
    size_t size;                            /* The size of this struct */
    const char* address;                    /* pointer to node address, used in `dqlite_node_create` */
    bool bootstrap;                         /* If this node will attempt to bootstrap a new cluster */
    struct dqlite_node_info_ext *cluster;   /* NULL or pointer to array of `dqlite_node_info_ext` structs, a bootstrapping node will use this as initial configuration */
    unsigned cluster_n;                     /* Number of entries in the cluster array */
    const char* bind_address;               /* Where the dqlite node will listen for incoming client requests */
    dqlite_node_connect_func connect_func;  /* NULL or pointer to custom function for communication between dqlite nodes, see `dqlite_node_set_connect_func` */
    unsigned connect_timeout_ms;            /* 0 (default used e.g. 5000) or value in ms.
    unsigned network_latency_ms;            /* 0 or estimated latency between nodes */
    bool client_only;                       /* Whether this node will replicate data or not */
    ...
}

/* Starts a dqlite server node and blocks until the database is ready to accept client request.
 * Returns DQLITE_OK on success.
 */
int dqlite_open(
        const char *path                   /* where dqlite will store it's data and db name to use */
	dqlite **ppDb,                     /* OUT: dqlite db handle */
	const struct dqlite_opts *opts,    /* dqlite options object containing settings, cluster info, ... */
);

2.2 Close

SQLite

int sqlite3_close(sqlite3*);
int sqlite3_close_v2(sqlite3*);

dqlite

/* Closes the database and destroys a dqlite object obtained through a call to dqlite_open.
 * Returns DQLITE_OK on success.
 */
int dqlite_close(dqlite*);

Rationale

2.3 Prepare

SQLite

6 constructors

int sqlite3_prepare(
  sqlite3 *db,            /* Database handle */
  const char *zSql,       /* SQL statement, UTF-8 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const char **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare_v2(
  sqlite3 *db,            /* Database handle */
  const char *zSql,       /* SQL statement, UTF-8 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const char **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare_v3(
  sqlite3 *db,            /* Database handle */
  const char *zSql,       /* SQL statement, UTF-8 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  unsigned int prepFlags, /* Zero or more SQLITE_PREPARE_ flags */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const char **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare16(
  sqlite3 *db,            /* Database handle */
  const void *zSql,       /* SQL statement, UTF-16 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const void **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare16_v2(
  sqlite3 *db,            /* Database handle */
  const void *zSql,       /* SQL statement, UTF-16 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const void **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare16_v3(
  sqlite3 *db,            /* Database handle */
  const void *zSql,       /* SQL statement, UTF-16 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  unsigned int prepFlags, /* Zero or more SQLITE_PREPARE_ flags */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const void **pzTail     /* OUT: Pointer to unused portion of zSql */
);

dqlite

int dqlite_prepare(
  dqlite *db,               /* Database handle */
  dqlite_connect_opts *opts /* Connect func, timeout, ... */
  const char *zSql,         /* SQL statement, UTF-8 encoded */
  int nByte,                /* Maximum length of zSql in bytes. */
  dqlite_stmt **ppStmt,     /* OUT: Statement handle */
  const char **pzTail       /* OUT: Pointer to unused portion of zSql */
);

Rationale

The SQLite documentation states that dqlite_prepare is legacy, we won't attempt to duplicate it. The prepare16 calls could be added in the future, but will be skipped in the first iteration of dqlite C api. An equivalent of sqlite3_prepare_v3 can be added in the future if there is a user need.

Comment

implementation detail: The dqlite_prepare response will - internally - contain a flag if the query is readonly using sqlite3_stmt_readonly(sqlite3_stmt *pStmt). This allows the client implementation to use the appropriate client message 5 & 6 of the wire protocol.

2.4 Finalize

SQLite

int sqlite3_finalize(sqlite3_stmt *pStmt);

dqlite

int dqlite_finalize(dqlite_stmt *pStmt);

2.5 Exec

SQLite

int sqlite3_exec(
  sqlite3*,                                  /* An open database */
  const char *sql,                           /* SQL to be evaluated */
  int (*callback)(void*,int,char**,char**),  /* Callback function */
  void *,                                    /* 1st argument to callback */
  char **errmsg                              /* Error msg written here */
);

dqlite

int dqlite_exec(
  dqlite*,                                   /* An open database */
  dqlite_connect_opts *opts                  /* Connect func, timeout, ... */
  const char *sql,                           /* SQL to be evaluated */
  int (*callback)(void*,int,char**,char**),  /* Callback function */
  void *,                                    /* 1st argument to callback */
  char **errmsg                              /* Error msg written here */
);

Comment

Convenience wrapper around dqlite_prepare, dqlite_step & dqlite_finalize.

2.6 Bind

SQLite

int sqlite3_bind_blob(sqlite3_stmt*, int, const void*, int n, void(*)(void*));
int sqlite3_bind_blob64(sqlite3_stmt*, int, const void*, sqlite3_uint64,
                        void(*)(void*));
int sqlite3_bind_double(sqlite3_stmt*, int, double);
int sqlite3_bind_int(sqlite3_stmt*, int, int);
int sqlite3_bind_int64(sqlite3_stmt*, int, sqlite3_int64);
int sqlite3_bind_null(sqlite3_stmt*, int);
int sqlite3_bind_text(sqlite3_stmt*,int,const char*,int,void(*)(void*));
int sqlite3_bind_text16(sqlite3_stmt*, int, const void*, int, void(*)(void*));
int sqlite3_bind_text64(sqlite3_stmt*, int, const char*, sqlite3_uint64,
                         void(*)(void*), unsigned char encoding);
int sqlite3_bind_value(sqlite3_stmt*, int, const sqlite3_value*);
int sqlite3_bind_pointer(sqlite3_stmt*, int, void*, const char*,void(*)(void*));
int sqlite3_bind_zeroblob(sqlite3_stmt*, int, int n);
int sqlite3_bind_zeroblob64(sqlite3_stmt*, int, sqlite3_uint64);

dqlite

int dqlite_bind_blob(dqlite_stmt*, int, const void*, int n, void(*)(void*));
int dqlite_bind_double(dqlite_stmt*, int, double);
int dqlite_bind_int(dqlite_stmt*, int, int);
int dqlite_bind_int64(dqlite_stmt*, int, dqlite_int64);
int dqlite_bind_null(dqlite_stmt*, int);
int dqlite_bind_text(dqlite_stmt*,int,const char*,int,void(*)(void*));

Took the most obvious types here, can add the more exotic ones later on. These calls will construct a list of arguments to build a tuple from wire protocol to be evaluated in context of a prepared stmt.

2.7 Step

SQLite

int sqlite3_step(sqlite3_stmt*);

dqlite

int dqlite_step(dqlite_stmt*);

2.8 Column

SQLite

const void *sqlite3_column_blob(sqlite3_stmt*, int iCol);
double sqlite3_column_double(sqlite3_stmt*, int iCol);
int sqlite3_column_int(sqlite3_stmt*, int iCol);
sqlite3_int64 sqlite3_column_int64(sqlite3_stmt*, int iCol);
const unsigned char *sqlite3_column_text(sqlite3_stmt*, int iCol);
const void *sqlite3_column_text16(sqlite3_stmt*, int iCol);
sqlite3_value *sqlite3_column_value(sqlite3_stmt*, int iCol);
int sqlite3_column_bytes(sqlite3_stmt*, int iCol);
int sqlite3_column_bytes16(sqlite3_stmt*, int iCol);
int sqlite3_column_type(sqlite3_stmt*, int iCol);

dqlite

const void *dqlite_column_blob(dqlite_stmt*, int iCol);
double dqlite_column_double(dqlite_stmt*, int iCol);
int dqlite_column_int(dqlite_stmt*, int iCol);
dqlite_int64 dqlite_column_int64(dqlite_stmt*, int iCol);
const unsigned char *dqlite_column_text(dqlite_stmt*, int iCol);
int dqlite_column_bytes(dqlite_stmt*, int iCol);
int dqlite_column_type(dqlite_stmt*, int iCol);

2.9 Leader

SQLite

No equivalent

dqlite

/* It's possible that this node does not know who the leader is for
 * various reasons, othewise return id & address of the leader.
 */
int dqlite_leader(
    char* address,	     /* node address, usable in connect_func of dqlite_connect_opts */
    dqlite_connect_opts* ,   /* NULL or struct with custom connect func to node, timeout, ... */
    dqlite_node_info_ext*,   /* leader info written here */
)

2.10 Cluster

SQLite

No equivalent

dqlite

/*  Information about the cluster as perceived by the node with id */
int dqlite_cluster(
    char* address,	     /* node address, usable in connect_func of dqlite_connect_opts */
    dqlite_connect_opts* ,   /* NULL or struct with custom connect func to node, timeout, ... */
    dqlite_node_info_ext[],  /* cluster info written here */
    dqlite_node_info_n,      /* size of dqlite_node_info_array */
)

2.11 Add Node

SQLite

No equivalent

dqlite

int dqlite_node_add(
    dqlite*,                  /* An open database */
    dqlite_connect_opts*      /* connect func, timeout, ... */
    dqlite_node_info,         /* Information about the node to be added */
)

2.12 Remove Node

SQLite

No equivalent

dqlite

int dqlite_node_remove(
    dqlite*,                 /* An open database */
    dqlite_connect_opts*     /* Connect func, timeout, ... */
    dqlite_node_id,          /* id of the node to be removed */
)

2.13 Assign Role

SQLite

No equivalent

dqlite

int dqlite_node_assign_role(
    dqlite*,                 /* An open database */
    dqlite_connect_opts*     /* Connect func, timeout, ... */
    dqlite_node_id,          /* id of the node's role that needs to be changed */
    dqlite_role,             /* desired role of the node */
)

2.13 Describe

SQLite

No equivalent

dqlite

typedef struct dqlite_node_description dqlite_node_description;
/* Description currently contains address|weight|failuredomain*/
int dqlite_node_describe(
    char* address,		  /* node address, usable in connect_func of dqlite_connect_opts */
    dqlite_connect_opts* ,	  /* NULL or struct with custom connect func to node, timeout, ... */
    dqlite_node_description*,     /* OUT: desired role of the node */
)

2.14 Weight

SQLite

No equivalent

dqlite

int dqlite_node_weight(
    char* address,                /* node address, usable in connect_func of dqlite_connect_opts */
    dqlite_connect_opts* ,        /* NULL or struct with custom connect func to node, timeout, ... */
    node_weight ,                 /* desired weight of the node */
)

2.15 Failure Domain

SQLite

No equivalent

dqlite

int dqlite_node_failure_domain(
    char* address,                /* node address, usable in connect_func of dqlite_connect_opts */
    dqlite_connect_opts* ,        /* NULL or struct with custom connect func to node, timeout, ... */
    failure_domain,               /* desired failure domain of the node */
)

2.16 Dump

SQLite

No equivalent

dqlite

typedef struct dqlite_db_dump dqlite_db_dump;
/* Dumps the backing sqlite database in the node directory */
int dqlite_dump(
    char* address,            /* node address, usable in connect_func of dqlite_connect_opts */
    dqlite_connect_opts* ,    /* NULL or struct with custom connect func to node, timeout, ... */
    dqlite_db_dump*,          /* OUT: Contains info on the db dump */
)

2.16 Reconfigure

SQLite

No equivalent

dqlite

/* Reconfigures a cluster with the desired configuration */
int dqlite_reconfigure(
    path,                     /* A directory containing the dqlite db */
    dqlite_node_info_ext[]*,  /* Desired configuration of the cluster */
)

3 Example usage

3.1 exec

SQLite

#include <sqlite3.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    sqlite3 *db;
    char *err_msg = NULL;

    int rv = sqlite3_open("test.db", &db);
    if (rv != SQLITE_OK) {
        fprintf(stderr, "open database: %s\n", sqlite3_errmsg(db));
        sqlite3_close(db);
        return -1;
    }

    char *sql = "DROP TABLE IF EXISTS Ubuntu;"
                "CREATE TABLE Ubuntu(Id INT, Name TEXT);"
                "INSERT INTO Ubuntu VALUES(1, 'Warty Warthog');"
                "INSERT INTO Ubuntu VALUES(2, 'Hoary Hedgehog');"
                "INSERT INTO Ubuntu VALUES(3, 'Breezy Badger');"
                "INSERT INTO Ubuntu VALUES(4, 'Dapper Drake');";

    rv = sqlite3_exec(db, sql, 0, 0, &err_msg);
    if (rv != SQLITE_OK ) {
        fprintf(stderr, "SQLite exec failure: %s\n", err_msg);
        sqlite3_free(err_msg);
        sqlite3_close(db);
        return -1;
    }

    sqlite3_close(db);
    return 0;
}

dqlite

#include <dqlite.h>
#include <stdbool.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    int rv = -1;
    dqlite *db = NULL;
    const char *address = "127.0.0.1:9001"
    char *err_msg = NULL;

    struct dqlite_opts opts = {
	.size = sizeof(opts),
	.address = address,
	.bootstrap = true,
	.cluster = NULL,
	.bind_address = address,
	.connect_func = NULL,
	.connect_timeout_ms = 0,
	.network_latency_ms = 0,
	.client_only = false,
    }

    struct dqlite_connect_opts conn_opts = {
	.size = sizeof(conn_opts),
        .connect_func = NULL,
        .timeout = 0,
    }

    rv = dqlite_open("/tmp/test.db", &db, &opts);
    if (rv != DQLITE_OK) {
        fprintf(stderr, "open database: %s\n", dqlite_errmsg(db));
        dqlite_close(db);
        return -1;
    }

    char *sql = "DROP TABLE IF EXISTS Ubuntu;"
                "CREATE TABLE Ubuntu(Id INT, Name TEXT);"
                "INSERT INTO Ubuntu VALUES(1, 'Warty Warthog');"
                "INSERT INTO Ubuntu VALUES(2, 'Hoary Hedgehog');"
                "INSERT INTO Ubuntu VALUES(3, 'Breezy Badger');"
                "INSERT INTO Ubuntu VALUES(4, 'Dapper Drake');";

    rv = dqlite_exec(db, &conn_opts, sql, 0, 0, &err_msg);
    if (rv != DQLITE_OK ) {
        fprintf(stderr, "dqlite exec failure: %s\n", err_msg);
        free(err_msg);
        dqlite_close(db);
        return -1;
    }

    dqlite_close(db);
    return 0;
}

MathieuBordere avatar Oct 25 '21 15:10 MathieuBordere

Looks better to me!

I'm a bit confused by the dqlite_connect_opts argument that dqlite_exec requires. Why is it required only by dqlite_exec and not by, say, dqlite_step? It's also a bit weird that dqlite_open does not require dqlite_connect_opts AFAICS.

In general, having to pass connection options to basically all calls seems a bit verbose. I'm not totally sure, but I would expect most of the times you just want to use the very same values for those options all the time (e.g. timeout and connect function). Perhaps we should pass those options just to the initial dqlite_open call and then save them in the dqlite object? There could be a special function to change them after the dqlite object has been created, if deemed useful. So perhaps dqlite_opts and dqlite_connect_opts could be merged and only passed to dqlite_open.

freeekanayaka avatar Oct 25 '21 15:10 freeekanayaka

Hm I guess the reason why dqlite_step doesn't take dqlite_connect_opts is that it's passed to dqlite_prepare and perhaps saved into the dqlite_stmt object. Still, I'd consider passing those knobs only to the initial dqlite_open.

freeekanayaka avatar Oct 25 '21 16:10 freeekanayaka

I agree it's verbose, and that those 2 structs don't feel simple & clean, I didn't merge dqlite_opts and dqlite_connect_opts because some functions don't require an open database e.g. dqlite_node_describe. In the end, most of the regular applications will always call dqlite_open creating the dqlite object that could save the connection options. I just wanted to keep some flexibility. Agree that if we keep 2 structs, for consistency's sake, dqlite_open and dqlite_step should accept them too.

MathieuBordere avatar Oct 25 '21 16:10 MathieuBordere

There's also this client_only flag in the dqlite_opts object whose semantics I haven't really defined yet. If set it could create a dqlite object whose creation doesn't block and can then be used in calls like dqlite_node_describe, in that case we can drop that dqlite_connect_opts struct I think.

MathieuBordere avatar Oct 25 '21 16:10 MathieuBordere

Update after comments - DRAFT - Work In Progress

Changes

  • removed dqlite_connect_opts

This document describes the C API of the dqlite client. The design goal of this API is to stay as close as possible to the design of the SQLite API, to provide a straight-forward and simple path to converting an existing application using SQLite to a, distributed, application using dqlite. The client library will by default run a dqlite server node that takes part in the dqlite cluster.

Remarks

As dqlite makes network calls, every request to the server has the potential to hang an indefinite amount of time. Propose to configure the dqlite database object with a default timeout for all networked calls, user configurable.

1 Objects

1.1 Database Connection Handle

SQLite

typedef struct sqlite3 sqlite3; Returned by the calls to sqlite3_openXXX.

dqlite

typedef struct dqlite dqlite Represents an open dqlite database object, returned by a call to dqlite_open.

1.2 Prepared Statement Object

SQLite

typedef struct sqlite3_stmt sqlite3_stmt; Created using sqlite3_prepare_v2(), used in sqlite3_bind_* to bind values to parameters and subsequently used by sqlite3_step, sqlite3_reset & sqlite3_finalize.

dqlite

typedef struct dqlite_stmt dqlite_stmt; Created using dqlite_prepare(), used in dqlite_bind_* to bind values to parameters and subsequently used by dqlite_step, dqlite_reset & dqlite_finalize.

2. Functions

2.1 Open

SQLite

int sqlite3_open(
  const char *filename,   /* Database filename (UTF-8) */
  sqlite3 **ppDb          /* OUT: SQLite db handle */
);

int sqlite3_open16(
  const void *filename,   /* Database filename (UTF-16) */
  sqlite3 **ppDb          /* OUT: SQLite db handle */
);

int sqlite3_open_v2(
  const char *filename,   /* Database filename (UTF-8) */
  sqlite3 **ppDb,         /* OUT: SQLite db handle */
  int flags,              /* Flags */
  const char *zVfs        /* Name of VFS module to use */
);

dqlite

/* Extensible struct containing various dqlite settings
 * see https://www.linuxplumbersconf.org/event/7/contributions/657/attachments/639/1159/extensible_syscalls.pdf
 * for explanations around extensible structs.
 * This opts object should configure all there is to configure for the dqlite client & server, before starting it.
 * Some of the possible struct members are mentioned below.
 */
struct dqlite_opts {
    size_t size;                            /* The size of this struct */
    const char* address;                    /* pointer to node address, used in `dqlite_node_create` */
    bool bootstrap;                         /* If this node will attempt to bootstrap a new cluster */
    struct dqlite_node_info_ext *cluster;   /* NULL or pointer to array of `dqlite_node_info_ext` structs, a bootstrapping node will use this as initial configuration */
    unsigned cluster_n;                     /* Number of entries in the cluster array */
    const char* bind_address;               /* Where the dqlite node will listen for incoming client requests */
    dqlite_node_connect_func connect_func;  /* NULL or pointer to custom function for communication between dqlite nodes, see `dqlite_node_set_connect_func` */
    unsigned connect_timeout_ms;            /* 0 (default used e.g. 5000) or value in ms.
    unsigned network_latency_ms;            /* 0 or estimated latency between nodes */
    bool client_only;                       /* Whether this node will replicate data or not */
    ...
}

/* Starts a dqlite server node and blocks until the database is ready to accept client request.
 * Returns DQLITE_OK on success.
 */
int dqlite_open(
        const char *path                   /* where dqlite will store it's data and db name to use */
	dqlite **ppDb,                     /* OUT: dqlite db handle */
	const struct dqlite_opts *opts,    /* dqlite options object containing settings, cluster info, ... */
);

2.2 Close

SQLite

int sqlite3_close(sqlite3*);
int sqlite3_close_v2(sqlite3*);

dqlite

/* Closes the database and destroys a dqlite object obtained through a call to dqlite_open.
 * Returns DQLITE_OK on success.
 */
int dqlite_close(dqlite*);

Rationale

2.3 Prepare

SQLite

6 constructors

int sqlite3_prepare(
  sqlite3 *db,            /* Database handle */
  const char *zSql,       /* SQL statement, UTF-8 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const char **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare_v2(
  sqlite3 *db,            /* Database handle */
  const char *zSql,       /* SQL statement, UTF-8 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const char **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare_v3(
  sqlite3 *db,            /* Database handle */
  const char *zSql,       /* SQL statement, UTF-8 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  unsigned int prepFlags, /* Zero or more SQLITE_PREPARE_ flags */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const char **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare16(
  sqlite3 *db,            /* Database handle */
  const void *zSql,       /* SQL statement, UTF-16 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const void **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare16_v2(
  sqlite3 *db,            /* Database handle */
  const void *zSql,       /* SQL statement, UTF-16 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const void **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare16_v3(
  sqlite3 *db,            /* Database handle */
  const void *zSql,       /* SQL statement, UTF-16 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  unsigned int prepFlags, /* Zero or more SQLITE_PREPARE_ flags */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const void **pzTail     /* OUT: Pointer to unused portion of zSql */
);

dqlite

int dqlite_prepare(
  dqlite *db,               /* Database handle */
  const char *zSql,         /* SQL statement, UTF-8 encoded */
  int nByte,                /* Maximum length of zSql in bytes. */
  dqlite_stmt **ppStmt,     /* OUT: Statement handle */
  const char **pzTail       /* OUT: Pointer to unused portion of zSql */
);

Rationale

The SQLite documentation states that dqlite_prepare is legacy, we won't attempt to duplicate it. The prepare16 calls could be added in the future, but will be skipped in the first iteration of dqlite C api. An equivalent of sqlite3_prepare_v3 can be added in the future if there is a user need.

Comment

implementation detail: The dqlite_prepare response will - internally - contain a flag if the query is readonly using sqlite3_stmt_readonly(sqlite3_stmt *pStmt). This allows the client implementation to use the appropriate client message 5 & 6 of the wire protocol.

2.4 Finalize

SQLite

int sqlite3_finalize(sqlite3_stmt *pStmt);

dqlite

int dqlite_finalize(dqlite_stmt *pStmt);

2.5 Exec

SQLite

int sqlite3_exec(
  sqlite3*,                                  /* An open database */
  const char *sql,                           /* SQL to be evaluated */
  int (*callback)(void*,int,char**,char**),  /* Callback function */
  void *,                                    /* 1st argument to callback */
  char **errmsg                              /* Error msg written here */
);

dqlite

int dqlite_exec(
  dqlite*,                                   /* An open database */
  const char *sql,                           /* SQL to be evaluated */
  int (*callback)(void*,int,char**,char**),  /* Callback function */
  void *,                                    /* 1st argument to callback */
  char **errmsg                              /* Error msg written here */
);

Comment

Convenience wrapper around dqlite_prepare, dqlite_step & dqlite_finalize.

2.6 Bind

SQLite

int sqlite3_bind_blob(sqlite3_stmt*, int, const void*, int n, void(*)(void*));
int sqlite3_bind_blob64(sqlite3_stmt*, int, const void*, sqlite3_uint64,
                        void(*)(void*));
int sqlite3_bind_double(sqlite3_stmt*, int, double);
int sqlite3_bind_int(sqlite3_stmt*, int, int);
int sqlite3_bind_int64(sqlite3_stmt*, int, sqlite3_int64);
int sqlite3_bind_null(sqlite3_stmt*, int);
int sqlite3_bind_text(sqlite3_stmt*,int,const char*,int,void(*)(void*));
int sqlite3_bind_text16(sqlite3_stmt*, int, const void*, int, void(*)(void*));
int sqlite3_bind_text64(sqlite3_stmt*, int, const char*, sqlite3_uint64,
                         void(*)(void*), unsigned char encoding);
int sqlite3_bind_value(sqlite3_stmt*, int, const sqlite3_value*);
int sqlite3_bind_pointer(sqlite3_stmt*, int, void*, const char*,void(*)(void*));
int sqlite3_bind_zeroblob(sqlite3_stmt*, int, int n);
int sqlite3_bind_zeroblob64(sqlite3_stmt*, int, sqlite3_uint64);

dqlite

int dqlite_bind_blob(dqlite_stmt*, int, const void*, int n, void(*)(void*));
int dqlite_bind_double(dqlite_stmt*, int, double);
int dqlite_bind_int(dqlite_stmt*, int, int);
int dqlite_bind_int64(dqlite_stmt*, int, dqlite_int64);
int dqlite_bind_null(dqlite_stmt*, int);
int dqlite_bind_text(dqlite_stmt*,int,const char*,int,void(*)(void*));

Took the most obvious types here, can add the more exotic ones later on. These calls will construct a list of arguments to build a tuple from wire protocol to be evaluated in context of a prepared stmt.

2.7 Step

SQLite

int sqlite3_step(sqlite3_stmt*);

dqlite

int dqlite_step(dqlite_stmt*);

2.8 Column

SQLite

const void *sqlite3_column_blob(sqlite3_stmt*, int iCol);
double sqlite3_column_double(sqlite3_stmt*, int iCol);
int sqlite3_column_int(sqlite3_stmt*, int iCol);
sqlite3_int64 sqlite3_column_int64(sqlite3_stmt*, int iCol);
const unsigned char *sqlite3_column_text(sqlite3_stmt*, int iCol);
const void *sqlite3_column_text16(sqlite3_stmt*, int iCol);
sqlite3_value *sqlite3_column_value(sqlite3_stmt*, int iCol);
int sqlite3_column_bytes(sqlite3_stmt*, int iCol);
int sqlite3_column_bytes16(sqlite3_stmt*, int iCol);
int sqlite3_column_type(sqlite3_stmt*, int iCol);

dqlite

const void *dqlite_column_blob(dqlite_stmt*, int iCol);
double dqlite_column_double(dqlite_stmt*, int iCol);
int dqlite_column_int(dqlite_stmt*, int iCol);
dqlite_int64 dqlite_column_int64(dqlite_stmt*, int iCol);
const unsigned char *dqlite_column_text(dqlite_stmt*, int iCol);
int dqlite_column_bytes(dqlite_stmt*, int iCol);
int dqlite_column_type(dqlite_stmt*, int iCol);

2.9 Leader

SQLite

No equivalent

dqlite

/* It's possible that this node does not know who the leader is for
 * various reasons, othewise return id & address of the leader.
 * When `address` is NULL, the client will attempt to contact all cluster nodes
 * until it has identified the leader.
 */
int dqlite_leader(
    dqlite*,                 /* An open database */
    char* address,	     /* target node's address or NULL. */
    dqlite_node_info_ext*,   /* OUT: leader info written here */
)

2.10 Cluster

SQLite

No equivalent

dqlite

/* Information about the cluster as perceived by the target node.
 * When `address` is NULL, the client will attempt to contact all cluster nodes
 * until it has identified the leader and ask the leader about the cluster information.
 */
int dqlite_cluster(
    dqlite*,                  /* An open database */
    char* address,	      /* target node's address or NULL */
    dqlite_node_info_ext[],   /* OUT: cluster info written here */
    dqlite_node_info_n*,      /* OUT: size of dqlite_node_info_array */
)

2.11 Add Node

SQLite

No equivalent

dqlite

int dqlite_node_add(
    dqlite*,                  /* An open database */
    dqlite_node_info,         /* Information about the node to be added */
)

2.12 Remove Node

SQLite

No equivalent

dqlite

int dqlite_node_remove(
    dqlite*,                 /* An open database */
    dqlite_node_id,          /* id of the node to be removed */
)

2.13 Assign Role

SQLite

No equivalent

dqlite

int dqlite_node_assign_role(
    dqlite*,                 /* An open database */
    dqlite_node_id,          /* id of the node's role that needs to be changed */
    dqlite_role,             /* desired role of the node */
)

2.13 Describe

SQLite

No equivalent

dqlite

typedef struct dqlite_node_description dqlite_node_description;
/* Description currently contains address|weight|failuredomain*/
int dqlite_node_describe(
    dqlite*,                      /* An open database */
    char* address,		  /* target node's address */
    dqlite_node_description*,     /* OUT: description of the node */
)

2.14 Weight

SQLite

No equivalent

dqlite

int dqlite_node_weight(
    dqlite*,                      /* An open database */
    char* address,                /* target node's address */
    node_weight ,                 /* desired weight of the node */
)

2.15 Failure Domain

SQLite

No equivalent

dqlite

int dqlite_node_failure_domain(
    dqlite*,                      /* An open database */
    char* address,                /* target node's address */
    failure_domain,               /* desired failure domain of the node */
)

2.16 Dump

SQLite

No equivalent

dqlite

typedef struct dqlite_db_dump dqlite_db_dump;
/* Dumps the backing sqlite database in `path` or in the node directory
 * when `path` is NULL.
 */
int dqlite_dump(
    dqlite*,                  /* An open database */
    const char* address,      /* target node's address */
    const char* path,         /* NULL or Directory to write dump to */
    dqlite_db_dump*,          /* OUT: Contains info on the db dump */
)

2.16 Reconfigure

SQLite

No equivalent

dqlite

/* Reconfigures a cluster with the desired configuration */
int dqlite_reconfigure(
    path,                     /* A directory containing the dqlite db */
    dqlite_node_info_ext[]*,  /* Desired configuration of the cluster */
)

3 Example usage

3.1 exec

SQLite

#include <sqlite3.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    sqlite3 *db;
    char *err_msg = NULL;

    int rv = sqlite3_open("test.db", &db);
    if (rv != SQLITE_OK) {
        fprintf(stderr, "open database: %s\n", sqlite3_errmsg(db));
        sqlite3_close(db);
        return -1;
    }

    char *sql = "DROP TABLE IF EXISTS Ubuntu;"
                "CREATE TABLE Ubuntu(Id INT, Name TEXT);"
                "INSERT INTO Ubuntu VALUES(1, 'Warty Warthog');"
                "INSERT INTO Ubuntu VALUES(2, 'Hoary Hedgehog');"
                "INSERT INTO Ubuntu VALUES(3, 'Breezy Badger');"
                "INSERT INTO Ubuntu VALUES(4, 'Dapper Drake');";

    rv = sqlite3_exec(db, sql, 0, 0, &err_msg);
    if (rv != SQLITE_OK ) {
        fprintf(stderr, "SQLite exec failure: %s\n", err_msg);
        sqlite3_free(err_msg);
        sqlite3_close(db);
        return -1;
    }

    sqlite3_close(db);
    return 0;
}

dqlite

#include <dqlite.h>
#include <stdbool.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    int rv = -1;
    dqlite *db = NULL;
    const char *address = "127.0.0.1:9001"
    char *err_msg = NULL;

    struct dqlite_opts opts = {
	.size = sizeof(opts),
	.address = address,
	.bootstrap = true,
	.cluster = NULL,
	.bind_address = address,
	.connect_func = NULL,
	.connect_timeout_ms = 0,
	.network_latency_ms = 0,
	.client_only = false,
    }

    rv = dqlite_open("/tmp/test.db", &db, &opts);
    if (rv != DQLITE_OK) {
        fprintf(stderr, "open database: %s\n", dqlite_errmsg(db));
        dqlite_close(db);
        return -1;
    }

    char *sql = "DROP TABLE IF EXISTS Ubuntu;"
                "CREATE TABLE Ubuntu(Id INT, Name TEXT);"
                "INSERT INTO Ubuntu VALUES(1, 'Warty Warthog');"
                "INSERT INTO Ubuntu VALUES(2, 'Hoary Hedgehog');"
                "INSERT INTO Ubuntu VALUES(3, 'Breezy Badger');"
                "INSERT INTO Ubuntu VALUES(4, 'Dapper Drake');";

    rv = dqlite_exec(db, sql, 0, 0, &err_msg);
    if (rv != DQLITE_OK ) {
        fprintf(stderr, "dqlite exec failure: %s\n", err_msg);
        free(err_msg);
        dqlite_close(db);
        return -1;
    }

    dqlite_close(db);
    return 0;
}

MathieuBordere avatar Oct 25 '21 18:10 MathieuBordere

Ah ok, I didn't realize that some of the APIs don't actually need an open database and a dqlite object, good point. So yeah, the situation is very similar to the one in the Go client, where the non-database stuff lives under /client and the db stuff under /driver and code is shared mainly via the /internal/protocol package. Perhaps we could do the same here?

Below you find a possible design. Note that I've also dropped the "automatic node creation" functionality from dqlite_open, as it feels a separate concern. We could have higher-level helper functions for that (a bit like the /app package in the Go client), or even a full blown dqlite_app object with its own options, state and automatic maintenance functionality.

I understand these changes make the whole thing a bit more verbose, but the benefit is separation of concerns and evolvability.

I didn't rework the example code to take into account this. I'd just like to see what you guys think. When higher-level helpers are designed and in place as well, things should look way less verbose.

Update after comments - DRAFT - Work In Progress

Changes

  • added a dqlite_transport object for network connect/read/write
  • added a dqlite_client object
  • separate concerns of dqlite_open and node creation

This document describes the C API of the dqlite client. The design goal of this API is to stay as close as possible to the design of the SQLite API, to provide a straight-forward and simple path to converting an existing application using SQLite to a, distributed, application using dqlite. The client library will by default run a dqlite server node that takes part in the dqlite cluster.

Remarks

As dqlite makes network calls, every request to the server has the potential to hang an indefinite amount of time. Propose to configure the dqlite database object with a default timeout for all networked calls, user configurable.

1 Objects

1.1 Database Connection Handle

SQLite

typedef struct sqlite3 sqlite3; Returned by the calls to sqlite3_openXXX.

dqlite

typedef struct dqlite dqlite Represents an open dqlite database object, returned by a call to dqlite_open. It knows about all cluster addresses and automatically finds/retries with a new leader in case the old leader gets deposed.

typedef struct dqlite_client dqlite_client Represents an open dqlite client connection, returned by a call to dqlite_client_open. This is used to perform cluster-level operations that don't require an open database. It connects directly to a single node. If the operation to be performed requires the connected node to be the leader, but the node is not the leader, the operation will return an error. If required, it's the application's responsibility to find the leader with dqlite_client_find_leader.

typedef struct dqlite_transport An abstraction whose implementation knows how to connect, read and write to specific network transports (TCP, TLS, etc), honoring user specified timeouts.

NOTE: internally the dqlite_transport struct should look something like:

struct dqlite_transport
{
   unsigned timeout;    /* Timeout for connect/read/write */
   void *data;               /* Implementation-specific details, e.g. for TLS */
   int (*connect)(struct *dqlite_transport, const char *address, int *fd),
   ssize_t (*read)(struct *dqlite_transport, int fd, void *buf, size_t count);
   ssize_t (*write)(struct *dqlite_transport, int fd, const void *buf, size_t count);
   int (*close)(struct *dqlite_transport, int fd);
}

1.2 Prepared Statement Object

SQLite

typedef struct sqlite3_stmt sqlite3_stmt; Created using sqlite3_prepare_v2(), used in sqlite3_bind_* to bind values to parameters and subsequently used by sqlite3_step, sqlite3_reset & sqlite3_finalize.

dqlite

typedef struct dqlite_stmt dqlite_stmt; Created using dqlite_prepare(), used in dqlite_bind_* to bind values to parameters and subsequently used by dqlite_step, dqlite_reset & dqlite_finalize.

2. Functions

2.1 Open

SQLite

int sqlite3_open(
  const char *filename,   /* Database filename (UTF-8) */
  sqlite3 **ppDb          /* OUT: SQLite db handle */
);

int sqlite3_open16(
  const void *filename,   /* Database filename (UTF-16) */
  sqlite3 **ppDb          /* OUT: SQLite db handle */
);

int sqlite3_open_v2(
  const char *filename,   /* Database filename (UTF-8) */
  sqlite3 **ppDb,         /* OUT: SQLite db handle */
  int flags,              /* Flags */
  const char *zVfs        /* Name of VFS module to use */
);

dqlite

/* Creates a new TCP transport, we could add a similar one for TLS. */
int dqlite_transport_create_tcp(
  unsigned network_timeout,    /* Timeout for network calls (connect/read/write), in ms. */
  dqlite_transport **transport /* OUT */
);

/* Extensible struct containing various dqlite settings
 * see https://www.linuxplumbersconf.org/event/7/contributions/657/attachments/639/1159/extensible_syscalls.pdf
 * for explanations around extensible structs.
 * This opts object should configure all there is to configure for the dqlite client & server, before starting it.
 * Some of the possible struct members are mentioned below.
 */
struct dqlite_opts {
    size_t size;                            /* The size of this struct */
    /* REMARK: this is the list of initial nodes to teach the dqlite object about.
     * the dqlite object itself should probably take care of refreshing this list periodically or perhaps
     * that could be done by the app-like higher-level helpers. */
    struct dqlite_node_info_ext *cluster;   /* NULL or pointer to array of `dqlite_node_info_ext` structs, a bootstrapping node will use this as initial configuration */
    unsigned cluster_n;                     /* Number of entries in the cluster array */
    dqlite_transport *transport; /* If NULL, a TCP transport with a default timeout will be created */
    ...
}

/* Connects to the leader and blocks until the database is ready to accept client request.
 * Returns DQLITE_OK on success.
 */
int dqlite_open(
        const char *path                   /* where dqlite will store it's data and db name to use */
	const struct dqlite_opts *opts,    /* dqlite options object containing settings, cluster info, ... */
	dqlite **ppDb,                     /* OUT: dqlite db handle */
);

/* Connects to the given node and creates a client object. */
int dqlite_client_create(
        const char *address,
        dqlite_transport *transport,
        dqlite_client **client
);

2.2 Close

SQLite

int sqlite3_close(sqlite3*);
int sqlite3_close_v2(sqlite3*);

dqlite

/* Closes the database and destroys a dqlite object obtained through a call to dqlite_open.
 * Returns DQLITE_OK on success.
 */
int dqlite_close(dqlite*);

/* Closes the client and destroys a dqlite_client object obtained through a call to dqlite_client_create.  */
void dqlite_client_destroy(dqlite_client*);

/* Destroys a TCP transport */
void dqlite_transport_destroy_tcp(dqlite_transport *transport);

Rationale

2.3 Prepare

SQLite

6 constructors

int sqlite3_prepare(
  sqlite3 *db,            /* Database handle */
  const char *zSql,       /* SQL statement, UTF-8 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const char **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare_v2(
  sqlite3 *db,            /* Database handle */
  const char *zSql,       /* SQL statement, UTF-8 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const char **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare_v3(
  sqlite3 *db,            /* Database handle */
  const char *zSql,       /* SQL statement, UTF-8 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  unsigned int prepFlags, /* Zero or more SQLITE_PREPARE_ flags */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const char **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare16(
  sqlite3 *db,            /* Database handle */
  const void *zSql,       /* SQL statement, UTF-16 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const void **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare16_v2(
  sqlite3 *db,            /* Database handle */
  const void *zSql,       /* SQL statement, UTF-16 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const void **pzTail     /* OUT: Pointer to unused portion of zSql */
);
int sqlite3_prepare16_v3(
  sqlite3 *db,            /* Database handle */
  const void *zSql,       /* SQL statement, UTF-16 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  unsigned int prepFlags, /* Zero or more SQLITE_PREPARE_ flags */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const void **pzTail     /* OUT: Pointer to unused portion of zSql */
);

dqlite

int dqlite_prepare(
  dqlite *db,               /* Database handle */
  const char *zSql,         /* SQL statement, UTF-8 encoded */
  int nByte,                /* Maximum length of zSql in bytes. */
  dqlite_stmt **ppStmt,     /* OUT: Statement handle */
  const char **pzTail       /* OUT: Pointer to unused portion of zSql */
);

Rationale

The SQLite documentation states that dqlite_prepare is legacy, we won't attempt to duplicate it. The prepare16 calls could be added in the future, but will be skipped in the first iteration of dqlite C api. An equivalent of sqlite3_prepare_v3 can be added in the future if there is a user need.

Comment

implementation detail: The dqlite_prepare response will - internally - contain a flag if the query is readonly using sqlite3_stmt_readonly(sqlite3_stmt *pStmt). This allows the client implementation to use the appropriate client message 5 & 6 of the wire protocol.

2.4 Finalize

SQLite

int sqlite3_finalize(sqlite3_stmt *pStmt);

dqlite

int dqlite_finalize(dqlite_stmt *pStmt);

2.5 Exec

SQLite

int sqlite3_exec(
  sqlite3*,                                  /* An open database */
  const char *sql,                           /* SQL to be evaluated */
  int (*callback)(void*,int,char**,char**),  /* Callback function */
  void *,                                    /* 1st argument to callback */
  char **errmsg                              /* Error msg written here */
);

dqlite

int dqlite_exec(
  dqlite*,                                   /* An open database */
  const char *sql,                           /* SQL to be evaluated */
  int (*callback)(void*,int,char**,char**),  /* Callback function */
  void *,                                    /* 1st argument to callback */
  char **errmsg                              /* Error msg written here */
);

Comment

Convenience wrapper around dqlite_prepare, dqlite_step & dqlite_finalize.

2.6 Bind

SQLite

int sqlite3_bind_blob(sqlite3_stmt*, int, const void*, int n, void(*)(void*));
int sqlite3_bind_blob64(sqlite3_stmt*, int, const void*, sqlite3_uint64,
                        void(*)(void*));
int sqlite3_bind_double(sqlite3_stmt*, int, double);
int sqlite3_bind_int(sqlite3_stmt*, int, int);
int sqlite3_bind_int64(sqlite3_stmt*, int, sqlite3_int64);
int sqlite3_bind_null(sqlite3_stmt*, int);
int sqlite3_bind_text(sqlite3_stmt*,int,const char*,int,void(*)(void*));
int sqlite3_bind_text16(sqlite3_stmt*, int, const void*, int, void(*)(void*));
int sqlite3_bind_text64(sqlite3_stmt*, int, const char*, sqlite3_uint64,
                         void(*)(void*), unsigned char encoding);
int sqlite3_bind_value(sqlite3_stmt*, int, const sqlite3_value*);
int sqlite3_bind_pointer(sqlite3_stmt*, int, void*, const char*,void(*)(void*));
int sqlite3_bind_zeroblob(sqlite3_stmt*, int, int n);
int sqlite3_bind_zeroblob64(sqlite3_stmt*, int, sqlite3_uint64);

dqlite

int dqlite_bind_blob(dqlite_stmt*, int, const void*, int n, void(*)(void*));
int dqlite_bind_double(dqlite_stmt*, int, double);
int dqlite_bind_int(dqlite_stmt*, int, int);
int dqlite_bind_int64(dqlite_stmt*, int, dqlite_int64);
int dqlite_bind_null(dqlite_stmt*, int);
int dqlite_bind_text(dqlite_stmt*,int,const char*,int,void(*)(void*));

Took the most obvious types here, can add the more exotic ones later on. These calls will construct a list of arguments to build a tuple from wire protocol to be evaluated in context of a prepared stmt.

2.7 Step

SQLite

int sqlite3_step(sqlite3_stmt*);

dqlite

int dqlite_step(dqlite_stmt*);

2.8 Column

SQLite

const void *sqlite3_column_blob(sqlite3_stmt*, int iCol);
double sqlite3_column_double(sqlite3_stmt*, int iCol);
int sqlite3_column_int(sqlite3_stmt*, int iCol);
sqlite3_int64 sqlite3_column_int64(sqlite3_stmt*, int iCol);
const unsigned char *sqlite3_column_text(sqlite3_stmt*, int iCol);
const void *sqlite3_column_text16(sqlite3_stmt*, int iCol);
sqlite3_value *sqlite3_column_value(sqlite3_stmt*, int iCol);
int sqlite3_column_bytes(sqlite3_stmt*, int iCol);
int sqlite3_column_bytes16(sqlite3_stmt*, int iCol);
int sqlite3_column_type(sqlite3_stmt*, int iCol);

dqlite

const void *dqlite_column_blob(dqlite_stmt*, int iCol);
double dqlite_column_double(dqlite_stmt*, int iCol);
int dqlite_column_int(dqlite_stmt*, int iCol);
dqlite_int64 dqlite_column_int64(dqlite_stmt*, int iCol);
const unsigned char *dqlite_column_text(dqlite_stmt*, int iCol);
int dqlite_column_bytes(dqlite_stmt*, int iCol);
int dqlite_column_type(dqlite_stmt*, int iCol);

2.9 Leader

SQLite

No equivalent

dqlite

/* It's possible that this node does not know who the leader is for
 * various reasons, othewise return id & address of the leader.
 * When `address` is NULL, the client will attempt to contact all cluster nodes
 * until it has identified the leader.
 */
int dqlite_leader(
    dqlite_client*,          /* An open client */
    dqlite_node_info_ext*,   /* OUT: leader info written here */
)

2.10 Cluster

SQLite

No equivalent

dqlite

/* Information about the cluster as perceived by the target node.
 * When `address` is NULL, the client will attempt to contact all cluster nodes
 * until it has identified the leader and ask the leader about the cluster information.
 */
int dqlite_cluster(
    dqlite_client*,           /* An open client */
    dqlite_node_info_ext[],   /* OUT: cluster info written here */
    dqlite_node_info_n*,      /* OUT: size of dqlite_node_info_array */
)

2.11 Add Node

SQLite

No equivalent

dqlite

int dqlite_node_add(
    dqlite_client*,           /* An open client */
    dqlite_node_info,         /* Information about the node to be added */
)

2.12 Remove Node

SQLite

No equivalent

dqlite

int dqlite_node_remove(
    dqlite_client*,          /* An open client */
    dqlite_node_id,          /* id of the node to be removed */
)

2.13 Assign Role

SQLite

No equivalent

dqlite

int dqlite_node_assign_role(
    dqlite_client*,          /* An open client */
    dqlite_node_id,          /* id of the node's role that needs to be changed */
    dqlite_role,             /* desired role of the node */
)

2.13 Describe

SQLite

No equivalent

dqlite

typedef struct dqlite_node_description dqlite_node_description;
/* Description currently contains address|weight|failuredomain*/
int dqlite_node_describe(
    dqlite_client*,               /* An open client */
    dqlite_node_description*,     /* OUT: description of the node */
)

2.14 Weight

SQLite

No equivalent

dqlite

int dqlite_node_weight(
    dqlite_client*,               /* An open client */
    node_weight ,                 /* desired weight of the node */
)

2.15 Failure Domain

SQLite

No equivalent

dqlite

int dqlite_node_failure_domain(
    dqlite_client*,               /* An open database */
    failure_domain,               /* desired failure domain of the node */
)

2.16 Dump

SQLite

No equivalent

dqlite

typedef struct dqlite_db_dump dqlite_db_dump;
/* Dumps the backing sqlite database in `path` or in the node directory
 * when `path` is NULL.
 */
int dqlite_dump(
    dqlite_client*,           /* An open client */
    const char* path,         /* NULL or Directory to write dump to */
    dqlite_db_dump*,          /* OUT: Contains info on the db dump */
)

2.16 Reconfigure

SQLite

No equivalent

dqlite

/* Reconfigures a cluster with the desired configuration */
int dqlite_reconfigure(
    path,                     /* A directory containing the dqlite db */
    dqlite_node_info_ext[]*,  /* Desired configuration of the cluster */
)

3 Example usage

3.1 exec

SQLite

#include <sqlite3.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    sqlite3 *db;
    char *err_msg = NULL;

    int rv = sqlite3_open("test.db", &db);
    if (rv != SQLITE_OK) {
        fprintf(stderr, "open database: %s\n", sqlite3_errmsg(db));
        sqlite3_close(db);
        return -1;
    }

    char *sql = "DROP TABLE IF EXISTS Ubuntu;"
                "CREATE TABLE Ubuntu(Id INT, Name TEXT);"
                "INSERT INTO Ubuntu VALUES(1, 'Warty Warthog');"
                "INSERT INTO Ubuntu VALUES(2, 'Hoary Hedgehog');"
                "INSERT INTO Ubuntu VALUES(3, 'Breezy Badger');"
                "INSERT INTO Ubuntu VALUES(4, 'Dapper Drake');";

    rv = sqlite3_exec(db, sql, 0, 0, &err_msg);
    if (rv != SQLITE_OK ) {
        fprintf(stderr, "SQLite exec failure: %s\n", err_msg);
        sqlite3_free(err_msg);
        sqlite3_close(db);
        return -1;
    }

    sqlite3_close(db);
    return 0;
}

dqlite

#include <dqlite.h>
#include <stdbool.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    int rv = -1;
    dqlite *db = NULL;
    const char *address = "127.0.0.1:9001";
    char *err_msg = NULL;

    struct dqlite_node_info_ext cluster = {
        .address = address
    };

    struct dqlite_opts opts = {
	.size = sizeof(opts),
        .cluster = &cluster,
        .n_cluster = 1,
	.transport = NULL,
    }

    rv = dqlite_open("/tmp/test.db", &db, &opts);
    if (rv != DQLITE_OK) {
        fprintf(stderr, "open database: %s\n", dqlite_errmsg(db));
        dqlite_close(db);
        return -1;
    }

    char *sql = "DROP TABLE IF EXISTS Ubuntu;"
                "CREATE TABLE Ubuntu(Id INT, Name TEXT);"
                "INSERT INTO Ubuntu VALUES(1, 'Warty Warthog');"
                "INSERT INTO Ubuntu VALUES(2, 'Hoary Hedgehog');"
                "INSERT INTO Ubuntu VALUES(3, 'Breezy Badger');"
                "INSERT INTO Ubuntu VALUES(4, 'Dapper Drake');";

    rv = dqlite_exec(db, sql, 0, 0, &err_msg);
    if (rv != DQLITE_OK ) {
        fprintf(stderr, "dqlite exec failure: %s\n", err_msg);
        free(err_msg);
        dqlite_close(db);
        return -1;
    }

    dqlite_close(db);
    return 0;
}

freeekanayaka avatar Oct 26 '21 12:10 freeekanayaka

I kind of like the simplicity of dqlite_open starting a node and allowing communication with the cluster. I do think a separate client that doesn't start a node is needed, but I maybe would make it an option of dqlite_open e.g. providing a NULL path parameter or setting some client_only flag in the options, this would then return a client-only dqlite object (like my previous proposal), this would more or less provide the same functionality as a simple client, but drops the extra struct. I think both approaches have their pros & cons though.

I like the transport abstraction idea, the current connect_func is a bit basic. I think I would probably start with the connect_func approach and later on add the dqlite_transport adding it to the opts.

MathieuBordere avatar Oct 26 '21 15:10 MathieuBordere

The problem I see with the simplicity of the "rich" version of dqlite_open is that eventually you'll likely want to grow that to match the functionality of the app package in the Go client (e.g. automatic cluster management, or things like app.Handover()) and mixing the two things (actually three if we consider also the client-only version) could make the option set and interface a bit confusing. Additionally, it might become a bit weird if you want for example to have more than one database connection in your application (or use multiple databases with different names but hosted by the same dqlite cluster): you'd have to create the first one passing node-related parameters to dqlite_open and the other ones without those parameters.

There's also a bit of an inherent mismatch between the db object and the client-only version, in that with the db object you typically want to hold an active connection to the leader (finding a new leader when the previous one gets deposed), while not so much with the client-only version which is a bit more node-specific and where you'd naturally want to hold a connection to a specific node, and not being explicit about it requires adding the extra address parameter to functions like dqlite_node_weight or dqlite_node_describe.

So the rich version of dqlite_open would have many roles: opening the first db and contextually starting the node, opening further db connections without starting the node, opening client-only objects. That means a more complex interface.

The way I view dqlite_open and possibly dqlite_client_create would be like the "plumbing" part of git, with each abstraction doing a specific thing with a straightforward interface (details like "you get a client-only object if path is NULL" tend to be a bit magical and less intuitive). The simplicity of the "rich" dqlite_open could be achieved by a "porcelain" layer (keeping the git analogy), e.g.:


typedef struct dqlite_app dqlite_app;
/* High-level helper for initializing a typical dqlite-based C application.
 * It takes care of starting a dqlite node, maintaining the cluster and providing client and
 * database connections. */

int dqlite_app_create(struct dqlite_app_opts *opts, dqlite_app **app); /* Creation */

int dqlite_app_open(dqlite_app *app, dqlite **db); /* New db connections, parameters not needed since details were given via struct dqlite_app_opts) */

int dqlite_app_client(dqlite_app *app, dqlite_client **client); /* Return a client connected to the local node */ 

int dqlite_app_leader(dqlite_app *app, dqlite_client **leader); /* Return a client connected to the current leader */ 

YMMV.

freeekanayaka avatar Oct 26 '21 20:10 freeekanayaka

Another slightly simpler variation of the the above interface would be to actually augment the current dqlite_node object with the automatic cluster-management functionality provided by the app package of the Go client, essentially making it equivalent to the dqlite_app object described above. That would also have the advantage that much of the logic in the app Go package could be dropped and future clients written for other languages would not need to re-invent that part (I believe this was my long term idea, but writing the automatic cluster-management functionality in Go instead of C was quicker).

For example:

int dqlite_node_db(dqlite_node *node, dqlite **db); /* New db connections, parameters not needed since details were given to dqlite_node */

int dqlite_node_client(dqlite_node *node, dqlite_client **client); /* Return a client connected to the local node */ 

int dqlite_node_leader(dqlite_node *node, dqlite_client **leader); /* Return a client connected to the current leader */ 

or something like that.

Note that the automatic cluster management functionality wouldn't be needed during the first pass of the C client implementation, Ibut believe it's important to plan for it in the longer term, and already know how it will fit in the design.

Again, just throwing ideas YMMV :)

freeekanayaka avatar Oct 26 '21 20:10 freeekanayaka

Thanks for the good input / ideas, working on a new proposition :-)

MathieuBordere avatar Oct 27 '21 13:10 MathieuBordere

Thanks!

One more observation: having dqlite_open also start the node, basically means that dqlite_opts will have to "duplicate" all the knobs that can be set on a node (network latency, bind address, failure domain, snapshot parameters, etc.) and every time some configuration is added to a node it must be added to dqlite_open as well.

FWIW the current design of the various dqlite_node_set_xxx functions was basically a way to have an extensible set of options (like the size parameter of dqlite_opts that allows the options set to grow in the future). The idea was that you'd 1) create a node object with dqlite_node_create, which gives you something with sane defaults 2) run whatever dqlite_node_set_xxx function you need (a bit like setting fields in dqlite_opts) 3) start the node. Using the functions has the slight advantage that if there are some options that need to be set together (like snapshot parameters), then that's more explicit. Note also that SQLite uses the same pattern: the sqlite3_open functions have very few parameters, and one can configure the sqlite object after it has been created.

My feeling is that creating the node first via dqlite_node_create() and then getting a db or client connection via dqlite_node_db() / dqlite_node_client() (or whatever function names we want to use) would be the simplest approach without renouncing to flexibility and future extensibility.

freeekanayaka avatar Oct 27 '21 14:10 freeekanayaka

So something like this?

/* Creates a dqlite_node with dqlite_node_create that manages the cluster through the dqlite_node 
 * (so eventually dropping the support for that in go-dqlite app)
 * opts contains settings to setup & connect to the cluster
 * Calling `dqlite_open` on a dir that is already running a cluster will allow you to run 
 * multiple databases on the same cluster if the filename is different.
 *  */
int dqlite_open(path, opts, *dqlite /* OUT */);

/* Destructor */
int dqlite_close(dqlite);

/* Allows you to fetch the node and set options on it through dqlite_node_set_xxx */
int dqlite_get_node(dqlite, *dqlite_node /* OUT */);

/* Blocks until a leader is elected and the cluster can accept queries */
int dqlite_start(dqlite);

/* Returns a dqlite client that will always try to connect to the leader */
int dqlite_get_leader(dqlite, *dqlite_client /* OUT */); 

/* Returns a dqlite_client that will connect to node with address. */
int dqlite_client_open(const char *address, 
   dqlite_transport *transport or connect_func,
   *dqlite_client /* OUT */);

/* Destructor */
int dqlite_client_close(dqlite_client);

All the other functions (that took dqlite* or dqlite_client*) would then take dqlite_client* objects, with the difference that the dqlite_client obtained through manual creation with dqlite_client_create has a higher chance of returning an ERR_NO_LEADER than the client obtained through dqlite_get_leader, which is a special client that always finds the leader node before sending the request.

MathieuBordere avatar Oct 27 '21 15:10 MathieuBordere

Comments:

  1. If the filename is not different what happens? Ideally we should be able to open multiple connections to the database, even with the same filename. I think the issue here is mixing the responsibilities of "create a node" and "open a database", there's a mismatch that can't be easily resolved by the same API.
  2. Also related to 1., the fact that there's a need to introduce a dqlite_get_node() function in order to set options seems a bit of a smell. In my mind the dqlite object is a database handle, and you need a node before creating a database handle. Why not making that explicit, setting up the node first and then getting as many database handles as you want?
  3. Having a dqlite_client object behave significantly differently depending on how it was created seems odd and potentially confusing. If you see a dqlite_client object at some line of code in your application it might not be obvious what it does. I'd recommend to have a dqlite_client behave in a basic intuitive way, like it always connects to a specific node and handle requests/responses against it. More complex behavior can be built upon it, either by the application or by dqlite itself (e.g. cluster management).

Any particular reason you don't like the design proposed in my previous comment?

freeekanayaka avatar Oct 27 '21 16:10 freeekanayaka

Any particular reason you don't like the design proposed in my previous comment?

int dqlite_node_db(dqlite_node *node, dqlite **db); /* New db connections, parameters not needed since details were given to dqlite_node */ int dqlite_node_client(dqlite_node *node, dqlite_client **client); /* Return a client connected to the local node */ int dqlite_node_leader(dqlite_node *node, dqlite_client **leader); /* Return a client connected to the current leader */

What I don't really understand is how you would use the dqlite db connection, would you use it in calls like dqlite_exec & dqlite_prepare ? And then use a dqlite_client in calls like dqlite_node_describe? Because I don't really see a reason why a dqlite_client couldn't be used in dqlite_exec or dqlite_prepare, but maybe I'm missing the point a bit?

MathieuBordere avatar Oct 27 '21 17:10 MathieuBordere

Yes, exactly, the dqlite db connection would be used for all the database APIs (basically everything that has a SQLite equivalent, so exec, prepare, step etc). The dqlite_client would be used mainly for the cluster management stuff (describe, add, weight, leader, cluster, etc). It's pretty much similar to the separation of the /driver and /client modules in the go-dqlite code.

The reason why it's a bit awkward to have a single object for the db connection and for the client is that the client is really bound to a specific node since it can also manipulate the state of that specific node (e.g. describe and weight) , while the db connection does not deal with the state of a specific node but just with the raft-replicated database which is node-agnostic and it should automatically reconnect to a different leader node.

Hope that makes sense.

freeekanayaka avatar Oct 27 '21 21:10 freeekanayaka

Ok it makes sense, but then we also need (imo) equivalents of exec, prepare, step, etc. that take a client no? Otherwise it will be difficult to make applications like the current dqlite shell that connect to a node and can query it without taking part in the cluster.

MathieuBordere avatar Oct 28 '21 09:10 MathieuBordere