core-uganda-pilot
replication poc
This PR is a proof of concept for the offline/reconciliation DB engine that would power core.
Features
- Headless DB Engine (Implemented with SQLite)
- Reconciler
- Server
- HTTP Client
The requirements of this engine are that it:
- must work with clients that are offline for a very long time
- should work on the client or server (peer to peer topology)
- should offer a simple reconciliation mechanism
- should be very robust and compatible with different runtimes and environments
Folders
- `api`: contains mostly public types and public methods that are meant to be exposed.
- `client`: contains the HTTP client (for now). Could add gRPC, websockets, etc.
- `engine`: contains the internal logic for the DB engine.
- `handler`: contains the HTTP handlers.
- `test`: contains test utils.
- `utils`: contains utils, such as a UUID generator.
The interface is very simple and relatively self-explanatory:
```go
type ReadInterface interface {
	// GetRecord gets a single record from the database.
	GetRecord(ctx context.Context, request GetRecordRequest) (Record, error)
	// GetChanges gets a change stream for a table.
	GetChanges(ctx context.Context, request GetChangesRequest) (Changes, error)
}

type WriteInterface interface {
	// PutRecord puts a single record inside the database.
	PutRecord(ctx context.Context, request PutRecordRequest) (Record, error)
	// CreateTable creates a new table in the database.
	CreateTable(ctx context.Context, table Table) (Table, error)
}

type Engine interface {
	ReadInterface
	WriteInterface
}
```
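As a rough usage sketch, a client that has been offline could pull a change stream from a peer and feed it back through `PutRecord`. This assumes it sits in the same package as the interfaces above; the request/response field names (`Items`, `Record`) are assumptions for illustration, not the actual `api` types:

```go
// pull copies changes from a remote peer into the local engine.
// Changes.Items and PutRecordRequest.Record are assumed field names.
func pull(ctx context.Context, local Engine, remote Engine) error {
	// GetChangesRequest fields are not shown in this PR; a real call would
	// likely be scoped to a table and a checkpoint/revision.
	changes, err := remote.GetChanges(ctx, GetChangesRequest{})
	if err != nil {
		return err
	}
	for _, record := range changes.Items {
		if _, err := local.PutRecord(ctx, PutRecordRequest{Record: record}); err != nil {
			return err
		}
	}
	return nil
}
```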
Records
Records have the following structure:
```go
// Record represents a record in a database
type Record struct {
	ID               string     `json:"id"`
	Table            string     `json:"table"`
	Revision         Revision   `json:"revision"`
	PreviousRevision Revision   `json:"-"`
	Attributes       Attributes `json:"attributes"`
}
```
Record Serialization
Records have a special serialization mechanism, inspired by DynamoDB. Every attribute value is encoded as a typed string, which avoids floating-point approximation that would break the hashing.
```json
{
  "id": "my-record-id",
  "table": "my-table",
  "revision": "1-96fc52d8fbf5d2adc6d139cb5b2ea099",
  "attributes": {
    "my-field-1": { "string": "my-string-value" },
    "my-field-2": { "int": "1234" }
  }
}
```
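A minimal sketch of what that encoding could look like in Go; the type and helper names here are assumptions for illustration, not the actual `api` types:

```go
import "strconv"

// AttributeValue is a hypothetical value type mirroring the JSON above: every
// value is kept as a string under a type tag, so numbers never pass through float64.
type AttributeValue struct {
	String *string `json:"string,omitempty"`
	Int    *string `json:"int,omitempty"`
}

// IntValue is an assumed helper that encodes an integer as a typed string.
func IntValue(i int64) AttributeValue {
	s := strconv.FormatInt(i, 10)
	return AttributeValue{Int: &s}
}
```

Decoding would reverse the mapping (e.g. with `strconv.ParseInt`), so the same record serializes to the same bytes on every peer.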
Table structure
Each new table creates two underlying tables:

- `<table>`
- `<table>_history`

where `<table>` holds the reconciled version of each record and `<table>_history` stores the different versions of the record.
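A rough sketch of the DDL the SQLite engine might emit for that pair of tables; the exact column layout is an assumption, not taken from this PR:

```go
import "fmt"

// createTableStatements returns hypothetical DDL for a table and its history
// table. The history table is keyed by (id, revision) so every version is kept.
func createTableStatements(table string) []string {
	return []string{
		fmt.Sprintf(`CREATE TABLE IF NOT EXISTS %q (
			id         TEXT PRIMARY KEY,
			revision   TEXT NOT NULL,
			attributes TEXT NOT NULL
		)`, table),
		fmt.Sprintf(`CREATE TABLE IF NOT EXISTS %q (
			id         TEXT NOT NULL,
			revision   TEXT NOT NULL,
			attributes TEXT NOT NULL,
			PRIMARY KEY (id, revision)
		)`, table+"_history"),
	}
}
```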
My understanding of the roadmap is that we had decided to do a centralised/online-only version of core. The majority of this POC seems to be focussed on the concepts of revisions, p2p, and reconciliation.
Our goal at the moment is to design our data model and based on this POC I'm not sure how to progress with that goal.
I guess I'm not too sure of the aim of this PR. Are you suggesting we should be building something along these lines for the architecture refactor, or is this more to demonstrate your ideas for work further in the future?
I do appreciate the point of having a single entry point of an engine that can be put into any use case, but I think that it doesn't depend on the p2p/reconciliation. I also think the interface of the engine is a bit overloaded and could be rethought.
I'd also like to revisit the pros/cons of using a revision-based approach vs a CRDT system.
Something like this would separate the concerns of the engine a bit better:
```go
package engine

import (
	"context"

	"github.com/nrc-no/core/pkg/server/core-db/types"
)

type TableReader interface {
	Get(ctx context.Context, tableID string) (*types.Table, error)
	List(ctx context.Context) ([]types.Table, error)
}

type TableWriter interface {
	Upsert(ctx context.Context, table types.Table) (*types.Table, error)
	Delete(ctx context.Context, tableID string) error
}

type RecordReader interface {
	Get(ctx context.Context, tableID string, recordID string) (*types.Record, error)
	List(ctx context.Context) ([]types.Record, error)
}

type RecordWriter interface {
	Upsert(ctx context.Context, record types.Record) (*types.Record, error)
	Delete(ctx context.Context, tableID string, recordID string) error
}

type ReaderEngine struct {
	Table  TableReader
	Record RecordReader
}

type WriterEngine struct {
	Table  TableWriter
	Record RecordWriter
}

type Engine struct {
	Reader ReaderEngine
	Writer WriterEngine
}

func main() {
	// The Postgres-backed constructors are placeholders for concrete implementations.
	tableReader := NewPostgresTableReader()
	tableWriter := NewPostgresTableWriter()
	recordReader := NewPostgresRecordReader()
	recordWriter := NewPostgresRecordWriter()

	engine := Engine{
		Reader: ReaderEngine{
			Table:  tableReader,
			Record: recordReader,
		},
		Writer: WriterEngine{
			Table:  tableWriter,
			Record: recordWriter,
		},
	}

	engine.Writer.Table.Upsert(...)
}
```
I was thinking about this a bit more and have realised that you're showing how we could do this without actually having a data model stored in the db.
The issues that come to mind with this approach:
- Strong lock-in with SQL. Even when looking at SQL flavours there could be a feature disparity (SQLite doesn't have everything Postgres has).
- Difficult to query table metadata. We would need to parse a `DESCRIBE` query or use the metadata stored in SQL and parse it into our data structures (see the sketch after this list).
- How would we do access control here? We could use schemas in Postgres and different database files in SQLite, but this limits us to a single project level of access.
- We lose custom metadata about tables, like who created it.
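To make the second point concrete, here is a rough sketch (not from this PR; the query and struct are assumptions) of what reading table metadata back out of Postgres' `information_schema` would look like before we could even map it into our own data structures:

```go
import (
	"context"
	"database/sql"
)

// columnInfo is an assumed struct; real code would still have to translate
// this into our own table/field types.
type columnInfo struct {
	Name     string
	DataType string
	Nullable string
}

// describeTable reads column metadata for one table from information_schema.
func describeTable(ctx context.Context, db *sql.DB, table string) ([]columnInfo, error) {
	rows, err := db.QueryContext(ctx,
		`SELECT column_name, data_type, is_nullable
		   FROM information_schema.columns
		  WHERE table_name = $1`, table)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var cols []columnInfo
	for rows.Next() {
		var c columnInfo
		if err := rows.Scan(&c.Name, &c.DataType, &c.Nullable); err != nil {
			return nil, err
		}
		cols = append(cols, c)
	}
	return cols, rows.Err()
}
```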
Seeing as we already need to define a data model for the API, storing it isn't that much extra and gives us some benefits.