
replication poc

Open ludydoo opened this issue 2 years ago • 6 comments

This PR is a proof of concept for the offline/reconciliation DB engine that would power core.

Features

  • Headless DB Engine (Implemented with SQLite)
  • Reconciler
  • Server
  • HTTP Client

The requirements for this engine are that it:

  • must work with clients that stay offline for very long periods
  • should work on either the client or the server (peer-to-peer topology)
  • should offer a simple reconciliation mechanism
  • should be robust and compatible with different runtimes and environments

Folders

  • api contains mostly public types and methods that are meant to be exposed.
  • client contains the HTTP client (for now); gRPC, WebSockets, etc. could be added later.
  • engine contains the internal logic for the DB engine.
  • handler contains the HTTP handlers.
  • test contains test utilities.
  • utils contains utility helpers, such as a UUID generator.

The interface is very simple and relatively self-explanatory

type ReadInterface interface {
	// GetRecord gets a single record from the database.
	GetRecord(ctx context.Context, request GetRecordRequest) (Record, error)
	// GetChanges gets a change stream for a table
	GetChanges(ctx context.Context, request GetChangesRequest) (Changes, error)
}

type WriteInterface interface {
	// PutRecord puts a single record inside the database.
	PutRecord(ctx context.Context, request PutRecordRequest) (Record, error)
	// CreateTable creates a new table in the database.
	CreateTable(ctx context.Context, table Table) (Table, error)
}

type Engine interface {
	ReadInterface
	WriteInterface
}
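
For illustration, here is a minimal sketch of how a caller might drive this interface. The request struct fields used below (Record, Table, RecordID, Name) are assumptions for the sake of the example, not taken from this PR.

// Hypothetical usage sketch built on the interfaces above;
// the request/struct field names are assumed.
func example(ctx context.Context, e Engine) error {
	// Create a table, then write a record and read it back.
	if _, err := e.CreateTable(ctx, Table{Name: "individuals"}); err != nil {
		return err
	}
	rec, err := e.PutRecord(ctx, PutRecordRequest{Record: Record{Table: "individuals"}})
	if err != nil {
		return err
	}
	_, err = e.GetRecord(ctx, GetRecordRequest{Table: "individuals", RecordID: rec.ID})
	return err
}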

Records

The records have this structure

// Record represents a record in a database
type Record struct {
	ID               string     `json:"id"`
	Table            string     `json:"table"`
	Revision         Revision   `json:"revision"`
	PreviousRevision Revision   `json:"-"`
	Attributes       Attributes `json:"attributes"`
}

Record Serialization

Records have a special serialization mechanism, inspired by DynamoDB. Every value is encoded as a string, which avoids floating-point approximation that would otherwise break the revision hashing.

{
  "id" : "my-record-id",
  "table": "my-table",
  "revision": "1-96fc52d8fbf5d2adc6d139cb5b2ea099",
  "attributes": {
    "my-field-1" : { "string": "my-string-value" },
    "my-field-2" : { "int": "1234" }
  } 
}
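
As a rough sketch of this DynamoDB-style typed encoding, the helper below shows how a value could be turned into a { "<type>": "<string>" } pair before hashing. The helper name, the float tag, and the use of strconv/fmt are assumptions for illustration, not the PR's actual code.

// Hypothetical sketch of the string-only attribute encoding (uses strconv and fmt).
func encodeAttribute(v interface{}) (map[string]string, error) {
	switch t := v.(type) {
	case string:
		return map[string]string{"string": t}, nil
	case int:
		return map[string]string{"int": strconv.Itoa(t)}, nil
	case float64:
		// Floats are rendered as exact, round-trippable decimal strings so that
		// hashing the serialized record stays deterministic across runtimes.
		return map[string]string{"float": strconv.FormatFloat(t, 'f', -1, 64)}, nil
	default:
		return nil, fmt.Errorf("unsupported attribute type %T", v)
	}
}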

Table structure

Each new table results in two underlying tables:

  • <table>
  • <table>_history

<table> holds the reconciled version of each record, while <table>_history stores its successive revisions.
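
To make that concrete, here is a hedged sketch of what the pair of SQLite tables could look like for a table named individuals. The column layout is an assumption for illustration; the PoC's actual schema may differ.

// Hypothetical DDL sketch for a user table and its history table.
const createIndividuals = `
CREATE TABLE IF NOT EXISTS individuals (
	id         TEXT PRIMARY KEY,
	revision   TEXT NOT NULL,
	attributes TEXT NOT NULL -- serialized attribute map
);`

const createIndividualsHistory = `
CREATE TABLE IF NOT EXISTS individuals_history (
	id                TEXT NOT NULL,
	revision          TEXT NOT NULL,
	previous_revision TEXT,
	attributes        TEXT NOT NULL,
	PRIMARY KEY (id, revision)
);`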

ludydoo avatar May 13 '22 09:05 ludydoo

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ludydoo

/test pull-core-backend-test

ludydoo avatar May 13 '22 10:05 ludydoo

Kudos, SonarCloud Quality Gate passed!

Bug: A (0 Bugs)
Vulnerability: A (0 Vulnerabilities)
Security Hotspot: A (0 Security Hotspots)
Code Smell: A (5 Code Smells)

No coverage information
0.0% duplication

sonarqubecloud[bot] avatar May 18 '22 09:05 sonarqubecloud[bot]

My understanding of the roadmap is that we had decided to do a centralised/online only version of core. The majority of this POC seems to be focussed on the concepts of revisions, p2p, and reconciliation.

Our goal at the moment is to design our data model and based on this POC I'm not sure how to progress with that goal.

I guess I'm not too sure of the aim of this PR. Are you suggesting we should be building something along these lines for the architecture refactor, or is this more to demonstrate your ideas for work further in the future?

I do appreciate the point of having a single entry point of an engine that can be put to any use case, but I think that it doesn't depend on the p2p/reconciliation. I also think the interface of the engine is a bit overloaded and could be rethought.

I'd also like to revisit the pros/cons of using a revision-based approach vs a CRDT system.

neb42 avatar May 25 '22 09:05 neb42

Something like this would separate the concerns of the engine a bit better

package engine

import (
	"context"

	"github.com/nrc-no/core/pkg/server/core-db/types"
)

type TableReader interface {
	Get(ctx context.Context, tableID string) (*types.Table, error)
	List(ctx context.Context) ([]types.Table, error)
}

type TableWriter interface {
	Upsert(ctx context.Context, table types.Table) (*types.Table, error)
	Delete(ctx context.Context, tableID string) error
}

type RecordReader interface {
	Get(ctx context.Context, tableId string, recordID string) (*types.Record, error)
	List(ctx context.Context) ([]types.Record, error)
}

type RecordWriter interface {
	Upsert(ctx context.Context, record types.Record) (*types.Record, error)
	Delete(ctx context.Context, tableID string, recordID string) error
}

type ReaderEngine struct {
	Table  TableReader
	Record RecordReader
}

type WriterEngine struct {
	Table  TableWriter
	Record RecordWriter
}

type Engine struct {
	Reader ReaderEngine
	Writer WriterEngine
}

func main() {
	tableReader := NewPostgresTableReader()
	tableWriter := NewPostgresTableWriter()
	recordReader := NewPostgresRecordReader()
	recordWriter := NewPostgresRecordWriter()

	engine := Engine{
		Reader: ReaderEngine{
			Table:  tableReader,
			Record: recordReader,
		},
		Writer: WriterEngine{
			Table:  tableWriter,
			Record: recordWriter,
		},
	}

	engine.Writer.Table.Upsert(...)
}

neb42 avatar May 25 '22 10:05 neb42

I was thinking about this a bit more and have realised that you're showing how we could do this without actually having a data model stored in the db.

The issues that come to mind with this approach:

  • Strong lock-in with SQL. Even across SQL flavours there can be feature disparity (SQLite doesn't have everything Postgres has).
  • Difficult to query table metadata. We would need to parse a DESCRIBE query or read SQL's own metadata and map it into our data structures (see the sketch after this list).
  • How would we do access control here? We could use schemas in Postgres and different database files in SQLite, but this limits us to a single, project-level grain of access.
  • We lose custom metadata about tables, like who created it.
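
To illustrate the metadata point, here is a hedged sketch of what reconstructing table metadata from Postgres's information_schema might look like if we had no stored data model of our own. The Column type and field names are assumptions for the example; note that custom metadata (such as who created the table) simply isn't available this way.

// Hypothetical sketch: rebuilding table metadata from information_schema
// (uses database/sql and context; Column is an assumed illustrative type).
type Column struct {
	Name string
	Type string
}

func describeTable(ctx context.Context, db *sql.DB, table string) ([]Column, error) {
	rows, err := db.QueryContext(ctx,
		`SELECT column_name, data_type
		   FROM information_schema.columns
		  WHERE table_name = $1`, table)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var cols []Column
	for rows.Next() {
		var c Column
		if err := rows.Scan(&c.Name, &c.Type); err != nil {
			return nil, err
		}
		cols = append(cols, c)
	}
	return cols, rows.Err()
}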

Seeing as we already need to define a data model for the API anyway, storing it isn't much extra work and gives us some benefits.

neb42 avatar Jun 13 '22 10:06 neb42