Nano-SQL Offline use with Sync to Server

Offline use with Sync to Server

Open nevf opened this issue 6 years ago • 71 comments

I just found Nano-SQL, congratulations on your work, it looks interesting. Oh and good Docs.

I am especially interested in a Database which works both in the Browser and on the Server with synchronization between them. The Browser app needs to be able to work offline and sync to the server when back online. ie. Eventual consistency.

PouchDB/CouchDB does this, however I don't want to use these.

This is for my Web Knowledge Base app Clibu which currently uses MongoDB on the server and Dexie.js, in a limited way in the Browser. It doesn't have the sync/offline capability I am referring to.

Clibu can also be used On Premise, which requires the user to install MongoDB. An embedded DB would be nice in this scenario.

Jan 23 '18 22:01 nevf

Thanks, glad you're getting some use out of it!

I think this is a perfect use case for the plugin system.

Reading some of the documentation on PouchDB it's clear to me that CouchDB/PouchDB were specifically designed to handle the problem of syncing between databases. If we chose a subset of those features it should be doable.

Mostly the conflict resolution stuff gets really complicated. Wed'd need to just go with last write wins for conflicts to keep thing simple.

I'm thinking the plugin config/setup might look something like this:

// client side setup
nSQL().use(new nanoSQLSyncPlugin({
    serverURL: "https://mywebsite.com",
    // which tables to sync
    tables: ["users", "posts"],
    // how often to poll the server to check for connection
    pollEvery: 5000,
    // get data needed to authenticate this user (optional)
    getAuthData: (onComplete) => {
        // whatever you give onComplete() will get passed as authData to server
        // called before every request to server
    },
    // connected to remote server
    onSyncConnected: () => {

    },
    // no longer connected to remote server
    onSyncDisconnected: () => {

    },
    // syncing with remote started
    onSyncStart: () => {

    },
    // remote and local are in sync
    onSyncEnd: () => {

    }
}));
// stop syncing/polling at any time
nSQL().extend("disconnectSync");
// start syncing/polling again
nSQL().extend("connectSync");

// server side setup
nSQL().use(new nanoSQLSyncPluginServer({
    expressApp: app,
     // which tables to sync
    tables: ["users", "posts"],
    authRequest: (updateOrDelete, table, primaryKey, rowData, authData, onComplete) => {
        // pass true to onComplete to allow the update/read
        // pass false to onComplete prevent the request.
    }
}));

Let me know what you think!

Jan 24 '18 03:01 only-cliches

@ClickSimply Scott, thanks for the prompt reply. Unfortunately implementing offline db use and eventual consistency is a fairly complex beast. Multiple users can have updated the server while you are offline, so the server needs to keep track of this. The clients need a way of finding out what has changed since they were last online and if there updates are newer, push them to the server. Deletes also need to be handled. Couch uses sequence numbers for part of this.

Some articles etc. which will help here are: http://docs.couchdb.org/en/2.1.1/replication/protocol.html https://github.com/couchbase/couchbase-lite-ios/wiki/Replication-Algorithm (maybe out of date) http://offlinefirst.org/sync/ https://www.npmjs.com/package/dexie-syncable (possibly what I'll end up using) https://github.com/share/sharedb https://github.com/paldepind/synceddb https://kinto.readthedocs.io/en/stable/ https://github.com/forbesmyester/SyncIt

Browser/Server communication should be abstracted so either http or websockets can be used. In Clibu I use Websockets.

FYI I am not actually using Nano-SQL, just an interested observer.

Jan 24 '18 21:01 nevf

Thanks for the info! This might be something I tackle in the future but there's a few other things I'd like to see in place before this.

Jan 25 '18 21:01 only-cliches

@ClickSimply No problem and understood. Keep up the good work.

PS. It would be nice to see a NoSQL query api like MongoDB vs. SQL style. ;-)

Jan 26 '18 21:01 nevf

There is a good CRDT imlemenation that allows 100% synchronisation. https://github.com/y-js/yjs Its used in production and very easy to use.

I really think you should look at this, because its a leap frog technology. The way CouchDB and others work is to use the Last Write Wins which does not guarantee that all changes on a type ( or datbase row as it were) that are from many offline users does resolve without anyones data being overwritten.

The interesting thing about y.js is that all the reconciliation happens clientside. This measn that server side you can either hold the "last know version of a type" or hold all versions known to be out there. You can do either depending on the use case.

Anyway CRDT is pretty hard stuff and y.js is quite an achievement.. Have a look and see what you think.

BTW Nano-SQL looks awesome. Thank you for putting this out in open source !

Mar 01 '18 11:03 ghost

There is also:

https://github.com/automerge/automerge
https://github.com/mafintosh/hypercore

Mar 15 '18 16:03 atifsyedali

@atifsyedali Yes automerge is very good too i agree. Its CRDT based. Very good people behind it. The hard part with CRDT is the garbage collection. Its deletes stay with the document for example.

Mar 15 '18 22:03 ghost

@gedw99 ys-js is interesting and works well, however I have several concerns. The changes stored in IndexedDb etc. appear to grow forever. I posted on Gitter back on Mar 13 and have not had a response. This level of support which just seems to be a single developer is another concern.

@atifsyedali Automerge is impressive but really only suitable for in memory objects. So for example if you want to merge database doc's for offline use it isn't suitable. Also I think it's memory use keeps growing with every change. ie. No garbage collection as @gedw99 mentioned.

hypercore which is part of DAT seems only suitable for synchronizing files, not JS Objects or Database documents.

Apr 01 '18 13:04 nevf

There's also an actor based model of sync and state. I can't really write much more about it right now (gotta run) but hopefully I'll remember to later.

Here's a project that implements this system http://ceptr.org/projects/holochain#local-source-chain

Apr 07 '18 18:04 plentylife

@nevf Regarding y.js. Its a shame you did not get a response. I used it and when i hit issues i emailed the developer directly and got feedback in a day or two normally.

Its only 2 devs, but its actually a whole team and its used for lots of 3D systems by the team at a big university in Germany - cant remember the one. Its been going for quite a long time. Its true thats its risky to use something that does not have a huge company behind it. But, also its been chugging along for many years and has so far had very consistent pace to it.

Also there are really hardly any CRDT offline / online systems out there. I guess its because the maths and concepts are hard.

The team behind autoMarge is the Cambridge team. The main guy is famous for his "Turning databases inside out" video. https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/ Martin Kleppman. He was the lead architect behind linked in and kind of pioneered Samza and also CQRS approaches.

Apr 08 '18 15:04 ghost

So far I like Automerge the best. It seems to be designed well to handle integrating into other projects and performs conflict resolution automatically in every state that it can. The well documented feature of recording conflicts is also really nice, it'll let us create a secondary table that holds the conflicts so they can be managed manually at a later date.

Apr 09 '18 18:04 only-cliches

@ClickSimply Sounds good, and i think they will support it ongoing too.

Makes sense to still choose it even though it has no Durable data storage. Easy to use your own.

Apr 09 '18 21:04 ghost

Remote sync is the missing piece of the puzzle.

I have been looking for and evaluating multiple offline-first database alternatives that support web browsers for a while and I just found out about about Nano-SQL a few hours ago. I went over the documentation and I like Nano-SQL a lot so far.

If remote sync is implemented I think it could make Nano-SQL an excellent choice for progressive web apps (PWAs).

These are some of the alternatives I have evaluated and in my opinion their pros and cons: Blockers have been bolded

Gun.js

Pros:

Very good performance
First class offline support, does a great job at syncing data with remote sources
Local and remote data can be encrypted

Cons:

Limited query engine
No support for relationships, join queries nor aggregate queries

Dexie.js

Pros:

Great performance
Flexible query engine
Can sync with remote sources although only a remote sync/persistency adapter is available and it uses NeDB for persistency which is not actively maintained

Cons:

Limited support for relationships and no support of join nor agregate queries

Lovefield

Pros:

Amazing performance
Excellent query engine, supports relationships, join and aggregate queries
Can be used with local sources (IndexedDB) or remote (Firebase) but no other remote adapters are available

Cons:

No remote sync, can use either a local or a remote adapter
Even though it's used at Gmail I was recently told by its main maintainer that the project is in maintenance mode and that it is very unlikely that Google will be adding new features (maybe so it doesn't compete with Firebase?)

I haven't tested Nano-SQL yet but so far based on what I read from the docs my impressions are:

Pros:

Can achieve good performance
Great query engine with support for relationships, join queries and offers ways to handle aggregate queries and functions
Allows listening to data changes events
Multiple adapters for browser and server-side persistence

Cons:

No remote sync but looks like there is interest and could be implemented

Apr 11 '18 06:04 sebastianmacias

automerge seems to be the one that was discussed previously..

https://github.com/automerge/automerge

On Wed, 11 Apr 2018 at 08:27 Sebastian Macias [email protected] wrote:

Remote sync is the missing piece of the puzzle.

I have been looking for and evaluating multiple offline-first database alternatives that support web browsers for a while and I just found out about about Nano-SQL a few hours ago. I went over the documentation and I like Nano-SQL a lot so far.

If remote sync is implemented I think it could make Nano-SQL an excellent choice for progressive web apps (PWAs).

These are some of the alternatives I have evaluated and in my opinion their pros and cons: Blockers have been bolded

Gun.js https://github.com/amark/gun

Pros:

Very good performance

First class offline support, does a great job at syncing data with remote sources

Local and remote data can be encrypted

Cons:

Limited query engine

No support for relationships, join queries nor aggregate queries

Dexie.js https://github.com/dfahlander/Dexie.js

Pros:

Great performance

Flexible query engine

Can sync with remote sources http://dexie.org/docs/Syncable/Dexie.Syncable.js although only a remote https://github.com/nponiros/sync_server sync/persistency adapter is available and it uses NeDB https://github.com/louischatriot/nedb for persistency which is not actively maintained

Cons:

Limited support for relationships https://github.com/ignasbernotas/dexie-relationships and no support of join nor agregate queries

Lovefield https://github.com/google/lovefield

Pros:

Amazing performance https://google.github.io/lovefield/dashboard/src/dashboard.html

Excellent query engine, supports relationships, join and aggregate queries

Can be used with local sources (IndexedDB) or remote (Firebase) but no other remote adapters are available

Cons:

No remote sync, can use either a local or a remote adapter

Even though it's used at Gmail I was recently told by its main maintainer that the project is in maintenance mode and that it is very unlikely that Google will be adding new features (maybe so it doesn't compete with Firebase?)

I haven't tested Nano-SQL yet but so far based on what I read from the docs my impressions are:

Pros:

Can achieve good performance

Great query engine with support for relationships, join queries and offers ways to handle aggregate queries and functions

Allows listening to data changes events

Multiple adapters for browser and server-side persistence

Cons:

No remote sync but looks like there is interest and could be implemented

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ClickSimply/Nano-SQL/issues/18#issuecomment-380341729, or mute the thread https://github.com/notifications/unsubscribe-auth/ATuCwp2H8tu6pdSG4DA8zE0b4ics1Ljyks5tnaJZgaJpZM4RqaN- .

Apr 11 '18 09:04 ghost

@sebastianmacias Thanks for the list and pros/cons. Re. Dexie.Syncable the Sync Server just uses Nedb as a quick way to get a sample running. You would replace that with MongoDB or whatever backend DB you wanted so you con isn't relevant here.

@ClickSimply Automerge is indeed impressive however last I looked (and asked) it only works with in-memory objects and isn't durable as @gedw99 mentioned.

Like y-js I'm also concerned about lack of garbage collection with automerge which could end up consuming a lot of memory if the objects are a reasonable size and change frequently. For example a 5KB Markdown document which was edited 20 times (in a day) would use 100KB + metadata.

Apr 11 '18 21:04 nevf

Logux is another library I've been evaluating and not had time to mention before (been travelling) but shows promise. What I like about Logux, is it is completely independent of the front/backend database. Which Dexie.Syncable is largely.

Unfortunately development is a bit slow as Andrey has other commitments. I have reported a bug to Andrey who has been very, very helpful but can't progress further until that is addressed. Also the docs are confusing right now.

Video talk Logux Core Server Client Google Logux for more info.

Apr 11 '18 21:04 nevf

@ClickSimply Automerge is indeed impressive however last I looked (and asked) it only works with in-memory objects and isn't durable as @gedw99 mentioned.

This is actually preferable to the other libraries for exactly this reason. This will make Automerge very easy to integrate into nanoSQL. Rows can be loaded into Automerge when they need to be worked on, then pulled out when they need to be saved. NanoSQL needs to have control over every row's complete lifecycle or things will get crazy real fast.

Like y-js I'm also concerned about lack of garbage collection with automerge which could end up consuming a lot of memory if the objects are a reasonable size and change frequently. For example a 5KB Markdown document which was edited 20 times (in a day) would use 100KB + metadata.

Automerge (from what the docs say) is designed specifically for merging different documents into one without destroying data, conflicts should come up infrequently and can be placed into a separate table so they can be taken care of manually. For your markdown example the 5KB document shouldn't need to store full copies for each edit, only if there is a conflict Automerge can't resolve itself.

Since the conflicts are stored in nanoSQL there will only be a memory cost if you're not using persistence into IndexedDB/WebSQL.

Again lack of garbage collection is preferable in this case if you ask me. Multiple documents are merged into single CRDTs when possible (making garbage collection moot) and the conflicts are sent to a separate callback which should be handled manually, you don't want data to be accidentally destroyed.

Just my $0.02.

Apr 11 '18 22:04 only-cliches

@ClickSimply I look forward to seeing your progress with automerge.

After playing around with automerge for a bit today and reading the docs etc. it does keep a complete history of all changes, forever. Read [this related issue] (https://github.com/automerge/automerge/issues/51) . You can create a new document which excludes the history however I can't see a way of removing a document from automerge. Also as per the issue I posted @pvh makes the point that the history should never be deleted.

So unless I am missing something (which is quite possible) memory usage will grow and grow as per my earlier comment.

Apr 14 '18 08:04 nevf

Hey gents, I'm getting close to implementing this as a core feature.

Looking for feedback before I put this in stone.

Here's what I'm thinking:

Implement a conflict resolution feature nearly identical to CouchDB that works on the client and the server. I think it would also be interesting to have a callback for conflict resolution that passes in the table, query, and conflicting rows. You can return the resolved row in the callback or false for it to fallback to versioning like CouchDB.
Use websockets with ajax polling fallback to allow syncing between client side databases and servers alike. This means you could not only use the offline with sync to server but you could setup multiple servers all syncing between themselves for redundancy. (I've been trying to setup a no-single-point-of-failure architecture for my projects and this is a good place to do it in my opinion)
Include a simple JSON Web Tokens feature in the client/server model with security baked in. I think a big thing many existing offline sync databases are missing is any kind of security/authentication. I've been using JSON web tokens in my projects with good success and it makes sense to drop them in here. Keep in mind there won't necessarily be a password/login feature here. It would just be easy for you to set properties on the JSON web tokens and check their state on the client and server to validate requests. But it would be SUPER easy to build an account system on top of this.
Make three way data binding super simple using the new observer feature, we could do something like this:

nSQL().observer(() => {
    return nSQL("table").query("select").emit();
})
.bindToServer() // three way data binding ftw!
.subscribe((rows) => {});

I don't think we actually include any conflict resolution code, I really like CouchDBs approach here where if a record is conflicting at all everything gets saved as revisions of that row and a winner is selected in a deterministic way. This lets the application developers handle conflict resolution in a way that suits their use case, prevents the build from bloating into hundreds of kilobytes and prevents loss of data.

May 26 '18 00:05 only-cliches

@ClickSimply do you have a rough timeline for this? The way you are describing it is just what I have been looking for. I can help with testing if needed.

Jun 19 '18 01:06 sebastianmacias

Hello Sebastian, work projects are dominating my time right now, I'll likely be able to have a beta in place around the beginning of next month.

I'll drop a comment on here when I've got something folks can start playing with. 😃

Jun 21 '18 22:06 only-cliches

Excellent news, I'm particularly excited about this topic. I noticed the "Comparison With Other Projects" you added to the README. I believe nanoSQL truly has a lot of potential. I'm actually about to start using it in a project I have been working on.

Thanks and congratulations on your work so far.

Jun 22 '18 04:06 sebastianmacias

@ClickSimply Skimming over this issue I couldn't help but think of a way that wouldn't include conflict resolution, but could potentially solve the goal of having data synced between a client and a server as well as some authorization/authentication.

Gun.js Storage Adapter. The idea is that we would be storing data in Gun.js, which would then trigger the normal sync updates that it performs. This should also cover the case where we are offline because Gun works while offline and then attempts to sync when back online. The implementation would be fairly close to how LevelDBs adapter is done. The biggest unknown for me right now is how Gun.js would affect observable queries. And deleting data would have a little quirk: the requirement of a deleted_at field for a Model. And then all the operations of the adapter would take note of that field when performing operations. I only see as this being the case because you can never really delete nodes. Only set them to NULL or some other value that represents the record being deleted.

This is not fully thought out. But it's something that I want to explore and maybe implement if I find it feasible. Wanted to to toss this your way to get your thoughts on it.

Jun 29 '18 21:06 509dave16

Hey Dave, really good suggestions. I think having a Gun.js adapter would be a good solution for many uses, we might add that as an adapter along with Amazon DynamoDB in the near future.

One of the "big deal" features for me that I haven't really seen in CouchDB, Gun and many others is a flexible security model, they seem to almost exclusively be all or nothing. Let me give you a very simple example: a system with blog posts like Wordpress.

Everyone should have read capability, but only specific users should have write capability and you may even want to get more specific than that (some users can only change specific column values). So with offline syncing, any public user can grab a copy of the recent blog posts; then if they come back 6 months later with conflicts in their copy the conflicts simply get overwritten with the newest posts. But if an administrator comes back 6 months later with different blog posts those conflicts now have to be merged with the rest of the system. Or let's say you wanted some rows accessible only to specific users based on a permission level stored in the database.

These are the kinds of problems I'd like to solve with nanoSQL's offline/syncing system, and honestly where I've seen many of the existing solutions fall short.

Jul 03 '18 11:07 only-cliches

@ClickSimply Regarding Conflicts This is definitely an issue that Gun does not resolve. CouchDB does this though. Using a revision history for a document. In projects I have worked on revisions are kept for historical purposes. And could be used for manually merging or choosing revisions. So in any storage engine that Nano-SQL uses a revisions table for all changes(or one for each table's changes) could be managed to provide a history. However conflict determination would need to take place so that we can let the User know if a conflict occurred and let them take action if they are allowed to.

Jul 03 '18 20:07 509dave16

This is another isomorphic library that deals data sync, conflict resolution and offline-first features to keep in mind orbitjs.com. I just found out about it.

Jul 04 '18 03:07 sebastianmacias

FYI Notes I've made re. Orbit.js I can't see any way to specify/use database indexes. I've written a Gitter post on this. 21 Feb 18

It looks like all data is kept in memory which can be backed up to various stores, such as IndexedDB. However I can't see that it is possible to lose the in-memory store/cache and just use IndexedDB?

I can't find any documentation on how synchronization works. Latest wins, CRDT ...? Docs say it can be used to sync editor content?

Jul 04 '18 04:07 nevf

@509dave16 I have real trouble thinking of GunDB as a "real" database. To me it is more of a distributed in-memory cache. It has no query language, no indexes and the entire "DB" is in memory. Further it is unreliable - see this long outstanding issue

Jul 04 '18 04:07 nevf

@ClickSimply re. your May 26 post. Am I not mistaken that you haven't included keeping a log of all db actions in your 4 points. A log is needed when you are offline and is used to synchronize the client and server when back online.

Jul 04 '18 04:07 nevf

Hi, @ClickSimply, just wondering if we'll get to see this in the near future. Currently going with rxdb/coach as I can't find a good solution for offline-first client-server sync, but would be really stoked to see it happen and give it a test drive as soon as it's available!

Jul 31 '18 10:07 rafamel

Nano-SQL Nano-SQL copied to clipboard

Offline use with Sync to Server

Nano-SQL
Nano-SQL copied to clipboard