obi-sync-lib

Library Design

Open acheong08 opened this issue 1 year ago • 42 comments

I need some advice on how the websocket connection should work in JS and how data should be returned from the library. The current implementation is an experiment and should not be taken seriously.

The design in my head:

  • Connect(vaultID: string, keyHash: string, token: string) returns some sort of channel (or whatever the equivalent is in JavaScript). The plugin spawns a thread to listen for push operations from the server.
  • Pull(uid: string) sends a pull request through the websocket and returns the binary and metadata.
  • Push(...), etc.
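In JavaScript, the closest equivalent of a Go channel that a caller can range over is an async iterable consumed with for await. A minimal sketch, not tied to obi-sync-lib's actual API (AsyncChannel is a hypothetical name):

```typescript
// A minimal "channel": producers call push(), consumers use
// `for await (const item of channel)`. Hypothetical helper, not library API.
class AsyncChannel<T> implements AsyncIterable<T> {
  private buffer: T[] = [];
  private waiters: Array<(result: IteratorResult<T>) => void> = [];
  private closed = false;

  // Called from the WebSocket onmessage handler for every server push.
  push(item: T): void {
    if (this.closed) throw new Error("channel is closed");
    const waiter = this.waiters.shift();
    if (waiter) waiter({ value: item, done: false }); // hand straight to a waiting consumer
    else this.buffer.push(item); // nobody waiting: keep it in FIFO order
  }

  // Called from onclose: wakes any pending consumers with done: true.
  close(): void {
    this.closed = true;
    for (const waiter of this.waiters.splice(0)) {
      waiter({ value: undefined, done: true });
    }
  }

  // Number of buffered items not yet consumed.
  get size(): number {
    return this.buffer.length;
  }

  // Buffered items are still drained after close(); then done: true.
  async next(): Promise<IteratorResult<T>> {
    if (this.buffer.length > 0) {
      return { value: this.buffer.shift() as T, done: false };
    }
    if (this.closed) return { value: undefined, done: true };
    return new Promise<IteratorResult<T>>((resolve) => this.waiters.push(resolve));
  }

  [Symbol.asyncIterator](): AsyncIterator<T> {
    return { next: () => this.next() };
  }
}
```

In the plugin, ws.onmessage = (e) => pushes.push(e.data) and ws.onclose = () => pushes.close() would feed it, while a for await (const msg of pushes) loop plays the role of the goroutine reading from the channel.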

At least this is how I would do this in Golang. However, in JS, websockets are implemented with an onmessage callback. I do not know of a way to receive objects in order and know what type of message is being received.

CC @oskardotglobal

  • https://github.com/acheong08/obi-sync/issues/29

acheong08, Sep 12 '23 13:09

There is, of course, the option to rewrite the server's API, which is trivial because the current implementation is very modular. We don't even have to break compatibility with the official API; we could just extend it to make our own use easier while still exposing the old endpoints.

(I would prefer to use the websocket only for pushes from the server, i.e. one-way communication, and handle everything else through a REST API. I understand there will be some overhead with HTTP/HTTPS, but I don't think that matters too much.)
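With that split, a pull becomes an ordinary HTTP request, and the question of which response belongs to which call disappears. A minimal sketch, assuming a hypothetical GET endpoint that returns the file body with its hash in a response header (neither the URL path, the X-File-Hash header, nor the Bearer scheme exists on the server today):

```typescript
// Sketch of the REST-plus-push split. The URL path, the "X-File-Hash"
// header and the Bearer token scheme are HYPOTHETICAL placeholders.

interface PulledFile {
  hash: string;
  size: number;
  data: Uint8Array;
}

// Injectable so the function can be exercised without a real server.
type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

async function pullFile(
  baseUrl: string,
  token: string,
  uid: number,
  fetchImpl: FetchLike = fetch,
): Promise<PulledFile> {
  const res = await fetchImpl(`${baseUrl}/vault/file/${uid}`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`pull ${uid} failed: HTTP ${res.status}`);
  // One request, one response: no need to work out which call a reply belongs to.
  const data = new Uint8Array(await res.arrayBuffer());
  return { hash: res.headers.get("X-File-Hash") ?? "", size: data.byteLength, data };
}
```

The websocket's onmessage handler would then only ever receive push notifications, so it needs no state about what comes next.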

acheong08, Sep 12 '23 13:09

[!Note] I have deleted the contents of this repository because it wasn't well thought out. It will take a few weeks (at most a month) to get it functional, as I am busy relocating to a new country.

acheong08, Sep 12 '23 13:09

I'd say websockets are the best bet here.

However, in JavaScript each websocket message is either text or binary, and mixing the two is a bit of a pain. In this case I'd encode all binary data as Data URLs (not ideal, but it only becomes a problem once files are larger than ~256MB), so that the metadata and the file itself can be sent in one message. We could also add a type field that determines how the JSON is deserialized. I don't know how you would order the files, but I don't see why that's necessary either. If it turns out to be a problem, we could send a timestamp along with the metadata.
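A minimal sketch of the single-message idea, assuming every frame is JSON text with a type field and file contents are carried as a base64 Data URL; the type values and field names, including the ts timestamp for ordering, are hypothetical:

```typescript
// Every frame is JSON text; `type` tells the receiver how to read it.
// Field names and type values are HYPOTHETICAL, not the server's format.
type Envelope =
  | { type: "pull"; uid: number; hash: string; ts: number; data: string }
  | { type: "push"; uid: number; path: string; ts: number };

// Encode raw bytes as a base64 Data URL (fine for small files only).
function bytesToDataUrl(bytes: Uint8Array, mime = "application/octet-stream"): string {
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return `data:${mime};base64,${btoa(binary)}`;
}

// Decode a base64 Data URL back into bytes.
function dataUrlToBytes(url: string): Uint8Array {
  const comma = url.indexOf(",");
  if (!url.startsWith("data:") || comma < 0 || !url.slice(0, comma).endsWith(";base64")) {
    throw new Error("not a base64 data URL");
  }
  const binary = atob(url.slice(comma + 1));
  return Uint8Array.from(binary, (c) => c.charCodeAt(0));
}

// Parse a text frame and reject unknown message types early.
function parseEnvelope(text: string): Envelope {
  const msg = JSON.parse(text) as Envelope;
  if (msg.type !== "pull" && msg.type !== "push") {
    throw new Error(`unknown message type: ${(msg as { type?: unknown }).type}`);
  }
  return msg;
}
```

Base64 turns every 3 bytes into 4 characters, so files grow by about a third on the wire, and building one large string as bytesToDataUrl does is only reasonable for small files.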

oskardotglobal, Sep 12 '23 14:09

We also have to keep speed in mind, and I'm not sure how best to incorporate that. I'll be digging through the current sync server to get a better feel for the architecture, though.

oskardotglobal, Sep 12 '23 14:09

My confusion revolves around how to pull and push data. Here is the code for pulling a file:

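case "pull":
	// The client's pull message carries the file's uid. Note that
	// encoding/json ignores the `binding` tag, so "required" is not
	// enforced by json.Unmarshal.
	var pull struct {
		UID any `json:"uid" binding:"required"`
	}
	err = json.Unmarshal(msg, &pull)
	if err != nil {
		ws.WriteJSON(gin.H{"error": err.Error()})
		return
	}
	var uid int = utilities.ToInt(pull.UID)
	file, err := vaultfiles.GetFile(uid)
	if err != nil {
		ws.WriteJSON(gin.H{"error": err.Error()})
		return
	}
	// An empty file has no binary piece; otherwise the whole file is one piece.
	var pieces int8 = 0
	if file.Size != 0 {
		pieces = 1
	}
	// First message (text frame): metadata only.
	ws.WriteJSON(gin.H{
		"hash": file.Hash, "size": file.Size, "pieces": pieces,
	})
	// Second message (binary frame), sent only when pieces == 1.
	if file.Size != 0 {
		ws.WriteMessage(websocket.BinaryMessage, file.Data)
	}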

When you send a pull request, the server first sends the metadata ("hash", "size" and "pieces"). You must then check how many pieces there are to decide whether the next message is binary. I can't wrap my head around how to do that when all incoming messages go through the onmessage callback. The metadata and binary data carry no tag identifying them as such, so we have to somehow know what to do with incoming data without knowing what triggered it.

One solution is a persistent variable that tracks what should come next, but because everything is asynchronous, it would be problematic if the server sent a push notification while we were expecting a pulled binary.
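One way to tell the frames apart without adding a tag is the frame type itself: in the browser WebSocket API, event.data is a string for text frames and a Blob or ArrayBuffer for binary frames (an ArrayBuffer once ws.binaryType is set to "arraybuffer"). A minimal sketch of the persistent-variable approach built on that, assuming one pull is in flight at a time and that server pushes are JSON text carrying op: "push" (a guess; the push format is not shown here). MessageRouter is a hypothetical name:

```typescript
// Router for the metadata-then-binary pull sequence. ASSUMPTIONS (not
// confirmed by the server code): ws.binaryType = "arraybuffer"; server
// pushes are JSON text with op === "push"; one pull is in flight at a time.
interface PullMeta {
  hash: string;
  size: number;
  pieces: number;
}

type Routed =
  | { kind: "push"; msg: Record<string, unknown> }
  | { kind: "pull"; meta: PullMeta; data: Uint8Array | null }
  | { kind: "partial" }; // metadata seen, binary piece still expected

class MessageRouter {
  // The "what comes next" state: metadata still waiting for its binary piece.
  private awaitingBinary: PullMeta | null = null;

  // Pass event.data from ws.onmessage straight in.
  route(data: string | ArrayBuffer): Routed {
    if (typeof data !== "string") {
      // Only a pull body is ever sent as a binary frame.
      const meta = this.awaitingBinary;
      if (!meta) throw new Error("binary frame without pending pull metadata");
      this.awaitingBinary = null;
      return { kind: "pull", meta, data: new Uint8Array(data) };
    }
    const msg = JSON.parse(data) as Record<string, unknown>;
    // Pushes are text, so they can interleave with a pull safely.
    if (msg.op === "push") return { kind: "push", msg };
    const err = msg.error; // the server reports failures as {"error": ...}
    if (typeof err === "string") throw new Error(err);
    const meta = msg as unknown as PullMeta;
    if (meta.pieces === 0) return { kind: "pull", meta, data: null };
    this.awaitingBinary = meta;
    return { kind: "partial" };
  }
}
```

Because a push is never a binary frame, one that arrives between a pull's metadata and its binary piece is routed as a push and leaves the pending state untouched.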

Another issue is how to return data from functions when the onmessage callback can only be set once. For example, calling Pull(UID: string) sends a pull request to the server, but the response is delivered to whichever function was registered as onmessage, not to Pull itself.
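One common way to return data from Pull despite the single onmessage callback is to park each caller's promise resolver in a queue: Pull registers a waiter and sends the request, and the onmessage handler settles the oldest waiter once a complete response has been decoded. A minimal sketch, assuming the server answers pulls in the order they were sent (PendingQueue is a hypothetical name):

```typescript
// Turns "one onmessage callback" into awaitable calls. ASSUMES responses
// arrive in request order, so the oldest waiter owns each completed reply.
class PendingQueue<T> {
  private pending: Array<{ resolve: (v: T) => void; reject: (e: Error) => void }> = [];

  // Registers the waiter *before* sending, so a fast reply cannot be missed.
  enqueue(send: () => void): Promise<T> {
    const promise = new Promise<T>((resolve, reject) => {
      this.pending.push({ resolve, reject });
    });
    try {
      send();
    } catch (err) {
      // Drop the waiter just added and surface the send failure to the caller.
      this.pending.pop()?.reject(err instanceof Error ? err : new Error(String(err)));
    }
    return promise;
  }

  // Called from the single onmessage handler once a full response is decoded.
  resolveNext(value: T): void {
    const head = this.pending.shift();
    if (!head) throw new Error("response with no pending request");
    head.resolve(value);
  }

  // Called from onclose/onerror so callers don't wait forever.
  rejectAll(reason: Error): void {
    for (const p of this.pending.splice(0)) p.reject(reason);
  }

  get size(): number {
    return this.pending.length;
  }
}
```

A Pull built on it might look like pull = (uid) => pulls.enqueue(() => ws.send(JSON.stringify({ op: "pull", uid }))), where the { op, uid } message shape is an assumption about what the server's "pull" case expects.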

There is probably a smarter way to do this, but I can't think of it.

acheong08, Sep 13 '23 05:09

I feel like a more advanced websocket library such as https://feathersjs.com/ or https://socket.io/ would be useful here. Both allow emitting multiple named "events" over one socket, which can then be handled separately, although we would have to check how that works with gin and the ws library you use (I haven't tried Feathers, but I have tried Socket.IO and it's pretty good).

The problem with vanilla websockets is that it's unnecessarily hard to check whether the received data is binary or JSON (text), so they are of no use here. By the way, if we do use Socket.IO, we may be able to keep the socket variable outside the connection callback's scope, like this:

import { Server } from "socket.io";
const io = new Server(3000); // port chosen only for illustration
let socket; // we could also use a "maybe" type or similar for this
io.on("connection", (s) => socket = s);

oskardotglobal, Sep 13 '23 08:09

After trying out a few libraries, I still went with vanilla websockets.

My idea goes something like this:

  • Connect handles the initial connection.
  • onpush(callback: Function) registers a callback for when the server sends sync events.
  • pull(uid: int) sends the pull request and waits for an event to be emitted, which happens when either a pull response with 0 pieces or the binary data is received.

I'm still figuring out EventEmitter.

acheong08, Sep 14 '23 04:09

https://github.com/sindresorhus/p-event works well with emitters and allows me to simply wait, in other functions, for data emitted from the onmessage callback.
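Node's built-in events.once() covers the same ground as p-event for a single event: it returns a promise that resolves with the event's arguments and removes its listener afterwards. A minimal sketch of the EventEmitter approach, assuming a Node/Electron environment and a hypothetical "pull-complete" event emitted from the onmessage handler; the AbortSignal.timeout() signal needs a reasonably recent Node version:

```typescript
// EventEmitter approach with Node's built-in events.once(), which resolves
// with the event's arguments much like p-event does. ASSUMES a Node/Electron
// environment; the event name and the PullResult shape are HYPOTHETICAL.
import { EventEmitter, once } from "node:events";

interface PullResult {
  hash: string;
  data: Uint8Array | null;
}

const pullEvents = new EventEmitter();

// Start waiting *before* sending the pull request: an event emitted while
// nobody is listening is simply dropped.
async function waitForPull(timeoutMs: number): Promise<PullResult> {
  const [result] = await once(pullEvents, "pull-complete", {
    signal: AbortSignal.timeout(timeoutMs), // rejects instead of hanging forever
  });
  return result as PullResult;
}

// Called from the onmessage handler once the metadata (and the binary piece,
// if pieces > 0) for a pull has arrived.
function emitPullComplete(result: PullResult): void {
  pullEvents.emit("pull-complete", result);
}
```

The timeout matters because a lost or never-sent response would otherwise leave pull() waiting forever.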

acheong08, Sep 14 '23 04:09

@oskardotglobal TypeScript and Node are driving me insane.

Could you review my https://github.com/acheong08/obi-sync-lib/blob/main/tsconfig.json and https://github.com/acheong08/obi-sync-lib/blob/main/package.json?

Running tsc gives me broken imports in the emitted JS:

Error [ERR_UNSUPPORTED_DIR_IMPORT]: Directory import '/var/home/acheong/Projects/obi-sync-lib/lib/src' is not supported resolving ES modules imported from /var/home/acheong/Projects/obi-sync-lib/lib/tests/vault_test.js

I can fix it manually in the JavaScript but is there a way to make sure it imports the right files on build?

acheong08, Sep 15 '23 04:09

For example:

import { MakeKeyHash } from "./crypt";

is supposed to be

import { MakeKeyHash } from "./crypt.js";

acheong08, Sep 15 '23 04:09

I ended up using esbuild to bundle everything together. Not sure if that is optimal, but at least it works.

acheong08, Sep 15 '23 05:09

You should really try https://bun.sh. It's a runtime, package manager and bundler fully compatible with node. I'll review the whole code later today and might open a PR or two.

I don't know if the ESLint solution is ideal; I'll look into it.

oskardotglobal, Sep 15 '23 07:09

You should really try https://bun.sh/. It's a runtime, package manager and bundler fully compatible with node.

Just tried out Bun from their quickstart page. It worked on first try!

acheong08, Sep 15 '23 09:09

Looks like there are still a few incompatibilities: e.g. crypto.subtle is missing in the output of bun build, but it works when running the TypeScript directly.

acheong08, Sep 15 '23 09:09

  • https://github.com/oven-sh/bun/issues/5458

acheong08, Sep 15 '23 10:09

The crypto module isn't fully ported yet (see https://bun.sh/docs/runtime/nodejs-apis), which is expected since Bun hit 1.0.0 about a week ago. Regarding your issue, did you try importing from node:crypto instead of crypto?

oskardotglobal, Sep 15 '23 10:09

Related to your issue, did you try importing from node:crypto instead of crypto?

Yes, that didn't work.

🟡 Missing crypto.Certificate crypto.ECDH crypto.KeyObject crypto.X509Certificate crypto.checkPrime{Sync} crypto.createPrivateKey crypto.createPublicKey crypto.createSecretKey crypto.diffieHellman crypto.generateKey{Sync} crypto.generateKeyPair{Sync} crypto.generatePrime{Sync} crypto.getCipherInfo crypto.{get|set}Fips crypto.hkdf crypto.hkdfSync crypto.secureHeapUsed crypto.setEngine crypto.sign crypto.verify. Some methods are not optimized yet.

acheong08, Sep 15 '23 11:09

esbuild works for now. I won't be looking into Bun until it's stable.

I'll keep working on the websocket stuff over the next week.

acheong08, Sep 15 '23 11:09

@oskardotglobal Basic pull/push functionality is there. Mind taking a look at the code? Not sure if it's the right way of doing things.

To do:

  • End to end encryption
  • History
  • Delete
  • Restore

acheong08, Sep 23 '23 12:09

Aside from the nonexistent naming convention (which I can fix if you want me to), it looks great logic-wise. I have to dive deeper into the actual sync server, though; I still don't fully understand how it works, and it doesn't help that my Go is a bit rusty. Once I get the gist of it, I might be able to contribute some of those features.

oskardotglobal, Sep 23 '23 15:09

Aside from the nonexistent naming convention

haha I just realized that ~ I'll fix it

Edit: Actually, I have no clue what the naming conventions are for TypeScript. I think I unconsciously switched between Python and Go conventions depending on which language I had just been using before writing the code.

acheong08, Sep 23 '23 21:09

it looks great logic-wise

If so, I'll start working on the encryption and then a draft plugin. The other features can wait.

acheong08, Sep 23 '23 21:09

@acheong08 I was wondering if you have any plans for development.

baek-sang, Dec 10 '23 18:12

Lately I've been using Neovim + Syncthing for my notes, since I don't have the bandwidth to maintain anything right now (I'm in university).

acheong08, Dec 10 '23 19:12

One thing you can do is unpack the Obsidian app's asar file and remove the bits of code that prevent the current plugin from being used, and pack it back up. I can post instructions for that if anyone is interested.

acheong08, Dec 10 '23 19:12

I got kind of frustrated that Obsidian isn't open source, so I ended up switching to Trilium, an actual open-source note-taking server that tries to do things similarly to Obsidian.

The plugin scene is nowhere near as developed, but the sync story is simple at least, since you just self-host it on a server and you're done. The downside of this approach is no offline support (unless you run a copy locally and sync the two).

benkaiser, Dec 10 '23 20:12

I tried out Trilium and strongly dislike it: getting your data out is hard because it isn't actual Markdown, the formatting differs, and, as you mentioned, it requires an internet connection 100% of the time.

oskardotglobal, Dec 10 '23 22:12

One thing you can do is unpack the Obsidian app's asar file and remove the bits of code that prevent the current plugin from being used, and pack it back up. I can post instructions for that if anyone is interested.

Could you please post instructions about it? Thanks.

xcnick, Dec 14 '23 03:12

I would recommend looking at this if you have time. Logseq could be a better option.

https://github.com/bcspragu/logseq-sync

azoller1, Dec 23 '23 04:12

Well, this library is intended not to be bound to Obsidian but to be usable with any kind of client plugin, so in Logseq too.

oskardotglobal, Dec 23 '23 09:12