
The Road to 1.0

Open haadcode opened this issue 5 years ago • 47 comments

by @haadcode and @aphelionz

Big things are coming to OrbitDB. In this document, we describe our proposal for getting OrbitDB from alpha to 1.0.

As always, our roadmap combines the long-term vision for OrbitDB with features users have been asking for, issues the community wants to address, topics the core developers have discussed separately, and bugs encountered while using OrbitDB. Please note that inclusion in the roadmap is not a promise of delivery!

Things here are subject to alteration or deletion. As of this writing, these should be considered proposals and open conversations and anybody should feel welcome and able to provide feedback in the form of questions, comments, or suggestions.

As always, we welcome contributions from the community and would be happy to help to land any of the discussed features or fixes.

If you would like to financially support OrbitDB, we now have an OpenCollective that we request you contribute to. Anything helps and we are forever grateful for your support, monetary or otherwise.

In general, the features and improvements proposed fall into three categories: Performance and Resource Consumption, User Experience, and Encryption. Without further ado, let's look at the specific items.

Checklist

  • Non-breaking changes
    • [x] Replicator Refactoring
    • [ ] BTree Indexing for KVStore and DocStore
    • [ ] Snapshots
    • [ ] User Experience Improvements
    • [ ] Developer Experience: The Publish Dance
    • [ ] Community efforts
  • Breaking Changes
    • [ ] Oplog Watermarks
    • [ ] Database Encryption
    • [ ] Streaming / Async Iterators
    • [ ] Hot / Cold Data Separation
    • [ ] Misc. Cleanup
  • [ ] Potential Rust integration

Non-breaking changes

The changes in this section should be able to be implemented without breaking any backwards compatibility or public-facing APIs. Though certain application-level details might change and need to be addressed, by and large these changes should not require a new major version.

Replicator Refactoring

Use Case: I have a database that has been replicated locally. I want to get the current state of the db as fast as possible when opening the db (in order to return the first query as fast as possible).

As of right now, the store replicator uses the next field in a log entry to replicate, whereas it could use the new refs field, as loading now does.

This is by and large the most effective improvement we can make, and perhaps the most often requested and discussed in the community.

There are other possible ways to address initial query and loading performance that we may want to pursue.

Further discussion: https://github.com/orbitdb/SCPs/pull/3 Work is happening here: https://github.com/orbitdb/orbit-db-store/pull/100

BTree Indexing for KVStore and DocStore

As it stands, all keys from the database index are kept in memory. This works well for most cases, but becomes a limit once you get to the order of 1M keys or more. This can be minimized with the use of B-Trees.

@vasa-develop and @vaultec81 utilized this technique in AvionDB.

This is highly connected to "Hot/Cold Data Separation (in-memory vs. on-disk data)".

Snapshots

I have a database that has been replicated locally. I want to get the current state of the db as fast as possible when opening the db (in order to return the first query as fast as possible).

A snapshot is the current state of the database, ie. only the current data without the database oplog (history). The snapshot of the current state could potentially be a log db itself.

User Experience Improvements

A collection of "small" items that would improve UX for the OrbitDB user.

  • Add a "merge fields" option to DocStore.put to merge the fields of the current doc and updated doc
  • Remove the need for a database name and just use the CID as the address. Move everything else to the manifest (which already contains the name of the database).
  • Remove the need for separate load() (but keep it available) and provide a one-liner to start, eg. OrbitDB.open(<address>) performs the instantiation of the orbitdb object, opening of the database and what currently happens in load().

Developer Experience: The Publish Dance

One of the biggest hurdles to releases is a term the contributors call the "publish dance" which requires a coordinated effort of publishing around 20 different npm modules that together constitute an OrbitDB release. There's no need to enumerate them here but the process generally starts from ipfs-log and moves upward to the top-level orbit-db.

The community has discussed solving this at the tooling level, such as using Lerna for module management, but a better alternative would be to address it at the architecture / implementation level:

  • remove the inheritance of stores and inject the Store module to stores
  • remove ipfs-log dependency from Store and inject it from OrbitDB
  • generally switch all inheritance to dependency injection (eg. feedstore takes in as a parameter an eventstore instead of inheriting from it)

All these would make it possible to configure the dependencies on the main package level, in orbit-db, giving the users more flexibility in choosing which modules and versions they use.
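As a rough illustration of the last bullet, the sketch below shows a feed-style store receiving an event-style store as a constructor argument instead of extending it. The class names, method names, and in-memory log are hypothetical stand-ins for this example only; they are not the actual orbit-db-eventstore or orbit-db-feedstore code.

```javascript
// Hypothetical, simplified stores; not the real orbit-db modules.
class EventStore {
  constructor () {
    this.entries = []
  }

  add (value) {
    this.entries.push({ op: 'ADD', value })
    return this.entries.length - 1 // array position stands in for an entry hash
  }

  all () {
    return this.entries.slice()
  }
}

// FeedStore is composed with an event store rather than inheriting from it.
class FeedStore {
  constructor (eventStore) {
    this.events = eventStore
    this.removed = new Set()
  }

  add (value) {
    return this.events.add(value)
  }

  remove (id) {
    this.removed.add(id)
  }

  all () {
    return this.events.all()
      .filter((_, id) => !this.removed.has(id))
      .map((entry) => entry.value)
  }
}

// The top-level package decides which implementation (and version) to inject.
const feed = new FeedStore(new EventStore())
```

With this shape, swapping in a different event store only changes the line that constructs the feed, which is the flexibility the paragraph above is after.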

Community efforts

There are a number of community efforts that we'd like to focus on getting merged, for two reasons. First, we value our community's input and want to further streamline their contributions. Second, we want to make sure they are merged before the breaking changes in the next section.

See the GitHub project for more info.

Breaking changes

Ok, on to the main event.

Given the scale and impact of these changes, backwards compatibility may be abandoned and we would make a new major version to signal the breaking changes.

Oplog Watermarks

As of right now, the kvstore and docstore reduce the full log any time updateIndex is called, which is on every write to the oplog. This is slow, grows even slower over time, and is ultimately unnecessary.

A solution to this could be to add high/low watermark and only process "new oplog entries". This is highly applicable especially for KVStore and DocStore.
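One way to picture the "only process new oplog entries" idea is an index that remembers how many entries it has already reduced. This is a minimal sketch under stated assumptions: the oplog is a plain, append-only, ordered array, and the class name and entry shape (`payload.op`, `payload.key`, `payload.value`) are invented for the example.

```javascript
// Hypothetical sketch: the index tracks a "high watermark", the number of
// oplog entries it has already applied, and skips them on later updates.
class WatermarkedKeyValueIndex {
  constructor () {
    this.state = new Map()
    this.highWatermark = 0
  }

  // `entries` is assumed to be the full, ordered oplog as a plain array.
  // Returns how many entries were newly applied.
  updateIndex (entries) {
    for (let i = this.highWatermark; i < entries.length; i++) {
      const { op, key, value } = entries[i].payload
      if (op === 'PUT') this.state.set(key, value)
      if (op === 'DEL') this.state.delete(key)
    }
    const applied = Math.max(0, entries.length - this.highWatermark)
    this.highWatermark = Math.max(this.highWatermark, entries.length)
    return applied
  }

  get (key) {
    return this.state.get(key)
  }
}
```

The sketch ignores the harder case where replication merges concurrent entries that sort before the watermark; a real implementation would need to detect that and re-reduce from an earlier point.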

Database Encryption

It should be possible to encrypt the payload of an OrbitDB. We've been pushing this back in the past because maturity of the technologies used was not there yet and we wanted to give the user flexibility. Admittedly, we now want to take the onus of implication by "suggesting" a default encryption scheme. However, it's become increasingly apparent that something like this is necessary.

This is another change that touches and affects everything in the architecture as well as data formats, and this could be considered the beginning of the discussion.

  • How many keys are there, and what is each used to encrypt (oplog entries vs. payloads)?
  • Where are they stored?
  • How does this affect AccessControllers?
  • How does this tie into hot/cold data (see below)?

Many projects have rolled their own solutions, e.g. TallyLab using the nacl-js library, the proposal for dag-jose by @oed, and so on, so there are places to seek inspiration.

Async Iterators / Streaming

When applicable we should be using async iterators / generators (or streams) to process data and then "discard" it, allowing more real-time capabilities and the ability to return results as they become available, instead of waiting for the full log to be fetched or processed.
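A small sketch of what this could look like with async generators follows. The function names and the entry shape (`payload.value`) are assumptions made for illustration, and the array-backed source stands in for whatever would actually read entries from IPFS or local storage.

```javascript
// Hypothetical entry source: yields oplog entries one at a time. A real store
// would read from IPFS or local storage instead of an in-memory array.
async function * readEntries (entries) {
  for (const entry of entries) {
    yield entry
  }
}

// Yields matching values as soon as they are seen; no entry is retained
// by this function after it has been processed.
async function * query (entrySource, predicate) {
  for await (const entry of entrySource) {
    if (predicate(entry.payload.value)) {
      yield entry.payload.value
    }
  }
}

// Convenience for callers that want at most `limit` results.
async function take (asyncIterable, limit) {
  const results = []
  if (limit <= 0) return results
  for await (const value of asyncIterable) {
    results.push(value)
    if (results.length >= limit) break
  }
  return results
}
```

Because `take` breaks out of the loop once it has enough results, the rest of the log is never pulled from the source, which is the "return results as they become available" behavior described above.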

Hot/Cold Data Separation (in-memory vs. on-disk data)

Currently the entire database (log entries and the computed state) is loaded into memory before use. This takes time to load and uses more memory. Separating hot and cold data would address that, but it is another massive change that affects every other part of the system.

This will also positively affect perceived performance and user experience: entries would load almost instantly, and reasoning about the state of the db replication becomes easier.

The general idea here is to 1) read and compute database state on-demand (ie. upon query) 2) cache "warm" data (=data that is most likely to be used soon, or was recently used) in order to have a configurable in-memory cache 3) fallback to reading from disk when the cache doesn't have the data available.
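Steps 2 and 3 can be sketched as a small read-through cache with least-recently-used eviction. Everything here is an assumption for illustration: `WarmCache` is a made-up name, and `readFromDisk` is a caller-supplied async function rather than any OrbitDB API.

```javascript
// Hypothetical read-through cache. `readFromDisk` is an assumed async
// function (key) => value supplied by the caller.
class WarmCache {
  constructor (readFromDisk, maxEntries = 1000) {
    this.readFromDisk = readFromDisk
    this.maxEntries = maxEntries // the configurable in-memory budget
    this.cache = new Map() // Map preserves insertion order, oldest first
  }

  async get (key) {
    if (this.cache.has(key)) {
      // Refresh recency by re-inserting the key at the end.
      const value = this.cache.get(key)
      this.cache.delete(key)
      this.cache.set(key, value)
      return value
    }
    // Cache miss: fall back to reading from disk.
    const value = await this.readFromDisk(key)
    this.cache.set(key, value)
    if (this.cache.size > this.maxEntries) {
      // Evict the least recently used key.
      const oldest = this.cache.keys().next().value
      this.cache.delete(oldest)
    }
    return value
  }
}
```

LRU is only one possible definition of "warm"; a policy that also pre-loads data likely to be used soon would need more knowledge about access patterns than this sketch has.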

Misc. Cleanup

Some more items that are smaller in size / complexity.

  • clean up all events and their semantics (eg. only one "updated" event instead of "write" and "replicated"), perhaps remove some and only use callback (eg. "onLoadProgressCallback").
  • separate identities/keys from oplog entry. CID per identity/key. cuts N bytes from each pubsub message and bitswap/ipfs/ipld block transfer.
  • kvstore: keep only the keys in memory, make them point to the CID with the data and fetch data on query from cache/ipfs.

Potential Rust Integration

We also, during the course of this work, want to explore integrating Rust into the project, in two potential places:

  1. Specific places that can benefit from wasm performance, likely things like crypto verification and maybe CRDT calculation
  2. Implementing pieces of OrbitDB as separate Rust integrations, to be used with Rust projects like Rust IPFS

Conclusion

These are our plans for 2020 onward. With these features and changes implemented, we believe OrbitDB would be on par with our vision for it as well as the user needs, and would make an excellent version 1.0.

Let us know what you think, and again, if you find any of this valuable and want to help, the best way is via the OrbitDB Open Source Community or the OpenCollective.

haadcode avatar Jul 13 '20 09:07 haadcode

This is awesome! Thanks for writing this @haadcode :)

Glad you've had a look at dag-jose! An initial implementation can be found here: https://github.com/ceramicnetwork/js-dag-jose (only support for signatures right now, but encryption will come soon)

oed avatar Jul 14 '20 16:07 oed

I really appreciate what you have planned for OrbitDB.

I'm thinking about whether it fits a mobile application with the following requirements (some of them):

  • Application needs to work offline. No server at all. Just peer to peer via BLE or WiFi Direct (or similar).
  • App will provide a distributed database of objects with different properties. Text, Numbers, Photos.. blabla
  • System is masterless and allows new devices to join or leave at any time.
  • Data is end-to-end encrypted (and signed) and devices are authenticated (challenge response?). Without use of private keys as mobile apps are insecure places to hold them.
  • Stored data is encrypted and signed.
  • Orbit would need to work alongside something like react-native or vue-native on Android and maybe iOS.

Do you think OrbitDB could handle that from early 2021 on? I'm still evaluating and searching for options.

Thank you! Great work 👍

Really loved to play with the demos and browse through your code of the OrbitDB modules.

Pluto1010 avatar Aug 19 '20 15:08 Pluto1010

Afaik, and I could be wrong, there are no plans for challenge/response type authentication; it would be exceptionally hard to implement via ipfs. Any encryption of the data is left as an exercise for the app developer, as this database isn't designed to 'protect' data, it is designed to 'share' data. As for end to end encryption, ipfs does use encryption in its transport layer, until now SECIO and recently NOISE. However, that isn't any sort of guarantee of secrecy, because anyone who knows the database address is able to 'join' the swarm and fetch a copy of all the data for themselves.

OrbitDB has access controls on who is allowed to write, based on either the ipfs key, a cryptocurrency key, or any way you want to implement the access controller's canAppend function. The same is not true for read operations, which have no access control by design, making the database publicly readable by anyone.

I could be wrong and I could be unaware of plans to implement some of these features ; if so I'm sure someone will chime in with a correction :)

phillmac avatar Aug 19 '20 19:08 phillmac

Started hashing this out in project form here: https://github.com/orgs/orbitdb/projects/7

aphelionz avatar Aug 30 '20 21:08 aphelionz

Update on Developer Experience: The Publish Dance

I tackled a big part of this simply by moving orbit-db-store to a peerDependency in all of the orbit-db-****store repos. This makes it a lot easier to do fast updates and publishing by skipping the need to update those unless their code actually changes.

aphelionz avatar Sep 12 '20 02:09 aphelionz

Hello! Thank you for your work! Please add pagination or something like it. For now I've implemented it by creating a new instance of OrbitDB, which loads (with the 'load' method) a new piece of data, and then querying the newly loaded database with the 'iterator' method. It's not as cool as I wanted :) So I believe it would be very helpful if there were a native way to do such things.

pashoo2 avatar Oct 02 '20 16:10 pashoo2

Hi @pashoo2!

In terms of developer-facing APIs, I think this is a good idea. Something like db.query(queryFn () => {}, { /* pagination options */ }) makes sense to me. That can be accomplished rather easily with a pr to orbit-db-docstore, I would think :+1: To you and whoever else is reading: we accept PRs :)

Also, to note: in terms of performance and RAM usage, this is what the Async Iterators / Streaming item should help with. Essentially right now the entire oplog is loaded and kept in memory along with the database state. So our plan is to traverse the oplog and then just unload log entries from RAM once they're processed. Then it's just the database state.

In-memory pagination is easiest - keep the whole database state in RAM like it is now and just let the JS function mentioned above do the querying. From there, actual RAM paging is peak performance - but it's tricky because you'd have to traverse the whole oplog again since there's not a clean mapping of DB pages and oplog entries.
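For the in-memory variant, a minimal sketch might look like the following. The helper name and the `offset` / `limit` option names are purely illustrative; nothing here is an existing orbit-db-docstore API, and `docs` is assumed to be the already-computed database state as an array.

```javascript
// Hypothetical helper: filter the in-memory state with the caller's query
// function, then return one page of the matches plus the total match count.
function paginate (docs, queryFn, { offset = 0, limit = 10 } = {}) {
  const matches = docs.filter(queryFn)
  return {
    total: matches.length,
    items: matches.slice(offset, offset + limit)
  }
}
```

This still filters the whole state on every call, so it only improves the developer-facing API, not RAM usage.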

aphelionz avatar Oct 02 '20 17:10 aphelionz

Essentially right now the entire oplog is loaded and kept in memory along with the database state. So our plan is to traverse the oplog and then just unload log entries from RAM once they're processed.

It will be great. Process entries from persistent storage, if it's really necessary, and then just keep them in the IndexedDB database (it also supports ArrayBuffer, and this feature can be used, for example, as a performant way to encrypt/decrypt a whole entry). So if it's really necessary to access entries, why not query IndexedDB directly, or, if it's necessary for performance reasons, keep in memory just the meta information about entries instead of each entry's whole data? When replication with another peer is necessary, just read messages from IndexedDB, maybe from a WebWorker or within some queue created with the help of 'requestIdleCallback', like the React team uses in their Fiber algorithm.

pashoo2 avatar Oct 23 '20 18:10 pashoo2

Add a "merge fields" option to DocStore.put to merge the fields of the current doc and updated doc

There are two ways this could be done 1) reading current doc state and merge with updated doc to put as new doc state, which is what users manually do now 2) adding a new operation which merges at time of computing the index. Method 1 seems like the better fit especially with how the index is computed currently even if method 2 may come with some data preservation during conflicts. Somewhat related to this, creating a new store similar to Automerge could be beneficial. A store that represents a fully merge-able JSON object allows for a lot of flexibility on the application end.
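Method 1 could be sketched roughly as below. `putMerged` is a hypothetical helper, `store` is any object with async `get(id)` and `put(doc)`, and the `_id` field name is an assumption of the example. Since the helper should not depend on exactly what `get` returns, it accepts either a single document or an array of matches.

```javascript
// Sketch of "method 1": read the current doc, merge shallowly, put the result.
async function putMerged (store, update) {
  const found = await store.get(update._id)
  // Tolerate stores that return either a single document or an array.
  const current = Array.isArray(found)
    ? found.find((doc) => doc._id === update._id)
    : found
  const merged = { ...(current || {}), ...update } // fields in `update` win
  await store.put(merged)
  return merged
}
```

Note that the read and the write are two separate steps, so two peers merging concurrently can still overwrite each other's fields; that is the data-preservation gap method 2 would address.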

Remove the need for a database name and just use the CID as the address. Move everything else to the manifest (which already contains the name of the database).

This should simplify things and orbitdb internals dont need to worry about the name anymore (not sure if this is already the case) when referring to the store. I propose we make the manifest available from the store class at something like store.manifest. The address class should still be able to parse addresses with /paths, not sure if it should preserve it but why not? I think it would be good to have a .toJSON on the address class that returns the address as a string. I've brought this up before but its been a while.

Remove the need for separate load() (but keep it available) and provide a one-liner to start, eg. OrbitDB.open(address) performs the instantiation of the orbitdb object, opening of the database and what currently happens in load().

I think making load an option for OrbitDB.open with a default value of true would allow more flexibility to users, there may be things they want to set up on the store class synchronously. Keeping .load available on the store for when a store was opened with load: false and for whatever it does when called over again.
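A sketch of that shape, written as a standalone wrapper so it does not presume any change to OrbitDB itself: `openDatabase` is a hypothetical name, `orbitdb` is assumed to expose `open(address, options)` returning a store with `load()`, and the `load` option is the proposal rather than an existing parameter.

```javascript
// Sketch of a one-step open with `load` defaulting to true.
async function openDatabase (orbitdb, address, { load = true, ...options } = {}) {
  const db = await orbitdb.open(address, options)
  if (load) {
    await db.load()
  }
  return db
}
```

Callers who need to attach event handlers or do other setup before loading would pass `{ load: false }` and call `db.load()` themselves afterwards.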

  • remove the inheritance of stores and inject the Store module to stores
  • remove ipfs-log dependency from Store and inject it from OrbitDB
  • generally switch all inheritance to dependency injection (eg. feedstore takes in as a parameter an eventstore instead of inheriting from it)

This idea of dependency injection does majorly help with the 'publish dance' by flattening the package dependencies. I wonder to what extreme this could be taken to reduce bundled code sizes by using the same version of a package all the way down (although this may come with some compatibility issues it could be worth). Structuring this correctly in the source will make this really cool I think.

As of right now, the kvstore and docstore currently reduce the full log any time updateIndex is called, which is on every write to the oplog. This is slow, grows even slower over time and ultimately unnecessary. A solution to this could be to add high/low watermark and only process "new oplog entries". This is highly applicable especially for KVStore and DocStore.

As I understand them, high and low watermarks seem to come with some challenges and advantages. When used with CRDTs I think we should try to preserve the properties of CRDTs in most situations. For example low watermarks are purposed to throw away old portions of an infinitely growing log after reducing them into a state for all following log entries to build on. Throwing away log entries could lead to conflicting states if two different nodes apply differing low watermark states. The likelihood of this could be reduced by maintaining a quorum (which could be useful to users in some situations). If these watermarks/snapshots are added to the log for other peers to use, which I believe @aphelionz has mentioned, throwing away old entries may not be as big of an issue. Adding a snapshot reference by CID to an entry in the log. The low-watermark parameters could be optionally chosen by the user, things like max log length and minimum distance from the head before entering a watermark.

Another option for improving index calculation could have to do with changing how stores handle index updates currently. Right now updateIndex is what is used to update a store index which takes the entire log and sets the new state. If updateIndex is replaced with a method that takes the current state and an entry and returns a new state the root store class has much more flexibility for perf improvements (this may also be more friendly for iterators). We could handle cases where the height of a new entry is higher than the current index height and just apply the entry (eg on write). We could also do something where we have index snapshots saved to disk at a number of places back in the log (like local high watermarks/checkpoints) although if the index is large this is probably a bad idea.

Something else that could improve performance, at the cost of stable latency for reads, is making reads asynchronous. This would make it possible to only call updateIndex after a read method is called AND the index is not already up to date. Related to this would be making an option for stores that make it so updateIndex or its equivalent must be called manually which would benefit archival or nodes solely in the business of replication.
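To make the trade-off concrete, here is a toy store where writes only append and the index is brought up to date on the first read that needs it. The class and its fields are hypothetical and the oplog is a plain array; it is a sketch of the idea, not of how any existing store works.

```javascript
// Hypothetical store: writes only append; the index is updated lazily on read.
class LazyKeyValueStore {
  constructor () {
    this.oplog = []
    this.index = new Map()
    this.appliedUpTo = 0 // number of oplog entries reflected in the index
    this.indexUpdates = 0 // counts how often the index was actually touched
  }

  async put (key, value) {
    this.oplog.push({ op: 'PUT', key, value }) // no index work on write
  }

  async get (key) {
    // Only pay for indexing when a read arrives and the index is stale.
    if (this.appliedUpTo < this.oplog.length) {
      this.updateIndex()
    }
    return this.index.get(key)
  }

  updateIndex () {
    for (const { op, key, value } of this.oplog.slice(this.appliedUpTo)) {
      if (op === 'PUT') this.index.set(key, value)
    }
    this.appliedUpTo = this.oplog.length
    this.indexUpdates++
  }
}
```

A node that only replicates would never call `get`, so it would never build the index at all; the cost is that the first read after many writes is slower than the rest.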

Reducing the metadata size of entries may also be a good place to look at. For instance deduping identity data with ipfs by using a reference cid to the identity in the entry, removing the clock.id field, etc.

Database Encryption

Payload encryption seems much easier and comes with the benefit that nodes without the encryption key can still traverse the log and replicate entries, as opposed to encrypting both the entry and the payload. I guess optimally both are available, selectable by option, with use of a simple inject-able encryption module that handles the keys haha. Dag-jose looks cool for entry encryption.
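A payload-only scheme could be sketched like this with Node's built-in `crypto` module. AES-256-GCM is chosen here only because it ships with Node; it is not a scheme anyone in this thread has proposed, and the entry shape (`next`, `payload`) is simplified for the example. Key management is deliberately left out.

```javascript
const crypto = require('crypto')

// Encrypts only the payload. Fields such as `next` stay readable, so peers
// without the key can still traverse and replicate the log.
function encryptPayload (entry, key) {
  const iv = crypto.randomBytes(12) // 96-bit nonce, the usual size for GCM
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv)
  const plaintext = Buffer.from(JSON.stringify(entry.payload), 'utf8')
  const data = Buffer.concat([cipher.update(plaintext), cipher.final()])
  return {
    ...entry,
    payload: {
      iv: iv.toString('base64'),
      tag: cipher.getAuthTag().toString('base64'),
      data: data.toString('base64')
    }
  }
}

// Reverses encryptPayload; throws if the key is wrong or the data was altered.
function decryptPayload (entry, key) {
  const { iv, tag, data } = entry.payload
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, Buffer.from(iv, 'base64'))
  decipher.setAuthTag(Buffer.from(tag, 'base64'))
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(data, 'base64')),
    decipher.final()
  ])
  return { ...entry, payload: JSON.parse(plaintext.toString('utf8')) }
}
```

The `key` argument must be 32 bytes. An injectable module like the one suggested above would presumably wrap these two functions together with whatever key storage the application chooses.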

This is less to do with encryption and more to do with manifest privacy, but I'm curious about hiding the manifest address in entries and using a hashed version of the address as the pubsub channel for head adverts. Of course the manifest could also be encrypted with dag-jose for whatever reason if it made sense.

Async Iterators / Streaming

Hot/Cold Data Separation (in-memory vs. on-disk data)

excited for these

clean up all events and their semantics (eg. only one "updated" event instead of "write" and "replicated"), perhaps remove some and only use callback (eg. "onLoadProgressCallback").

Removing "write" and "replicated" in favor of "updated" seems reasonable, but I'm more in favor of just adding "updated" and making cuts in other areas.

tabcat avatar Nov 25 '20 12:11 tabcat

Good to see you, @tabcat !

aphelionz avatar Nov 25 '20 13:11 aphelionz

Database Encryption - yes, it will be great. I've implemented and integrated it with OrbitDB myself, but it would be cool if there were out-of-the-box support.

pashoo2 avatar Dec 18 '20 18:12 pashoo2

I think that allowing a custom CRDT implementation is a great idea, but it should be thought through carefully. The main idea of ipfs, ipns, and all the related infrastructure is that shared data should be readable by anyone, everywhere. That idea could be broken if users are able to use a custom CRDT.

pashoo2 avatar Dec 18 '20 18:12 pashoo2

Just noticed this thread but I've been working on a store implementation that builds and saves B-trees to search secondary indexes and stores the data in MFS. Primary index is read/write from MFS to utilize the underlying indexing and not have to load b-trees if you don't need to use them. My goal is to build a SQL-like engine to query directly in the browser and sync it with Orbit.

Trying some different things to help address the amount of time it takes to load. This store loads virtually instantly if it's up to date. Not sure how scalable this is but seems to work fine for datasets up to a few thousand records.

It's relatively slow to write because it has to build and maintain b-trees for each indexed field, but when working with data in the browser seems to have a better UX. The main reason being that load() is basically just skipped.

Using these b-trees: https://github.com/applitopia/immutable-sorted

https://gitlab.com/american-space-software/orbit-db-mfsstore

Edit: I'm actually going to put the underlying engine into a standalone library first and then attach the orbit stuff on top of it. https://gitlab.com/american-space-software/mfdb

ptoner avatar Jan 13 '21 00:01 ptoner

It would be nice to have correct typescript typings in v1 also.

justinmchase avatar Feb 08 '21 03:02 justinmchase

Also, it would be nice to see standards delivered for the file formats being persisted and some considerations for versioning. If the files are distributed and many people access the files, but the libraries aren't also shipped with the files then you'll have issues where different consumers may have different versions of libraries accessing the files and standardization and versioning will be very important.

I'd be interested in hearing thoughts on how to deal with those situations.

justinmchase avatar Feb 10 '21 14:02 justinmchase

So, which of the goals raised here are still being pursued?

CSDUMMI avatar May 18 '21 14:05 CSDUMMI

This checklist is still actively being worked on.

aphelionz avatar May 18 '21 14:05 aphelionz

How do you plan on doing this? https://github.com/orgs/orbitdb/projects/7#card-44572251

CSDUMMI avatar May 18 '21 14:05 CSDUMMI

In other words: How do you plan to replace inheritance with dependency injection? Could somebody perhaps do this for one of the stores as an example?

CSDUMMI avatar May 20 '21 08:05 CSDUMMI

On the watermarks: am I understanding correctly that it would store the state at some point in the oplog and make it possible to work forward from there?

I thought that it might be better to implement lazy oplog parsing instead of watermarks.

This means that instead of being parsed on every write, the oplog is parsed once somebody tries to read from the database.

This could be much faster overall, especially with large databases where it is unlikely that all the data in the store will be needed at any one time.

CSDUMMI avatar Jun 02 '21 05:06 CSDUMMI

That's more or less what it does now - the oplog is read on db.load for all store types, and only in certain store types does it also calculate upon write (kvstore and docstore IIRC). I think those particular calc-on-writes can be refactored away.

aphelionz avatar Jun 02 '21 09:06 aphelionz

What crypto library does OrbitDB use? Specifically for Public-Key Cryptography?

CSDUMMI avatar Jun 05 '21 08:06 CSDUMMI

i think it would be worth to encode data ourselves using one of the dag encoding libraries like @ipld/dag-cbor. having control of the storage format would mean using the ipfs block apis which i imagine would be less susceptible to change and give us more flexibility. also this allows us to directly compute the cid without waiting for the data to be written to disk, and while this is possible using the dag apis already im not sure if it would be the same since i dont think the encoded data is returned as well and you are still relying on higher level apis. i think it would also be good, while still only officially supporting ipfs as the data/communication layer, making it easier for users to swap in whatever they choose by making everything a bit more modular.

tabcat avatar Sep 02 '21 18:09 tabcat

What other data/communication layer do you have in mind? And what divisions do you think of as most advantageous for this purpose?

CSDUMMI avatar Sep 03 '21 15:09 CSDUMMI

@CSDUMMI I dont have any others in mind, but i know users do and it could be made easier.

tabcat avatar Sep 03 '21 17:09 tabcat

Could database encryption discussion be moved into a separate issue and maybe the arguments and implementation details be gathered there?

CSDUMMI avatar Sep 11 '21 11:09 CSDUMMI

I think separating the manifest into its own new class/data type would be good. Meaning it would be similar to the db address data type and would be used to open databases. With this change I like the idea of only 'opening' databases and not creating them (removing the .create method), instead first fetching or creating manifests to open databases.

example

// creating a new database
// (writeManifest and fetchManifest are proposed methods, not part of the current API)
const newManifest = await orbitdb.writeManifest(dbConfig)
const newDb = await orbitdb.open(newManifest)

// opening remote database
const remoteManifest = await orbitdb.fetchManifest(dbAddress)
const remoteDb = await orbitdb.open(remoteManifest)

tabcat avatar Sep 20 '21 21:09 tabcat

I'm in favor of consolidating the operations into one. Tossing in a bit of semantic feedback in here while the window of opportunity is open:

Is there another operation besides create and open that encapsulates both? The technical term is upsert but that doesn't exactly roll off the tongue? init is close but not it... any ideas from the crowd?

aphelionz avatar Sep 20 '21 22:09 aphelionz

open seems best imo. you are opening a local replica, which may or may not exist yet, for a manifest that was created or fetched. it seems like you might be suggesting keeping the creation of a manifest as part of the method but I think it should be separate so the only valid parameter is the manifest data type; even if its a bit more work for users. im more in favor of deprecating the .create method

tabcat avatar Sep 20 '21 22:09 tabcat

Yeah, it's fine. Was just taking the opportunity :grin:

aphelionz avatar Sep 20 '21 22:09 aphelionz

This change is really major. But it could really help people to understand some part of the internals of OrbitDB.

I have a question: What functions should be available on this OrbitDBManifest class? And what public attributes should there be?

CSDUMMI avatar Sep 21 '21 19:09 CSDUMMI

This change is really major. But it could really help people to understand some part of the internals of OrbitDB.

I have a question: What functions should be available on this OrbitDBManifest class? And what public attributes should there be?

.toJson() .orbitDBAddress .accessController

I suppose now would be a good time to specify how to handle metadata in the manifest, so .tags or .meta or something

phillmac avatar Sep 22 '21 01:09 phillmac

Does anyone know if there is a Rust client for OrbitDB?

contactrakeshjadhav avatar Sep 24 '21 03:09 contactrakeshjadhav

@contactrakeshjadhav

  1. Consider joining the OrbitDB Matrix Channel, where these questions can be better answered.
  2. Any version but the JS Implementation is most certainly behind in development. And there are only python and golang implementations that I know of: See the README.md Section

CSDUMMI avatar Sep 24 '21 18:09 CSDUMMI

Interesting idea for a new address format: /<cid>/orbitdb/<name> and the corresponding manifest structure from cid root would be: { orbitdb: { <name>: { type, access, [meta] } } }

Currently the address format makes databases very human discernible; first as orbit-db database addresses by including orbitdb and secondly from other databases by including the name of the database. ~~however they aren't valid ipfs paths and the format is pretty unique to themselves.~~ This change, though the raw manifest looks a bit strange, would allow for an address that is human discernible and which resolves to a manifest by ipfs.

With this implemented it would be important to strictly define the address as being made of ALL of these parts in this order. Only supplying the root cid or an address with the name missing would leave the software to guess which <name> manifest to open, as there could be multiple at /<cid>/orbitdb/.

It could be interesting to stack manifests vertically (e.g. /<cid>/orbitdb/manifest-here/manifest-here-2/also-a-manifest-here) where the root cid contains 3 manifests and this address referring to the third one. This seems ok to me and doesnt seem like it would be harmful but this address/manifest format could be friendlier to this by keeping the manifest for /<cid>/orbitdb/hello at += /.manifest. Although this use case could simply complicate things and not be worth the trouble.

Stacking manifests horizontally would always be possible, even if the <name> was restricted from including / characters, but I dont see a reason to make it easy to create as part of the orbit-db api.

the easiest way to try writing and resolving this format currently is with the ipfs.dag api and using the path option to resolve /orbitdb/<name>.
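To make the strict "all parts, in this order" rule concrete, here is a small sketch that parses the proposed address and looks the manifest up in an already-decoded root object. Both function names are hypothetical, and fetching the root object by CID (for example through the dag api mentioned above) is left out. Stacked manifests are not handled.

```javascript
// Parses the proposed /<cid>/orbitdb/<name> form. All three parts are required.
function parseAddress (address) {
  const parts = address.split('/').filter(Boolean)
  if (parts.length !== 3 || parts[1] !== 'orbitdb') {
    throw new Error(`Expected /<cid>/orbitdb/<name>, got: ${address}`)
  }
  return { cid: parts[0], name: parts[2] }
}

// `root` is the decoded object stored at the root CID, shaped as
// { orbitdb: { <name>: { type, access, [meta] } } }.
function resolveManifest (root, name) {
  const manifests = (root && root.orbitdb) || {}
  if (!Object.prototype.hasOwnProperty.call(manifests, name)) {
    throw new Error(`No manifest named "${name}" under /orbitdb`)
  }
  return manifests[name]
}
```

Rejecting a bare root CID up front avoids the guessing problem described above, where several manifests could live under the same `/orbitdb/` path.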

tabcat avatar Sep 30 '21 06:09 tabcat

We should stick to /<protocol>/<cid> format as this is the standard convention in the IPFS/libp2p (and others) ecosystem. We can drop the /name part from the end and put that in the manifest.

haadcode avatar Sep 30 '21 09:09 haadcode

Are there actual use cases where orbitdb address need to be parsed as ipfs paths?

I don't find that the current address scheme is particularly unique (as @haadcode mentioned).

But if @tabcat's scheme finds more use cases, I would probably prefer it to the current model, since it also conveys a very significant fact about OrbitDB: OrbitDB is not a new protocol for P2P data exchange, like IPFS, but a library built upon IPFS.

Using the same address format used for IPFS implies (somewhat) that OrbitDB is indeed a protocol, not a js library.

OrbitDB in its current form is a library, not a protocol, in my opinion, since there is no standard for the P2P messaging, oplog formats, etc.

It's all just an implementation without a specific standard to follow, right?

Thus, because OrbitDB is not a protocol, the <protocol>/<cid>/<name> scheme is kind of a misnomer.

CSDUMMI avatar Sep 30 '21 10:09 CSDUMMI

Orbitdb is a protocol. While we don’t have an official spec for it yet, the implementation defines the protocol as it is today.

haadcode avatar Sep 30 '21 12:09 haadcode

In that case I would like to add a TODO for 1.0: in order to allow for interoperability between different versions of OrbitDB (either in the same language or in different languages), an official reference specification of the OrbitDB Protocol should be defined, based upon v1.0.

This specification should define the bare minimum requirements necessary for talking to any OrbitDB peer after 1.0. It should serve the project well to do this, because based on such a specification implementations in other languages can be compatible with the JS implementation and future versions can ensure backwards compatibility through compliance with the specs.

Just referring to the code for these cases would also mean that future versions and other implementations could not fix bugs, because other versions or implementations might have considered a given bug to be standard behavior and rely on it.

Thus we need an official reference specification of the minimum requirements for an OrbitDB peer to interact with other peers, to prevent OrbitDB from fracturing in the future.

Or at least I think that an official reference specification could prevent a lot of such headaches. And I think that 1.0 would be the right baseline for such a specification, since after 1.0 OrbitDB has to consider backwards compatibility for stability reasons.

@contactrakeshjadhav's question shows that there is interest in other orbitdb implementations.

CSDUMMI avatar Sep 30 '21 17:09 CSDUMMI

  • Replicator Refactoring

I suppose this can be checked off? 🚀

chrispanag avatar Sep 30 '21 18:09 chrispanag

@chrispanag yes, we now traverse the next field as well, although I'm not sure we are done refactoring the replicator :smile: There is also some work from haad that could be added, but you should see a significant improvement in replication times with version 0.27.

tabcat avatar Sep 30 '21 18:09 tabcat

We should stick to /<protocol>/<cid> format as this is the standard convention in the IPFS/libp2p (and others) ecosystem. We can drop the /name part from the end and put that in the manifest.

Yes, I dunno how this slipped my mind, I've made the connection before. I believe the convention specifically is multiaddr, and it would probably be best to fit it. Cutting the name isn't a huge problem, as the protocol handler is still readable by humans, which is referenced in the multiaddr spec, and the name is part of the manifest, as you mention, which makes it easy to access. I like this change a lot, and a SCP could be written which delves deeper into multiaddrs and the multiaddr library.

tabcat avatar Sep 30 '21 19:09 tabcat

@tabcat your proposed format is not without its use though. It would enable us to, for example, store a link to a SPA using the OrbitDB Store:

<cid>/orbitdb/<name>/app 

Which could then be easily opened:

https://ipfs.io/ipfs/<cid>/orbitdb/<name>/app

And here too you could remove the <name> part and just put a single manifest into the orbitdb object.

CSDUMMI avatar Sep 30 '21 19:09 CSDUMMI

You could still build massive bundled manifests if you wanted; they would just need to be separate blocks, so they each get their own cid and then refer to each other's cid. orbit-db manifests are not uploaded using the ipfs protocol. They are ipld objects, so I don't believe the ipfs handler would be able to resolve their cids; at least ipfs.get cannot. It's probably much better to just hard code the orbit-db address into the app instead of the app into the manifest.

tabcat avatar Sep 30 '21 19:09 tabcat

After reading about multiaddrs a bit, I'm assuming the convention is more to do with prefixes like /ipfs/<cid>, /ipld/<cid>, and /ipns/<cid>.

tabcat avatar Oct 01 '21 02:10 tabcat

What's the state of this issue?

CSDUMMI avatar Nov 03 '21 15:11 CSDUMMI

Is there an update on the BTree Indexing for KVStore and DocStore? I have a use case where more than 1M keys are required.

buildgreatthings avatar Jan 03 '22 22:01 buildgreatthings