dat-store
                        Add content-type specifier for dat store
Looking at #46 and thinking of multi-hyperbee or cabal, it may be a good idea to be able to specify the type of a given hyper URL (if it is not specified outright in the URL):
$ dat-store add --type=multi-hyperbee hyper://abc..def
$ dat-store add --type=core hyper://abc..def
$ dat-store add hyper://url/abc..def
Could this be something worth working on?
cc. @urbien @serapath @cblgh ?
i think it would be a non-trivial undertaking to add cabal support (dat-store would have to at the very least add multifeed?) it would be better to consider the perspective of other projects :)
I was thinking that the type would be detected via the Header messages at core.get(0) of a hypercore. What sort of changes would different types give? Different behavior when we're downloading just the latest data?
Regarding multifeed, it'd be really hard to add because we'd need to have a separate swarm for it and a different way of doing storage. 😅 I'd like to stick to using vanilla hypercore-protocol without extra wrappers if possible.
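Purely to illustrate the idea of detecting the type from block 0, here is a minimal sketch; the varint-length + UTF-8 string layout used here is a hypothetical stand-in, not the actual Header message encoding:

```javascript
// Hypothetical sketch: read a store "type" out of block 0 of a core.
// Assumption (NOT the real Header encoding): the block starts with a
// varint length followed by a UTF-8 type string like "multi-hyperbee".

function readVarint (buf, offset = 0) {
  let value = 0
  let shift = 0
  let byte
  do {
    byte = buf[offset++]
    value += (byte & 0x7f) * 2 ** shift
    shift += 7
  } while (byte & 0x80)
  return { value, end: offset }
}

function detectType (headerBlock) {
  const { value: len, end } = readVarint(headerBlock)
  return headerBlock.slice(end, end + len).toString('utf8')
}

// e.g. const type = detectType(await core.get(0))
```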
Header metadata is a good way to deal with the type of data store. We added support for it in multi-hyperbee. But note that Header metadata is glitchy: if I recall correctly, it changes the order of events when the store is initialized, so we had to work around some abnormal behavior. I talked to Maf about it, but I do not think he was aware of the problem then. It could have been fixed since.
> i think it would be a non-trivial undertaking to add cabal support (dat-store would have to at the very least add multifeed?) it would be better to consider the perspective of other projects :)
It might not be trivial, but I think cabal (and I assume peermaps too?) would use it, and I'd find it a bit sad if the ecosystem stayed incompatible and fell apart even further instead of reversing that somehow.
so i read through the cable protocol and opened an issue with additional questions. The list of goals looks like it's already supported by hypercore, but I assume I'm missing lots of important points. what was the main motivation to do the new cable protocol?
Below I try to summarize my understanding of both protocols (and I'll make an additional comment with thoughts):

- A. giving a short recap of hypercore-protocol
- B. and then trying to do the same with cable-protocol

...I guess others know most of this much better than me, so I'd be very happy if you could correct me or add additional information so that i can learn :blush: (or suggest formatting improvements)
A. hypercore protocol messages
loosely based on hypercore source code and https://datprotocol.github.io/how-dat-works
- wireprotocol = message | wireprotocol
- message = len_of_rest + channel_and_type + body
- channel_and_type = channel_number + message_type
  - channel_number is for multiple "dats"
  - message_type is one of the below listed types
- body = fieldtag + content
- fieldtag = fieldnumber + fieldtype (expressed in a varint)
  - fieldnumber = e.g. 1=discoveryKey (32 bytes), 2=nonce (24 bytes), ...
  - fieldtype, e.g.
    - "1=varint": content = unsigned_integer
    - "2=length-prefixed": content = varint_length + bytes_of_varint_length
example: (a beginning of a ...)

- message0
  - length_of_rest = e.g. 61
  - channel_and_type:
    - channel = 0 (=first dat?)
    - message_type = 0 (=open)
  - FIELDS (field0)
    - field_tag:
      - field_number = 1 (=discoveryKey)
      - field_type = length-prefixed
    - content:
      - varint_length = 32
      - bytes_of_varint_length = <discoveryKey>
  - FIELDS (field1)
    - field_tag:
      - field_number = 1 (=key)
      - field_type = length-prefixed
    - content:
      - varint_length = ...
      - bytes_of_varint_length = <capability (=from key)>
So given the feed message type, a "channel" [ch.localId] can be associated with a feed to then request chunk ranges for that feed later, right?
- 0 `open` - I want to talk to you about this particular Dat
- 1 `options` - I want to negotiate how we will communicate on this TCP connection
- 2 `status` - I am either starting or stopping uploading or downloading
- 3 `have` - I have some data that you said you wanted
- 4 `unhave` - I no longer have some data that I previously said I had
- 5 `want` - This is what data I want
- 6 `unwant` - I no longer want this data
- 7 `request` - Please send me this data now
- 8 `cancel` - Actually, don’t send me that data
- 9 `data` - Here is the data you requested
- 10 `close`
- 15 `extension` - ...for other stuff
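For reference, the table above as a plain lookup (a trivial helper, e.g. for logging incoming frames):

```javascript
// Message type numbers from the list above; 15 is the extension slot.
const MESSAGE_TYPES = [
  'open', 'options', 'status', 'have', 'unhave',
  'want', 'unwant', 'request', 'cancel', 'data', 'close'
]
MESSAGE_TYPES[15] = 'extension'

function typeName (type) {
  return MESSAGE_TYPES[type] || `unknown(${type})`
}
```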
B. cable protocol messages
my open question issue: https://github.com/cabal-club/cable/issues/1
- wireprotocol = messageA | messageB | wireprotocol
- messageA = msg_len + msg_type + random_request_id + body (for request and response msgs)
  - random_request_id = <...roll some dice...>
- messageB = pubkey + signature + hash_link + post_type + timestamp (for post msgs)
  - pubkey = public key
  - signature = signature
  - hash_link = link to hash of a previous message
    - Most post types will `link` to the most recent post in a channel from any user (from their perspective), but self-actions such as naming or moderation will `link` to the most recent self-action.
  - post_type = <one of the types listed below>
  - timestamp = <timestamp>
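A rough sketch of parsing a messageB frame as summarized above. The field sizes are my assumptions (32-byte ed25519 pubkey, 64-byte signature, 32-byte hash), not taken from the cable spec, and post_type/timestamp are read here as plain varints:

```javascript
// Hedged sketch: pubkey + signature + hash_link + post_type + timestamp.
// Sizes are ASSUMED (ed25519 keys/signatures, 32-byte hashes).

function readVarint (buf, offset) {
  let value = 0
  let shift = 0
  let byte
  do {
    byte = buf[offset++]
    value += (byte & 0x7f) * 2 ** shift
    shift += 7
  } while (byte & 0x80)
  return { value, end: offset }
}

function parsePost (buf) {
  const pubkey = buf.slice(0, 32)
  const signature = buf.slice(32, 96)
  const hashLink = buf.slice(96, 128)
  const { value: postType, end } = readVarint(buf, 128)
  const { value: timestamp } = readVarint(buf, end)
  return { pubkey, signature, hashLink, postType, timestamp }
}
```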
listed types
- hash response (msg_type=0) send a list of message hashes to request_id
- data response (msg_type=1) send messages to request_id
- request by hash (msg_type=2) request a list of messages by hashes
- cancel request (msg_type=3) cancel any request_id
- request channel time range (msg_type=4) get all message hashes in a time interval (+ max limit)
- request channel state (msg_type=5) get or subscribe to state change messages (0+ PAST, 0+ LIVE)
- request channel list (msg_type=6) get a list of all channels (=topics) from peers
- post/text (post_type=0) post text message (+ channel & timestamp & hashlink)
- post/delete (post_type=1) request deletion of a previous message
- post/info (post_type=2) post update to one's own key/value "store"
  - keys with defined special meaning:

```js
const state = {
  // special meaning:
  name,    // handle to use as a pseudonym
  blocks,  // json object mapping hex keys to flag objects { reason, timestamp }
  hides,   // json object mapping hex keys to flag objects { reason, timestamp }
  max_age, // string: maximum number of seconds to store posts
}
```

- post/topic (post_type=3)
- post/join (post_type=4)
- post/leave (post_type=5)
some of my observations and thoughts:
:sweat_smile: i probably miss the point: i don't understand how important the request_id-based lookup is, whether it would be incompatible with anything, or how e.g. i2p ties into all of this, or maybe they are the same. I probably lack a lot of context, but if someone could give me that context so i could learn, i'd be really happy :blush:
Ok, I imagine both would use hyperswarm and a swarm topic (alternatives might work too) to find other peers and then once some peers are found for a given topic, the rest can start by exchanging messages with around a dozen different possible message types.
my secret hope is that all messages sent by a single specific sender with a specific pubkey can still be stored in a hypercore, and that messages would allow deriving which hypercore they belong to. or at least i'd like to discuss and see how far in that direction it can go; if there are certain reasons that prevent it, i'd love to learn about them too.
hypercore:

- peers start to exchange feed chunks (with channel number + message types)
- every message has a hash and an associated feed channel number (and can be merkle + signature verified)

cable:

- peers explore channels to join
- peers start to exchange messages (with channel + message/post types, e.g. requests, responses of posts)
- every message has a hash and an associated channel (and can be verified via pubkey + signature)
hypercore vs. cable

- Is or could the `pubkey` of a peer included with each post maybe be a hypercore address of the sender?
  - Then all messages of a sender could also have chunk indexes in that sender's hypercore
- an implementation could help to look up hypercore indexes based on a message `hash_link`
- the `timestamp` included in each message could be extended with the chunk index (vector clock?)
- even the `signature` might be skipped to save space, using the sender's hypercore to merkle verify instead?
- the `hash_link` could also theoretically be replaced with a chunk index + the poster's hypercore address
- => overall, the signature in each message would be replaced with 2 indexes:
  - one added next to `timestamp`, as an index in the `pubkey` hypercore of the sender
  - one added next to `hash_link` (replaced by the `pubkey2` hypercore address), as an index in the hypercore of the sender of the referenced message
message size comparison

- hashlink or pubkey are both 32 bytes, right?
- two additional indexes add a bit of size, but less than the signature that could be removed
- peers around a topic and channel are supposed to have all the messages regardless of how they are stored
  - so an implementation could look them up quickly and send them out to the requester
- all `self actions` could save the entire `pubkey2` and only use an index, because `pubkey2`=`pubkey`
- also: in a request or response of a list of hashes, all hashes with the same sender could be replaced by a single pubkey address + indexes and/or index ranges to save additional space, or not?
- another thing is more a question about the post/info message where the value is very large: wouldn't it be better to chunk that up into multiple messages, like hyperdrive might do it?
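A quick back-of-envelope calculation for the size trade-off above, assuming (my assumption) a 64-byte ed25519 signature and varint-encoded indexes:

```javascript
// How many bytes a varint needs for a given index value.
function varintSize (n) {
  let size = 1
  while (n >= 128) {
    n = Math.floor(n / 128)
    size++
  }
  return size
}

const SIGNATURE_BYTES = 64
// two indexes into hypercores, each up to ~2^28 (~268M messages): 4 bytes each
const twoIndexes = 2 * varintSize(2 ** 28 - 1)
const savedPerMessage = SIGNATURE_BYTES - twoIndexes // 56 bytes saved per message
```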
perf comparison

- merkle verification with a signature is slower than simple signature verification though
- but multiple messages from the same sender could be batch merkle verified to save signature verifications?
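As an illustration of the batching idea, the grouping step could look like this (only the grouping; the actual merkle verification is out of scope here):

```javascript
// Group messages by sender so per-sender verification (e.g. checking one
// signed merkle root per hypercore) can replace one signature check per
// message. `pubkey` is assumed to be a Buffer on each message object.
function groupBySender (messages) {
  const bySender = new Map()
  for (const msg of messages) {
    const key = msg.pubkey.toString('hex')
    if (!bySender.has(key)) bySender.set(key, [])
    bySender.get(key).push(msg)
  }
  return bySender
}
```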