Overhauling Instant Messengers + add Session messenger

Open lrq3000 opened this issue 4 years ago • 35 comments

Description

Add Session messenger, which uses onion routing and end-to-end encryption (E2EE). Added a new subsection "Nodal messengers" to include onion routing messengers and blockchain-based messengers, as their advantages and disadvantages differ from other types.

Resolves: #1678 Resolves: #2232 (redundant with #2311 , this will be removed here if the other PR gets merged before) Resolves: #1357

Check List

  • [x] I understand that by not opening an issue about a software/service/similar addition/removal, this pull request will be closed without merging.

  • [x] I have read and understand the contributing guidelines.

  • [x] The project is Free Libre and/or Open Source Software

  • Netlify preview for the mainly edited page: https://deploy-preview-2293--privacytools-io.netlify.app/software/real-time-communication/#anonymous-routing

  • Code repository of the project (if applicable): https://github.com/oxen-io/session-desktop (there are other repos for each platform: Android, iOS).

lrq3000 avatar May 15 '21 02:05 lrq3000

Build failing because of #2232 I think. I tried to build a PR with just a new image, no markup change, and it still failed.

lrq3000 avatar May 15 '21 03:05 lrq3000

Ok I fixed #2232 so that the netlify preview build is online. It's ready for review.

lrq3000 avatar May 15 '21 03:05 lrq3000

I would have thought Decentralized would have been a better definition than "Nodal" which I've actually not seen anywhere in CS.

Throwing Tor in there isn't strictly correct either and complicates things. If you're talking about .onion servers that is decentralized as well, as both peers contact a HSDir in order to rendezvous.

With Session I always believed it to be "decentralized" with some degree of being a distributed network. The service nodes are required for message storage.

What I am actually thinking is we might change the "Federated" heading to be "Decentralized" and put Session in that category. The Matrix team also refers to Matrix as being "Decentralized".

With Tox or Ring (Briar requires Tor) it's a mixture: "distributed" for peer discovery (DHT), and then point A to point B for the actual communication (peer-to-peer). Since metadata is predominantly what we care about in this context, I would classify them as "peer-to-peer". Neither is anonymous, and both come with warnings to use Tor if anonymity is required.

268562053

dngray avatar May 15 '21 06:05 dngray

Tbh when looking at the mentioned upsides/downsides of federated services and p2p applications I think Session fits neither. I think the service nodes + onion routing fix some issues that p2p has and some issues that federated models have.

jeroenev avatar May 15 '21 07:05 jeroenev

Tbh when looking at the mentioned upsides/downsides of federated services and p2p applications I think session fits neither

I'd still put it under Decentralized, (assuming we change Federated to Decentralized) and then say what things don't apply and why.

Think the service nodes + onion routing fixes some issues that p2p has and some issues that federated models have

By definition that is still a decentralized network. The service nodes contain encrypted message data

onion routing fixes

The main one I saw there was requiring each service node to put down 15k to prevent Sybil attacks. That said, to a well-financed entity I don't think 15k is a lot. I guess it does depend on how many nodes there are in total though.
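
The dependence on the total node count can be made concrete with a little arithmetic. The sketch below is only illustrative: the function names are hypothetical, the honest node count in the usage lines is an assumed figure, and it ignores real-world effects such as the stake price rising as an attacker buys in. If an attacker adds `a` nodes to `h` honest ones, their share is `a / (h + a)`, so reaching a share `f` needs `a >= f * h / (1 - f)`.

```python
import math
from fractions import Fraction


def nodes_needed(honest_nodes: int, target_share: Fraction) -> int:
    """Smallest number of attacker nodes whose share of the network reaches target_share."""
    if not 0 <= target_share < 1:
        raise ValueError("target_share must be in [0, 1)")
    # a / (h + a) >= f  is equivalent to  a >= f * h / (1 - f)
    return math.ceil(target_share * honest_nodes / (1 - target_share))


def sybil_cost(honest_nodes: int, target_share: Fraction, stake_per_node: int) -> int:
    """Total stake (in the same unit as stake_per_node) an attacker must lock up."""
    return nodes_needed(honest_nodes, target_share) * stake_per_node


# Assumed figures for illustration only: 1000 honest nodes, 15k stake each.
attacker_nodes = nodes_needed(1000, Fraction(1, 3))
total_stake = sybil_cost(1000, Fraction(1, 3), 15_000)
```

Exact fractions are used instead of floats so that the rounding up is never thrown off by floating-point error.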

dngray avatar May 15 '21 09:05 dngray

Thank you both for your replies.

So indeed, I devised this new "nodal" category, because I do think this is a different category from federated. Both are indeed subtypes of decentralized networks, but their conceptual differences produce very significant differences in their threat models and use cases. For instance:

  • Federated servers are still a semi-centralized model, but instead of one authority controlling the servers, there are multiple authorities. When the authorities include the general public (i.e., anybody can run a server), we can say it's decentralized. But this model still gives server owners controlling powers, such as restricting access or filtering users, content (e.g., by keyword) or other servers. Also, the user leaks metadata to the server they connect to.
  • Nodal networks, on the other hand, decouple nodes from authority: nodes are agnostic. The nodes have no means to filter any content or user. Banning other nodes is only used to protect the network from malicious nodes, and it cannot impair users' ability to communicate, since a new route can be created at any moment. Nodes also get no metadata about who the source is (except of course the ID/public key). Nodal networks can in fact be seen as self-contained networks. I have never seen this terminology used for communication systems, but that is not surprising given that this is a very new kind of communication system, Session being one of the first to implement it fully and robustly (BCM messenger needed servers that are now down, and other messengers rely on Tor, with less than satisfying reliability, as we know, since they were not designed for instant communication). I used the examples of onion routing (such as Tor) and blockchain because those are the technologies that underlie this new class of communication systems. Session is an early example (the nodes use the Oxen blockchain; I should add this info BTW) but certainly not the last; we will certainly see more in the future.

So I'm strongly convinced that merging federated with nodal network messengers would be inaccurate and confusing, as both models work very differently. So although both would fit in a "decentralized" section, I think it's much clearer for users to separate and describe their respective pros and cons.

However I am not attached to the "nodal" typology, if you find a better name...

And that's just my opinion and reasoning for this PR. I will modify this PR according to the editorial board's decision of course.

lrq3000 avatar May 15 '21 18:05 lrq3000

@lrq3000 I totally agree with you although I find that decentralized fits perfectly into what Matrix does, and distributed into what Session does, at least if we look at the graphs above, but maybe I am missing something.

I also think it would be a nice idea to add a small graph next to each category to make it easier for end user to understand in my opinion, even more if you choose to stay with the nodal definition which will be more confusing.

gary-host-laptop avatar May 15 '21 19:05 gary-host-laptop

@LongJohn-Silver Yes, certainly Matrix fits in the decentralized model. But Session doesn't fully fit in the distributed model either, although I guess it would be a better fit.

The quintessential example of the distributed model is peer-to-peer, where all nodes are connected together and play both the roles of users and relayers for other users. Here, with the nodal model Session uses (which is kind of a hybrid between decentralized and distributed now that I think about it), the nodes aren't users, and users aren't nodes. The users are shielded and anonymized precisely because they are outside of the network, and once they enter the network, the nodes take care of all the work for them, the users do not contribute to the network at all.

If I were to draw what I think is a nodal network, it would be something like this (excuse the crude photoshopping, I'm no artist):

nodal-network

In green are the users (sender and recipient of a message in a Session communication for example), in black are the nodes selected to route messages for this communication, in grey the other nodes that are not involved for this communication, but can be for others. This shows the users are outside of the network, not part of it contrary to a distributed network. Also, the route is not selected to be the fastest, but by other metrics, so that the route acts as a further isolated subnetwork inside the whole nodal network. (Kinda like how the brain works, functional connectivity vs structural connectivity, not all nodes are used and not necessarily the fastest one but the most effective for the task at hand).

About the graphs yes it can be nice to add but I wonder if users will understand? We can also add a link to this tutorial maybe, which has a very illustrative animated version of the graph above to demonstrate the difference in resilience: https://web.archive.org/web/20200614011014/https://hackernoon.com/a-state-of-the-art-of-decentralized-web-part-1-54f70fdb7355

lrq3000 avatar May 15 '21 19:05 lrq3000

What about renaming the section just "Blockchain"? Although this kind of network architecture is not specific to blockchains, most implementations use blockchains, so that would fit most cases.

lrq3000 avatar May 15 '21 21:05 lrq3000

I think blockchain could be confusing, because people immediately assume that messages are stored on the blockchain, which is not the case for Session. I don't mind decentralised or distributed; either fits well imo.

To give a little background on the network: there are ~1750 Service Nodes, each having to stake at least 15,000 Oxen (around $21,000 USD). These Service Nodes are broken up into "swarms", groups of 5-7 nodes which are responsible for a deterministic subset of the network's Session IDs. When you send an encrypted message to a user, you send it to the swarm responsible for storing messages belonging to their Session ID. It is then replicated amongst those 5-7 nodes for redundancy and stored until the TTL expires. Users check their swarm for messages belonging to them, and once the TTL expires the messages are purged by all nodes in the swarm.
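
The swarm behaviour described above can be sketched roughly as follows. This is a toy model under stated assumptions, not Session's actual algorithm: the hash-based mapping, the fixed swarm size, and every name here (`build_swarms`, `swarm_index`, `Swarm`) are hypothetical and only illustrate "deterministic subset of IDs, replicate within the swarm, purge after TTL".

```python
import hashlib


def build_swarms(node_ids, swarm_size=5):
    """Split a sorted node list into consecutive groups (assumed grouping rule)."""
    ordered = sorted(node_ids)
    return [ordered[i:i + swarm_size] for i in range(0, len(ordered), swarm_size)]


def swarm_index(session_id: str, swarm_count: int) -> int:
    """Deterministically map a Session ID to a swarm (assumed: hash modulo count)."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % swarm_count


class Swarm:
    """A group of nodes that each hold a replica of every stored message."""

    def __init__(self, nodes):
        self.stores = {node: [] for node in nodes}

    def store(self, session_id, ciphertext, ttl, now):
        # Replicate the encrypted message to every node in the swarm.
        for inbox in self.stores.values():
            inbox.append((session_id, ciphertext, now + ttl))

    def purge(self, now):
        # Every node drops messages whose TTL has expired.
        for node, inbox in self.stores.items():
            self.stores[node] = [m for m in inbox if m[2] > now]

    def fetch(self, session_id, now):
        # The recipient polls any one node of their swarm.
        self.purge(now)
        any_inbox = next(iter(self.stores.values()))
        return [text for sid, text, _ in any_inbox if sid == session_id]


# Illustrative usage with made-up node names and a made-up Session ID.
nodes = [f"node-{i:02d}" for i in range(15)]
swarms = [Swarm(group) for group in build_swarms(nodes)]
target = swarms[swarm_index("05abc-example-id", len(swarms))]
target.store("05abc-example-id", b"<encrypted blob>", ttl=60, now=0)
```

Neither sender nor recipient picks the nodes here; the mapping is fixed by the ID, which matches the point that users get no choice over which Service Nodes store their messages.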

When we look at the available categories

Centralised: No, because the network comprises ~1750 nodes run by different operators, and the user has an equal chance of using any of the Service Nodes.

Federated: Not in the traditional sense. Federated generally implies a smaller number of centralised servers, where those servers are interconnected and share data, and where the user can choose which server they want to provide services to them. Session users don't get a choice over which Service Nodes they use for message storage or onion routing; the protocol makes these choices based on set rules.

Peer to Peer: No, because clients don't store messages, connect to each other directly, or provide any services to the network.

In saying this I think moving Element and Session into the same category of "Decentralised" would be confusing since the network layout in practice is very different between the two, for the reasons described above.

KeeJef avatar May 16 '21 05:05 KeeJef

Yeah Session definitely is an interesting case. Technically it seems something in between p2p and federated, but UX/usability-wise it feels very much like a centralized messenger, since there's no server to choose like with federation and no always-on requirement like with p2p messengers.

jeroenev avatar May 16 '21 07:05 jeroenev

I am thinking it might be best to have 3 categories, Centralized, Decentralized, and Distributed.

Then talk about each application. The reason is that "peer-to-peer" brings its own issues in regard to Ring etc. as it is, but it is also distributed in regard to the DHT tables where peers are matched. I actually think we could improve the pros/cons section as well by marking specifically which ones may not apply. We don't list that many things so it should be easy.

I would have thought Nodal would have been a type of distributed network. To me it seems closer to that than decentralized, because although there are "supernodes" they aren't really the same as servers in Matrix, XMPP, email, etc.

I do agree with @KeeJef: we should not mention blockchain, as people will assume the messages are on the chain. The more jargon, the more complicated it gets.

I also think it would be a nice idea to add a small graph next to each category to make it easier for end user to understand in my opinion, even more if you choose to stay with the nodal definition which will be more confusing.

Funny you mention that. When I put the picture up there in this post https://github.com/privacytools/privacytools.io/pull/2293#issuecomment-841610948 I thought about that. I really think this would be a great idea, and it would also help break up the page a bit more.

I think it still would be best not to mention "Nodal" specifically, but just refer to it as a kind of Distributed network, what do you think @KeeJef?

dngray avatar May 16 '21 19:05 dngray

I think it still would be best not to mention "Nodal" specifically, but just refer to it as a kind of Distributed network, what do you think @KeeJef?

Yeah I think nodal might be confusing since it's not really a widely used term in this space. Most people understand the gist of what decentralized/distributed means, although I think they are often conflated with each other.

KeeJef avatar May 17 '21 00:05 KeeJef

But if we move to the centralized/decentralized/distributed nomenclature, then Session will be placed in either decentralized along with federated networks such as Matrix, or distributed along with p2p networks.

lrq3000 avatar May 17 '21 07:05 lrq3000

But if we move to the centralized/decentralized/distributed nomenclature, then Session will be placed in either decentralized along with federated networks such as Matrix, or distributed along with p2p networks.

I would place it under one of those categories, but then say how it is different, and show a diagram, I think that's the neatest way.

I am thinking it is a kind of decentralized network, because a distributed network implies all nodes are of equal importance. The whitepaper in fact describes it throughout as "decentralized". https://getsession.org/wp-content/uploads/2020/02/Session-Whitepaper.pdf

Session works to reduce metadata collection in several ways: Firstly, Session does not rely on central servers, instead using a decentralised network of thousands of economically incentivised nodes to perform all core messaging functionality. For those services where decentralisation is impractical, like storage of attachments and hosting of large group chat channels, Session allows users to self-host infrastructure, or rely on built-in encryption and metadata protection to mitigate trust concerns.

The website also refers to routing through an onion network, which if you were to look at Tor it would be classified as a "decentralized" network.

I think for our description a brief version of that would do, we could then link to the whitepaper.

If you look at the Loki Network, which Session uses, https://loki.network/wp-content/uploads/2018/10/LokiWhitepaperV3_1.pdf it is also described as "decentralized":

Onion routing protocols allow for users to form tunnels or paths through a distributed network, using multiple nodes as hops to obfuscate the destination and origin of data packets. Service Nodes on the Loki network will operate a low latency onion routing protocol, forming a fully decentralised overlay network, called Lokinet. The network does not rely on trusted authorities and its state is fully derived from the blockchain. Users can connect to individual Service Nodes and create bidirectional paths for packets to be routed through.

My understanding is it works like Tor, except that relays have to pay a stake to be relays. The description of a Service Node sounds a bit like a HSDir. Hidden service descriptors are stored across the HSDirs in a distributed way, and if you think about Loki, it similarly uses a blockchain to store the record of who is a Service Node. https://tor.stackexchange.com/a/8692 describes it quite well.
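
The "multiple nodes as hops" idea from the whitepaper quote can be illustrated with a toy layered envelope. This is a sketch under heavy assumptions: the XOR keystream below is NOT secure and is not what Tor, Lokinet or Session use; the pre-shared per-hop keys, the JSON envelope and the names `wrap`/`peel` are all hypothetical. It only shows that each hop can remove exactly one layer and learns nothing but the next hop.

```python
import hashlib
import json


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 of key plus counter. For illustration only, not secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_xor(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice with the same key restores data."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def wrap(payload: bytes, route, destination: str) -> bytes:
    """Build the onion from the inside out. route is [(hop_name, key), ...] in travel order."""
    next_hop, blob = destination, payload
    for name, key in reversed(route):
        layer = json.dumps({"next": next_hop, "data": blob.hex()}).encode("utf-8")
        blob = toy_xor(key, layer)
        next_hop = name
    return blob  # handed to the first hop in the route


def peel(blob: bytes, key: bytes):
    """What a single hop does: remove its own layer, learn only where to forward."""
    layer = json.loads(toy_xor(key, blob))
    return layer["next"], bytes.fromhex(layer["data"])


# Illustrative three-hop route with made-up names and keys.
route = [("hop-A", b"key-a"), ("hop-B", b"key-b"), ("hop-C", b"key-c")]
onion = wrap(b"hello", route, destination="recipient")
```

In this model the first hop sees the sender but only a forwarding address for the second hop, and the last hop sees the destination but not the sender, which is the metadata property the thread keeps coming back to.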

dngray avatar May 19 '21 01:05 dngray

IMO just because they describe it as such does not mean much, even more so if, as @lrq3000 says, this is a new form of networking structure. Not saying it is or it isn't, I am not that savvy in this topic, just saying that it is hard to name something which is only now coming into existence, even more so if what you are trying to do is make it easier for others to understand.

Naming it decentralized/distributed, albeit not the most technically correct, may be the best idea for now until similar software comes to life, even more for end users who shouldn't really care about it.

gary-host-laptop avatar May 19 '21 02:05 gary-host-laptop

Yes as I have said I do think Session falls in-between a decentralized and a distributed model (and in the whitepaper they reference both modes). A section "Decentralized and distributed" is a great idea and it would fit. Or "Hybrid", which would technically be correct too I think. I'll work on such a proposition with pictures (I'll need some time to make them myself to be copyleft).

lrq3000 avatar May 19 '21 02:05 lrq3000

Although it's focused on governance, here's a paper that argues that blockchain-based technologies are both distributed and decentralized: https://doi.org/10.1177%2F2631787720977052

lrq3000 avatar May 19 '21 02:05 lrq3000

Session falls in-between a decentralized and a distributed model (and in the whitepaper they reference both modes).

A lot of networks do though. Tor could be classified as a distributed network if you exclusively look at where HSDir data is stored. However, when you consider the greater network, and the fact there are HSDirs, it is decentralized.

Likewise with a peer-to-peer program like Ring.. The discovery of peers is distributed (DHT), but once connected it's point to point.

It really depends on the context, and in this case metadata is what we care about.

dngray avatar May 19 '21 02:05 dngray

It really depends on the context, and in this case metadata is what we care about.

That's why I proposed a new name "Nodal" for this kind of network, as it's the separation between the relaying nodes and the clients that reduces the meta-data leakage IMHO, but it's not a standard typology :-/

I think at this stage with the currently established typology of networks there is no way around some degree of confusion, but I will try my best to reduce it by describing and illustrating the differences.

lrq3000 avatar May 19 '21 02:05 lrq3000

The link for Oxen Dashboard is broken: an unnecessary https:// is included @lrq3000

Dyrimon avatar May 19 '21 05:05 Dyrimon

Mmmm I am getting an error when trying to include some SVG images:

This page contains the following errors: error on line 58 at column 17: Encoding error Below is a rendering of the page up to the first error.

This happens even when I try to directly access the image URL.

Are there some parameters I need to set when saving the SVG from Inkscape?

lrq3000 avatar May 24 '21 02:05 lrq3000

Found the issue: Jekyll is not configured to support accented characters. But since my computer uses a locale with accented characters, some metadata was automatically output in my locale, such as the datetime:

Creator: FreeHEP Graphics2D Driver Producer: org.freehep.graphicsio.svg.SVGGraphics2D Revision Source: Date: jeudi 20 mai 2021 à ²3:53:26 heure dÂ’é´© dÂ’Europe centrale

Removing the line fixes the issue, but I also found a tool that trims it and does some size optimizations. It's open source, so it may be useful in the future:

https://jakearchibald.github.io/svgomg/
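
For finding such lines before the build fails, a small check along these lines could help. It is only a sketch: the function names are hypothetical, the sample SVG is made up, and it simply flags any line containing a byte outside ASCII, which is broader than the specific metadata comment described above.

```python
def non_ascii_lines(svg_bytes: bytes):
    """Return (line_number, raw_line) for every line holding a byte >= 0x80."""
    hits = []
    for number, line in enumerate(svg_bytes.splitlines(), start=1):
        if any(byte >= 0x80 for byte in line):
            hits.append((number, line))
    return hits


def drop_lines(svg_bytes: bytes, numbers) -> bytes:
    """Remove the given 1-based line numbers, keeping all other lines byte-for-byte."""
    kept = [
        line
        for number, line in enumerate(svg_bytes.splitlines(keepends=True), start=1)
        if number not in numbers
    ]
    return b"".join(kept)


# Made-up sample: a comment line with Latin-1 encoded accented characters.
sample = b"<svg>\n<!-- Date: heure d'\xe9t\xe9 -->\n<rect/>\n</svg>\n"
suspect = non_ascii_lines(sample)
cleaned = drop_lines(sample, {number for number, _ in suspect})
```

Dropping a whole line is only safe when it is a standalone comment or metadata line, as it was in this case; anything else should be inspected by hand first.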

lrq3000 avatar May 24 '21 18:05 lrq3000

I have updated the PR with the provided feedback.

After researching more and scratching my head, I decided to put Session in distributed networks. Indeed, both onion routing and blockchains are primarily considered as distributed networks.

However, I could not put Session in the same section as peer-to-peer networks, as onion routing is definitely not a peer-to-peer system. We could say that onion routing and blockchains are "indirect distributed networks", where the sender and recipient do not interact directly, as opposed to peer-to-peer distributed networks, where in the end the sender and recipient communicate directly with each other. Unfortunately, apart from peer-to-peer networks being defined as a subtype of distributed networks, no other type has been formally identified. So I resorted to using "Non peer-to-peer" for the subsection where Session is.

I also added figures and explanations for each network type. They were generated using Cytoscape, here are the source files:

ptio-network-schemas.zip

Each file was exported to an SVG file, which was then edited in Inkscape to remove the white background and resize to the file's content with a 15 px margin, and then run through SVGOmg to clean up unnecessary data and reduce file size.

Please let me know what you think about the changes.

PS: On Windows, Jekyll live reload doesn't work well, I had to instead use jekyll serve --watch

lrq3000 avatar May 24 '21 22:05 lrq3000

All SVG images should be either 128x128 or 384x128. If the image is going to warp, make it smaller and center it (top/bottom/left/right) on a canvas of that size.
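
A quick way to check a file against those two canvas sizes might look like the sketch below. It makes several assumptions: the helper names are hypothetical, only plain numbers or `px` values on the root `<svg>` element are handled, and it falls back to `viewBox` when `width`/`height` are missing.

```python
import xml.etree.ElementTree as ET

ALLOWED_SIZES = {(128, 128), (384, 128)}


def _to_px(value: str) -> float:
    """Parse '128' or '128px'; other units (mm, %, ...) raise ValueError."""
    value = value.strip()
    if value.endswith("px"):
        value = value[:-2]
    return float(value)


def canvas_size(svg_text: str):
    """Return (width, height) of the root element, preferring explicit attributes."""
    root = ET.fromstring(svg_text)
    width, height = root.get("width"), root.get("height")
    if width is not None and height is not None:
        return _to_px(width), _to_px(height)
    view_box = root.get("viewBox")
    if view_box is None:
        raise ValueError("SVG has neither width/height nor viewBox")
    _, _, vb_width, vb_height = view_box.replace(",", " ").split()
    return float(vb_width), float(vb_height)


def has_allowed_size(svg_text: str) -> bool:
    """True if the canvas is exactly 128x128 or 384x128."""
    return canvas_size(svg_text) in ALLOWED_SIZES


# Made-up sample documents for illustration.
square = '<svg xmlns="http://www.w3.org/2000/svg" width="128" height="128"/>'
wide = '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 384 128"/>'
```

This only validates the canvas; centering a smaller drawing on it still has to be done in the editor.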

They should be optimized, Inkscape does this:

optimize_svg

You may need python-lxml, and scour.

  • https://archlinux.org/packages/extra/any/scour/
  • https://archlinux.org/packages/extra/x86_64/python-lxml/
  • https://packages.debian.org/buster/python-lxml
  • https://packages.debian.org/buster/scour

dngray avatar May 25 '21 13:05 dngray

Thank you very much @dngray for your help and sorry for the delay. The svg images are now updated according to your instructions.

lrq3000 avatar May 29 '21 22:05 lrq3000

I have decided to change the category "Non Peer-to-Peer" into "Anonymous Routing" and make it a separate section instead of placing both under the "Distributed network" section. I also rewrote the section to focus on anonymous routing, and rewrote the others to restore the old section headers (without "decentralized" or "distributed"), while mentioning the nature of the network with a link. Indeed, the illustration doesn't need to be explicitly labelled IMHO; it only needs to give an idea of how this kind of messenger works.

To explain why I made this change:

  • Anonymous routing is the name for this type of network, as shown for example by this book chapter: https://doi.org/10.1007/978-1-4419-5906-5_628 -- there are many other academic works using this term.
  • Anonymous P2P was also a candidate, but onion routing and Tor aren't considered P2P networks, although they are often mentioned as an example (note they do not list Tor as an Anonymous P2P network in this Wikipedia page for example, they only mention it for historical pertinence). /EDIT: so I chose to use "anonymous routing" as it is more general and includes both anonymous P2P systems and anonymous non-P2P systems such as onion routing.
  • The semantics of which network belongs to which kind are not as clear-cut as we might think. For example, P2P networks are considered by some as a kind of decentralized computing, and the Fediverse and federated social networks as a kind of distributed social network. So I think it's better to illustrate rather than focus on naming things.

Please let me know what you guys think of the latest version :-)

lrq3000 avatar Jun 01 '21 16:06 lrq3000

not really relevant to the discussion but: don't forget to add the audit of session

youdontneedtoknow22 avatar Jun 09 '21 00:06 youdontneedtoknow22

Any progress on this, need any help / clarification from our side?

KeeJef avatar Jul 09 '21 05:07 KeeJef

Shouldn't Briar be under the "Anonymous Routing" section?

And I'm not a programmer so I'm not sure about this, but isn't "The protocol was independently audited" wrong? The clients for all platforms were audited, not "only" the protocol. I'm not sure if auditing the protocol should happen on the client side or server side or both tho.

youdontneedtoknow22 avatar Jul 09 '21 10:07 youdontneedtoknow22