Polykey General `Discovery` fixes and features

Specification

This is an epic tracking the current work related to Discovery. There are a few things that need to be addressed. There are a bunch of existing issues, but they need to be flattened out into atomic tasks that can be done separately. This epic will be handled by @tegefaulkes and @amydevs.

Currently there is no feedback when discovery is being done. Since this is a background system, this would have been tricky to address in the past, but our work on event systems and the audit domain has worked out a lot of the kinks. We'll need to add a CLI command that outputs discovery steps as they happen, with some degree of filtering. I think it makes more sense for this to be an audit domain command since it shares a lot of similarity with the connections auditing.

Currently there is a bug with discovering identities. By design there shouldn't be any constraint on an identity claiming multiple nodes. But there is a bug where multiple cryptolinks to an identity are not recognised, so our logic is not handling multiple gists.

We're missing periodic re-discovery, so it seems that discovery always needs to be triggered explicitly. When a node is discovered, it needs to be added back to the discovery queue so that it is discovered again after a period of time.
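A minimal sketch of the re-queueing idea, assuming a simple in-memory schedule keyed by vertex ID. The names `RediscoveryQueue`, `schedule`, `popDue` and `markDiscovered` and the interval parameter are hypothetical and are not Polykey's actual API; a real implementation would more likely sit on top of the existing task system.

```typescript
// Hypothetical sketch: not Polykey's actual discovery API.
type VertexId = string;

class RediscoveryQueue {
  // Maps each vertex to the timestamp (ms) at which it is next due.
  protected due: Map<VertexId, number> = new Map();
  protected rediscoverIntervalMs: number;

  constructor(rediscoverIntervalMs: number) {
    this.rediscoverIntervalMs = rediscoverIntervalMs;
  }

  // Queue a vertex to be discovered at or after time `at`.
  public schedule(vertexId: VertexId, at: number): void {
    const existing = this.due.get(vertexId);
    // Keep the earliest pending time so repeated triggers never delay work.
    if (existing === undefined || at < existing) this.due.set(vertexId, at);
  }

  // Remove and return every vertex that is due at time `now`.
  public popDue(now: number): VertexId[] {
    const ready: VertexId[] = [];
    this.due.forEach((at, vertexId) => {
      if (at <= now) ready.push(vertexId);
    });
    for (const vertexId of ready) this.due.delete(vertexId);
    return ready;
  }

  // Called once a vertex has been processed: queue it to be discovered again.
  public markDiscovered(vertexId: VertexId, now: number): void {
    this.schedule(vertexId, now + this.rediscoverIntervalMs);
  }
}
```

A background loop would call `popDue(Date.now())` periodically, process each returned vertex, and then call `markDiscovered` so the vertex comes back around after the interval.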

The GestaltGraph isn't updating with new information. This needs to be investigated. The gestalt graph is updated by the discovery process. So if there is a failure there or no re-discovery is being done then that could be the cause.

We need some quality of life features to streamline the sharing and permissions process. I think right now we can't set ACL permissions unless a node already exists in the GestaltGraph. We can also trigger automatic discovery of identities that are friends of your linked identity.

We need some way of handling dead nodes and revocation of links. I'm unsure if we check with both sides of a link before considering it valid. Any certificate indicating a cryptolink has signatures from both sides, so it can be validated without actually contacting anyone. However, we need to check links to see if they're still valid. For example, a link is invalidated if its gist is deleted, or if the claim is missing from a node's sigchain. Basically, if we can't find the original copy of a link then we need to consider the link revoked.

The discovery logic is a little messy right now. Parts of it can be factored out into protected utility functions. But generally readability of the domain needs to be improved.

Additional context

https://github.com/MatrixAI/Polykey-CLI/issues/40#issuecomment-2019231993

Tasks

These are the sub-issues for addressing each point above.

General feedback for the discovery process. - https://github.com/MatrixAI/Polykey-CLI/issues/162
Bug with handling multiple cryptolinks for a single identity. - https://github.com/MatrixAI/Polykey-CLI/issues/163 #328
Periodic re-discovery and make sure the GestaltGraph is updating with new information. - #691
Quality of life and streamline features such as automatic friend discovery for identities - <New Issue>
Handling dead links and gestalt revocation - https://github.com/MatrixAI/Polykey-CLI/issues/164
General discovery code cleaning and refactoring - <new Issue>

Mar 28 '24 01:03 tegefaulkes

ENG-31 General `Discovery` fixes and features

Mar 28 '24 01:03 linear[bot]

#462 - This is a pretty broad issue that wants to address a few problems. Good for reference, but it's too bloated. It's relevant, but for the sake of dividing up work I need more atomic issues.

I also need to create a new issue to address Quality of life and streamline features such as automatic friend discovery for identities. I'll need to spec that out some more, we need to work out some pain points with sharing vaults and discovery to get a better idea for this. @CryptoTotalWar

And one more issue for handling dead claims.

Mar 28 '24 03:03 tegefaulkes

Some quick notes.

How to handle automatic discovery of peers:
- We need to handle it as tasks. There would be one step where each peer is queued.
- Checking each peer would be its own task.
- We need to factor in rate limiting for API calls, but we don't want to block tasks while doing so.
Rediscovery: the existing issue for TTL actually covers most of this, along with some extra stuff. The extra stuff can be separated out from that later.
QOL is discovering peers, but also other things that streamline the discovery/sharing process. We need some user input on pain points to expand on this.
Handling dead links and revocation: there are two parts. The gist can just be deleted from the identity, but sigchain claims can't be deleted. They can only be unclaimed with a new claim on the sigchain. So when an identity is un-linked, the gist is deleted and the claim is revoked with a new claim. Node-to-node claims have both sides revoke the claim, but it's possible that only one side is actually revoked. In this case the claim is technically invalid. That leaves 3 levels of validity: both claims are found, only one side is found, or neither is found. Only having both claims makes the link valid.
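The three levels of validity described above can be sketched as follows. This is only an illustration: `SigchainEntry`, its `revokes` field, `isClaimActive` and `classifyLink` are hypothetical names, and representing revocation as a later entry that points at the original claim is an assumption, not a description of the actual sigchain format.

```typescript
// Hypothetical sketch: names and data shapes are illustrative, not Polykey's API.
type LinkStatus = 'valid' | 'partial' | 'revoked';

// Assumed shape: a sigchain is append-only, so a claim is revoked by
// appending a later entry that refers back to the original claim.
interface SigchainEntry {
  claimId: string;
  revokes?: string;
}

// A claim counts as "found" only if it exists and no entry revokes it.
function isClaimActive(sigchain: SigchainEntry[], claimId: string): boolean {
  const exists = sigchain.some((entry) => entry.claimId === claimId);
  const revoked = sigchain.some((entry) => entry.revokes === claimId);
  return exists && !revoked;
}

// Whether the original copy of the claim was found on each side of the link.
// For a node-to-identity link, one side is the sigchain and the other the gist.
interface LinkEvidence {
  foundOnSideA: boolean;
  foundOnSideB: boolean;
}

function classifyLink(evidence: LinkEvidence): LinkStatus {
  if (evidence.foundOnSideA && evidence.foundOnSideB) return 'valid';
  if (evidence.foundOnSideA || evidence.foundOnSideB) return 'partial';
  return 'revoked';
}

// Only a link with both copies present is treated as valid.
function isLinkValid(evidence: LinkEvidence): boolean {
  return classifyLink(evidence) === 'valid';
}
```

Keeping the `partial` state distinct from `revoked` lets the caller report one-sided revocations separately, even though both are treated as invalid.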

Moving forward, I'll start on the re-discovery logic in issue #691. @amydevs will start looking into the bug tracked in https://github.com/MatrixAI/Polykey-CLI/issues/163 #328

Mar 28 '24 03:03 tegefaulkes

Discovery progress report.

Only thing that's been addressed so far is updating visited vertex tracking and skipping.

We still need to do everything else. Highest priority internal stuff is

Rediscovery - 2 days
Better error handling - 1 day
Retrieving only new claim information - 1 day

External CLI stuff

Identities status command - needs to be scoped out more
Identities unclaim and handling dead cryptolinks - needs to be scoped more.
Streamlining features - Needs to be scoped more and an issue created.
General code cleaning and refactoring - Could use an issue, but so far I've been addressing it as needed in my current work.

External but related.

While technically not a discovery problem, @amy is addressing the poor feedback from the vaults share command and, as part of that, upgrading the notifications domain. #695

Apr 08 '24 02:04 tegefaulkes

Here is a general diagram of what a social network would look like.

(Image: Untitled-2024-01-23-1145 excalidraw)

This is the kind of network we need to perform discovery on. The network is essentially a graph, with vertices made up of identities and Polykey nodes, and edges forming links between them.

There are 3 tiers of edges.

Cryptolinks. These are the most concrete form of a link. You can think of a gestalt as a fully formed distinct sub-graph made up of JUST cryptolinks. Cryptolinks are depicted as black arrow edges above. The circles grouping them are the gestalts.
Trust and permission links, depicted as the blue arrows above. These are the main relationships between nodes. There are gestalt-level permissions, such as trusting that gestalt, and node-to-node level permissions, such as sharing a vault. These edges form a relationship between gestalts such that we want to know more about them, since we're directly interacting with them.
Weak relationships between identities. Depending on the kind of identity they could be friends, followers, part of the same group, whatever. It just implies a social relation between two identities. These exist outside of the Polykey ecosystem and don't really affect the interaction within Polykey. But they're useful to know about for inviting friends into the Polykey ecosystem and for finding friends already using Polykey.
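As a rough illustration of the three tiers, the sketch below models edges with a tier label and computes a gestalt as the set of vertices connected through cryptolinks only. The type and function names are hypothetical, vertex IDs are plain strings, and cryptolinks are treated as undirected for the purpose of grouping; none of this reflects the actual GestaltGraph implementation.

```typescript
// Hypothetical sketch: not the actual GestaltGraph data model.
type EdgeTier = 'cryptolink' | 'permission' | 'social';

interface Edge {
  from: string; // vertex ID of a node or an identity
  to: string;
  tier: EdgeTier;
}

// Collect the gestalt containing `start`: every vertex reachable by
// following cryptolink edges only, ignoring edge direction.
function gestaltOf(start: string, edges: Edge[]): Set<string> {
  const seen = new Set<string>([start]);
  const stack: string[] = [start];
  while (stack.length > 0) {
    const current = stack.pop()!;
    for (const edge of edges) {
      if (edge.tier !== 'cryptolink') continue;
      let next: string | undefined;
      if (edge.from === current) next = edge.to;
      else if (edge.to === current) next = edge.from;
      if (next !== undefined && !seen.has(next)) {
        seen.add(next);
        stack.push(next);
      }
    }
  }
  return seen;
}
```

Permission and social edges are deliberately skipped here, which is why they connect gestalts to each other rather than merging them.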

Currently Polykey discovery only operates on the first tier of edges. So only whole gestalts are discovered and the user needs to manually trigger discovery on each gestalt to discover them.

To address task 4 in the above issue description, "Quality of life and streamline features such as automatic friend discovery for identities", we need to make some upgrades to the discovery system. We need the ability to do the following:

Follow permission links between gestalts to discover them in the background. We only really need to follow our own permissions.
Allow the ability to trust or set permissions between gestalts or nodes without having to discover them first.
Trusting or sharing should trigger background discovery.
Starting up Polykey should trigger initial discovery on our own node moving outwards.
We should check social level (tier 3) edges to enable the following.
1. Compile a list of friends/followers to invite to use Polykey.
2. Find friend/followers that already use Polykey.
We'd need to have a priority system for processing tier 2-3 edges. Social edges alone could crowd out all other forms of discovery and grind useful discovery to a halt.
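One possible shape for the priority system is sketched below, assuming a strict ordering by edge tier with first-in-first-out behaviour inside each tier. `PriorityDiscoveryQueue`, `DiscoveryTask` and the tier ordering are hypothetical choices made for illustration only.

```typescript
// Hypothetical sketch: one possible priority scheme, not an agreed design.
type DiscoveryTier = 'cryptolink' | 'permission' | 'social';

interface DiscoveryTask {
  vertexId: string;
  tier: DiscoveryTier;
}

// Tiers listed from highest to lowest priority.
const tierOrder: DiscoveryTier[] = ['cryptolink', 'permission', 'social'];

class PriorityDiscoveryQueue {
  // One FIFO bucket per tier.
  protected buckets: Record<DiscoveryTier, DiscoveryTask[]> = {
    cryptolink: [],
    permission: [],
    social: [],
  };

  public push(task: DiscoveryTask): void {
    this.buckets[task.tier].push(task);
  }

  // Take the oldest task from the highest-priority non-empty tier.
  public pop(): DiscoveryTask | undefined {
    for (const tier of tierOrder) {
      const task = this.buckets[tier].shift();
      if (task !== undefined) return task;
    }
    return undefined;
  }
}
```

With strict ordering, social tasks only run when nothing else is queued, so they cannot crowd out the other tiers. The trade-off is that they could be starved under constant load, so a weighted or time-sliced scheme may be preferable in practice.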

As a note, I want to avoid indiscriminate discovery across social links. Social links alone will form a very large graph of potentially all global identities, and we don't need to know about most of them unless we decide to trust them. So rather than processing a social link and queuing further links via it, I'd rather trigger further discovery via the action of trusting an identity.

This ties back to the 3 rings of what we care about in the gestalt network. We only really need to track the first 2 rings. That would form a reasonable amount of data to handle.

Our own gestalt
Gestalts we interact with
Everything else.

Apr 12 '24 04:04 tegefaulkes

After some discussion, it'll be fine to explore first-order social links and their gestalts. But we stop at follower-of-follower links. This should reasonably restrict our exploration space of the overall gestalt network.
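A minimal sketch of that cut-off, assuming the social graph is available as an adjacency map. `SocialGraph`, `exploreSocial` and the `maxDepth` parameter are hypothetical names; the default depth of 1 corresponds to first-order links only.

```typescript
// Hypothetical sketch: a depth-limited breadth-first walk over social links.
// Maps an identity to the identities it is directly linked to.
type SocialGraph = Record<string, string[]>;

function exploreSocial(
  start: string,
  graph: SocialGraph,
  maxDepth: number = 1,
): string[] {
  const depthOf = new Map<string, number>();
  depthOf.set(start, 0);
  const queue: string[] = [start];
  const found: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift()!;
    const depth = depthOf.get(current)!;
    // Do not follow links out of vertices already at the depth limit.
    if (depth >= maxDepth) continue;
    for (const next of graph[current] ?? []) {
      if (depthOf.has(next)) continue; // already visited, skip
      depthOf.set(next, depth + 1);
      found.push(next);
      queue.push(next);
    }
  }
  return found;
}
```

With the default depth, a follower of a follower is never returned, which keeps the explored space bounded by the size of the immediate neighbourhood.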

Apr 19 '24 07:04 tegefaulkes

A future optimisation is to focus on outward directed edges for IdPs that have asymmetric links. For example, an Instagram user could have millions of followers but only follow a couple hundred people. Therefore auto-discovery would focus on discovering outward directed edges, not inward directed edges. IdPs that only have bidirectional edges, like LinkedIn and Facebook, generally have more limited connectivity, so it is fine to discover all of it, but we would add heuristics to prioritise recency, activity and other metrics measuring "closeness" if possible.

The need to discover immediate neighbourhood is essential for UX reasons.

Apr 19 '24 07:04 CMCDragonkai

This issue's title is way too vague. Can this be made more specific?

Apr 19 '24 07:04 CMCDragonkai

BTW ENG-31 @pablo.padillo this diagram is a great addition to the decentralized trust network concepts that should go into the docs.

Apr 19 '24 07:04 CMCDragonkai

This issue's title is way too vague. Can this be made more specific?

It's vague because it's a parent issue for a bunch of smaller tasks relating to the Discovery domain.

Apr 19 '24 07:04 tegefaulkes

What's the status of this? Can we close this and turn the remaining discovery issues into new issues?

May 06 '24 05:05 CMCDragonkai

Current status is this. All of these will either have to be resolved or added back to the backlog. There is also a new bug report about discovery failing with multiple claims that will have to be looked into. But right now we're focusing on other stuff.

May 06 '24 06:05 tegefaulkes

Can you set an estimate for this in relation to the sub-issues and allocate them to the appropriate cycles? I think you might also need to consider how these translate to the 1.0.0 Project now, since the Polykey CLI Beta Launch is now closed.

May 20 '24 03:05 CMCDragonkai

Just 3 issues left for this. They're not really important enough to be addressed right now, not compared to current work. I'll probably move the remaining issues to the backlog and close out this issue.

May 20 '24 05:05 tegefaulkes

I'm closing this issue as all the most important issues here have been done. The have

May 23 '24 05:05 tegefaulkes
