cargo-crev icon indicating copy to clipboard operation
cargo-crev copied to clipboard

Make trust graph consider paths (aka flow).

Open dpc opened this issue 7 years ago • 13 comments

This is related to #44 .

Right now WoT graph is build by just a cost-bounded flooding of a graph. This makes it possible for anyone to create a new CrevId, trust it, and this way artificially increase the possible count of reviewes for a given crate.

This algorithm should keep track of path (id(s) of that directly trusted this one), on each step to the root of the trust tree, and so that when calculating the count of reviews, it's possible to "merge" reviews coming out from a common path, for the purpose of calculating the total trust count.

Example: If you directly trust only one other CrevId, you can only have trust count equal to 1, for any given crate, no matter how many people reviewed it.

dpc avatar Dec 08 '18 19:12 dpc

To explain:

             /- b
root -> a - c
             \- d

In this case, if both b and d create a review, it will count only as one review, to prevent a from being able to create and trust fake IDs, to increase review count.

dpc avatar Jan 05 '19 05:01 dpc

it will only count as one review

Am I correct in saying that the count, in this case, is made from a point of view?

Shouldn't we avoid trusting nodes who we suspect of trusting and creating fake nodes anyway?

ffranr avatar Jan 05 '19 13:01 ffranr

Shouldn't we avoid trusting nodes who we suspect of trusting and creating fake nodes anyway?

Of course. But trust is sometimes broken, and accounts get compromised. Currently I believe that after crev gets popular, everyone should require at least 2 reviews to trust anything. And such review should be "uncorrelated". This issue is a method to make sure no account can increase the review count by creating sock-puppet accounts.

dpc avatar Jan 06 '19 03:01 dpc

I think this would be solved by computing the max-flow from the root to each package, where people are nodes. Does this match what you're looking for?

pimotte avatar Jan 07 '19 20:01 pimotte

@pimotte I am not sure which algorithm you refer to, and I am not that well versed with graph algorithms. After brief looking at https://en.wikipedia.org/wiki/Maximum_flow_problem, I guess one could say I'm looking for max-flow, if all edges have flow of 1, the source is root of the WoT, and all trusted users that have reviewed the package are sinks.

dpc avatar Jan 07 '19 20:01 dpc

Something to consider regarding max flow algorithm:

http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002600.html

dpc avatar Aug 28 '19 18:08 dpc

I've read http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002600.html and I pretty much agree with everything there.

This might affect #329 .

I was naively thinking about implementing a proper max-flow algorithm, but something like tracking number for each node, and splitting it between trusted 2nd-level node is probably easier and can be done breadth-first.

TODO:

  • read https://www.levien.com/thesis/compact.pdf
  • read http://www.squarefree.com/trust/trust.pdf

dpc avatar May 24 '20 07:05 dpc

Could you elaborate a bit on what you mean with "tracking number"?

Other thoughts:

#329 basically introduces the concept of "negative trust". The reporter doesn't phrase it as such, but if your network is saying someone has a certain level of trust and you want to downgrade that, that is what is going on. Introducing negative trust in the core mechanism will likely lead to an even more complex system. Advogato with a max-flow algorithm that works breadth-first seems like the perfect way to go about this. Personally, if that yields unexpected results for the user, I'd see that as a UX problem, rather than an algorithms problem. Those then could be resolved by hard-capping certain results based on reviews from the user or possbily allowing the user insight into why this result came to be, which might lead to a re-evaluation of the trust they have expressed in other parties.

This last bit is actually the counter-intuitive part about all these mechanisms. If I trust Alice and Alice trusts Eve, but I don't trust Eve, than that should lead me to re-evaluate my trust in Alice. If a crev result shows me this, it's very easy to say "oh, but I want to downgrade Eve, so let's tell that to crev". If the user can see that crev is trusting Eve because of Alice, then I get to choose: I downgrade my trust in Alice, since her judgment of Eve affects it, or I adjust my assessment of Eve, because I trust Alice's judgment of her above my own. In neither case I'll input something directly related to Eve, even though she "started" this.

pimotte avatar May 24 '20 10:05 pimotte

but something like tracking number for each node

That's just my way of summing up my shallow understanding of Advogato - keep track of how much flow each node has (some number).

dpc avatar May 24 '20 18:05 dpc

After reading through http://www.squarefree.com/trust/trust.pdf

Looking at these trust metrics etc. I think there are some fundamental differences between how they work, and how I perceive a problem.

First, for me the trust is not binary. For some reason most (all?) the existing trust solutions are binary. Something is either trustworthy or not. In crev, there are levels (and maybe also "confidence"). So the trust is more nuanced. Even without any flow analysis, the trust level can be easily made to "dissipate" as the distance from the root grows.

Then, I consider everything subjective and want to embrace that subjectivity. There is no one global "this is trustworthy" vs "this isn't". It's all opinions, and depending on circumstances two different agents, with same knowledge, can reach different conclusions about what is trustworthy.

Third, a combination of the previous two: there is no fixed point where something is trustworthy. Even the same person, with the same WoT, can run multiple verification runs, with different parameters (trading: depth, required trust-level, required understanding level, required redundancy, etc for each other), and look for the most susceptible pieces of code to review themselves.

If I trust Alice and Alice trusts Eve, but I don't trust Eve, than that should lead me to re-evaluate my trust in Alice.

The subjectivity is the reason why I don't think it's necessarily the case. It would make sense if we considered everything objective. But if it's subjective, then it's totally fine to have a different opinion. With the confidence stuff, there will be a way to possibly overwrite inferred trust if needed. But that will require a conscious decision of someone analyzing their WoT.

Originally I wanted flow analysis pretty much just to make the redundancy option (number of required reviews) to be attack-proof. So that someone in the WoT can't just create sock-puppets to create "copies" of the review that would count as many different reviews.

So my thinking was - after finding all the reviews, just run some kind of a max-flow algorithm from all the matching reviews to the root, and drop/merge multiple reviews coming through the same trust-bottleneck to count as one.

dpc avatar May 25 '20 06:05 dpc

The problem I'm looking at might not be a max-flow algorithm... . I'm only looking at flow of 1 everywhere, and I'm not interested in the flow value, but at actually pruning "duplicates". A naive version seems fast and simple:

  • when flooding the graph from root, build "previous node" list for each node
  • for every node with a review (sorted from closest to root to most distant) "traverse the path back to root":
    • go node by node to the root, and remember which nodes you've visited avoiding nodes that were already visited by any other previous iteration
  • nodes for which the traversal back to root can't succeed because the previous nodes were already used, must be merged/discarded.

However the result is not "stable" and depends on the order of how nodes were sorted, and how they picked the previous node to use.

But it seems to me that if every time when taking a previous node while having a choice of some other previous nodes, we note the unused previous nodes on the used previous node and allow subsequent "traverse the path back to root" to use the previous nodes that weren't taken like they were previous nodes of the currently consider node itself ... things might work OK? It seems to fix the problem where traversal uses a path that is needed for another traversal, instead of a path that no one else uses.

It's late and I'm probably rumbling, but maybe myself I'll be able to decipher it, once I get back to it.

dpc avatar May 25 '20 06:05 dpc

https://arxiv.org/abs/2203.00671

dpc avatar Apr 24 '22 23:04 dpc

https://www.youtube.com/watch?v=KsMtVthpkzI

dpc avatar Jun 09 '22 03:06 dpc