Replication may be pushing too many feeds into the connection

Open pfrazee opened this issue 7 years ago • 7 comments

If you look in hypercore-archiver, the replication code adds all stored feeds to the connection. (Its current usage, in archiver-server, does not set passive to false.)

I'm guessing this means that hypercloud will, at minimum, announce all currently stored archives at the time of connect. That can't scale. Shouldn't the hypercloud sit and wait for requests, passively?

pfrazee avatar Jan 04 '17 02:01 pfrazee

@maxogden I think this might relate to your remarks earlier about the archiver-server being passive. The current archiver-bot does set passive to true when it replicates. There's definitely a scaling issue there.

But if passive is true, then the public peer won't ask other public peers for anything. If I'm understanding this correctly, we'll need some kind of middle ground: an algorithm for asking for updates with proper throttling.

pfrazee avatar Jan 04 '17 02:01 pfrazee

Clarifying question on that code (hard for me to understand due to vague method/variable names): is this the line that 'adds' a feed to a connection? https://github.com/mafintosh/hypercore-archiver/blob/dd34d62253d56604c94d8785e5e39b83816fb30f/index.js#L194 So the issue is that the archiver will call .replicate many times over one connection?

Why is it doing that in the first place? Can't we just only call .replicate() for the hypercore that the connection is asking for?

max-mapper avatar Jan 04 '17 02:01 max-mapper

> Why is it doing that in the first place? Can't we just only call .replicate() for the hypercore that the connection is asking for?

As I understand it, you need to call feed.replicate() for every feed you want to sync.

I believe the issue is that we only have two modes: 1) ask to sync every feed we have stored locally, or 2) don't ask to sync anything and let the peer make the feed.replicate() calls.

The latter is passive-mode. If two passive-mode peers connect, no transfer will occur. That's the problem you remarked on, earlier.

However, non-passive-mode will have a scaling problem at some point. You'll ask to sync too many feeds for the connection.
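
A minimal sketch of the two modes, as a toy model rather than the actual hypercore-archiver code. The names `feedsToRequest` and `syncedFeeds` are hypothetical, and the model assumes a feed only syncs when at least one side asks for it and both sides have it stored:

```javascript
// Hypothetical model: not the hypercore-archiver API.
// Returns the feed keys a peer would ask to sync on a new connection.
function feedsToRequest(storedKeys, passive) {
  // Passive mode: request nothing and wait for the remote peer to ask.
  if (passive) return [];
  // Non-passive mode: ask to sync every stored feed.
  return storedKeys.slice();
}

// Assumption: a feed syncs on a connection when at least one side asks
// for it and both sides have it stored.
function syncedFeeds(a, b) {
  const requested = new Set([
    ...feedsToRequest(a.keys, a.passive),
    ...feedsToRequest(b.keys, b.passive),
  ]);
  return [...requested]
    .filter((key) => a.keys.includes(key) && b.keys.includes(key))
    .sort();
}
```

In this model two passive peers produce an empty result, while a non-passive peer requests its entire store on every connection, which is where the scaling concern comes from.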

pfrazee avatar Jan 04 '17 02:01 pfrazee

What if we just used 1 connection per .replicate()?

max-mapper avatar Jan 04 '17 03:01 max-mapper

No, that wouldn't solve the problem. Basically the problem is that hyperclouds are interested in too many hypercores. A peer will show up and the hypercloud will ask "you have anything new for 10mm cores?" Too thirsty.

We do want the hypercloud to ask about some of their cores. Just not all of them, every time.
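
One way "some, not all" could look is a bounded round-robin pick per connection. This is only a sketch: `createFeedPicker`, `nextBatch`, and `batchSize` are hypothetical names, not part of hypercore or hypercore-archiver, and nothing in this thread settles on this approach:

```javascript
// Hypothetical helper: not part of hypercore or hypercore-archiver.
// Hands out at most `batchSize` feed keys per new connection, rotating
// through the stored keys so every feed is eventually asked about.
function createFeedPicker(storedKeys, batchSize) {
  let cursor = 0;
  return function nextBatch() {
    const count = Math.min(batchSize, storedKeys.length);
    const batch = [];
    for (let i = 0; i < count; i++) {
      batch.push(storedKeys[(cursor + i) % storedKeys.length]);
    }
    // Advance the cursor so the next connection gets the following keys.
    if (storedKeys.length > 0) cursor = (cursor + count) % storedKeys.length;
    return batch;
  };
}
```

Each key in a returned batch would then get its own feed.replicate() call on that connection, so the number of feeds requested per connection stays capped at `batchSize` no matter how many cores are stored.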

pfrazee avatar Jan 04 '17 03:01 pfrazee

> I'm guessing this means that hypercloud will, at minimum, announce all currently stored archives at the time of connect. That can't scale. Shouldn't the hypercloud sit and wait for requests, passively?

Important to note that announcing is separate from opening the feed. In archiver-server, there is a random timeout to avoid flooding all those announcements but still likely a problem.

But both are issues: 1) having many feeds open and 2) announcing too many things at once.
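
For illustration, a random timeout of that kind might be sketched as below. This is not taken from archiver-server: `scheduleAnnouncements`, `announceWithJitter`, `maxDelayMs`, and the `announce` callback are all hypothetical names.

```javascript
// Hypothetical sketch: not taken from archiver-server.
// Assigns each feed key a random delay in [0, maxDelayMs) so announcements
// are spread out instead of firing all at once.
function scheduleAnnouncements(keys, maxDelayMs, random = Math.random) {
  return keys.map((key) => ({
    key,
    delayMs: Math.floor(random() * maxDelayMs),
  }));
}

// Starts one timer per feed; `announce` is a caller-supplied function.
function announceWithJitter(keys, maxDelayMs, announce) {
  return scheduleAnnouncements(keys, maxDelayMs).map(({ key, delayMs }) =>
    setTimeout(() => announce(key), delayMs)
  );
}
```

Jitter like this only spreads the announcements over time. The total number of announcements is unchanged, which fits the point that it is "still likely a problem" for a large store.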

> pfrazee: jhand: to clarify, there's two places where a flood could happen. The one you linked to is announcing on the discovery network. The other one, which max and I are discussing, is announcing feeds once a connection is established between peers

Ah!

joehand avatar Jan 04 '17 03:01 joehand

(Max and I clarified our points in IRC)

pfrazee avatar Jan 04 '17 03:01 pfrazee