electricsheep-hd-client

Distributed longterm archive

Open kochd opened this issue 5 years ago • 15 comments

Concept of getting rid of the need for large server side storage

  • [x] Clients archive all sequences anyway
  • [x] Clients announce their stored sequences with their matching SHA2 hashes to the server
  • [x] Server aggregates and sanitizes the known material across all the clients
  • [ ] Based on the size of the network and the accumulated trust in the announcers, an announced sequence is treated as valid or invalid
  • [ ] Misbehaving clients lose trust
  • [ ] Clients can ask the server for archived material
  • [ ] When a client requests a sequence that is not stored on the server itself, the server will collect it from the distributed storage. The server gathers it from the announcers and hands it over to the requester
  • [ ] The server does not need to store the sequences for a long time
  • [ ] The server just keeps a cache of the last X requested sequences
  • [ ] Bonus security on top of plain hashing: when a new sequence is pushed from a render node to the server, it gets signed by the server, preventing later data manipulation on the client side.
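The announce step above could be sketched in Ruby (the client's language). This is a hypothetical illustration, not the actual client API; `build_announcement`, the payload fields, and the file layout are all assumptions:

```ruby
require "digest"
require "json"

# Hypothetical sketch of the announce step: hash each archived
# sequence file and build the payload a client could send to the
# server. Field names and structure are assumptions, not the real
# protocol.
def build_announcement(sequence_paths)
  sequence_paths.map do |path|
    {
      "sequence" => File.basename(path),
      "sha256"   => Digest::SHA256.file(path).hexdigest,
      "bytes"    => File.size(path)
    }
  end
end

# The server can then compare announced hashes across clients and
# only treat a sequence as valid once enough trusted announcers
# report the same hash.
```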

Cons:

  • Network traffic on server side doubles in the worst case (at 0% cache hit)
  • Adds a ton of complexity on client-side compared to the current state

Pros:

  • Way larger archive volume
  • 0 payment
  • No NAT or firewall configuration on the clients required

Feel free to comment

kochd avatar Sep 16 '19 18:09 kochd

  • Do you mean to implement P2P BitTorrent to reduce server traffic? The server holds the tracker and prefers to send .torrents with a low seed/leech ratio first (new users will start at different positions in the archive), then by request n+1/n-1. The server catches up when leechers < limit, or on request. Or will the server prefer to send the newest sheep first, then n+1/n-1, so the really old sheep will be lost in the end? (The server/client can collect the requested data later over P2P, when a leecher comes back online one day.)

  • Will new users request very old sheep that are no longer stored on the server every day? Or will you benefit from P2P? Do you prefer to download sheep from the start or the end first?

  • Add an option to buy credit/karma if you do not want to mine new sheep or share sheep/bandwidth (upload is slow; add a speed-limit option to the config)?

  • Give new users some free credit to get their first "n" sheep?

StanekO avatar Sep 17 '19 00:09 StanekO

* Do you mean to implement P2P BitTorrent to reduce server traffic? The server holds the tracker and prefers to send .torrents with a low seed/leech ratio first (new users will start at different positions in the archive), then by request n+1/n-1. The server catches up when leechers < limit, or on request. Or will the server prefer to send the newest sheep first, then n+1/n-1, so the really old sheep will be lost in the end? (The server/client can collect the requested data later over P2P, when a leecher comes back online one day.)

Exactly, but because of storage, not traffic. The server would tunnel the traffic between the clients, trading increased traffic for reduced storage. Full-blown BitTorrent would be overkill, I think. It will be kinda like BitTorrent, but not actually THE BitTorrent protocol. I'll have to look into BitTorrent to see how hard it would be to implement the basic tasks required. Good point: the server should serve the clients from the lowest gapless point.
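The "lowest gapless point" could be computed like this, assuming sequences are identified by consecutive integers (a simplification of the real numbering, purely for illustration):

```ruby
# Walk backwards from the newest sequence (HEAD) and stop at the
# first gap; everything from that point to HEAD forms one continuous
# run that can be served gaplessly. Assumes `available` is a
# non-empty list of integer sequence numbers.
def lowest_gapless_point(available)
  sorted = available.sort
  start = sorted.last
  sorted.reverse_each.each_cons(2) do |newer, older|
    break unless newer - older == 1
    start = older
  end
  start
end
```

With the numbering from the diagram further down ([1] [3] [5] ... [13] [14] ... [22]), this returns 13: the oldest sequence that still connects gaplessly to HEAD.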

* Will new users request very old sheep that are no longer stored on the server every day? Or will you benefit from P2P? Do you prefer to download sheep from the start or the end first?

Currently the client pulls the oldest sequence it is missing from the archive. When going with P2P, we change direction: the server will first serve from HEAD (like the head of a snake: the newest sequence) backwards:

|---Not stored by the server---------------------|

After that P2P seeds to the oldest point in history that can continue gaplessly to the newest

|-Skipped due to gaps--|--Served by P2P----------|

* Add an option to buy credit/karma if you do not want to mine new sheep or share sheep/bandwidth (upload is slow; add a speed-limit option to the config)?

By "buy" you mean buying with money? I'm not going down that road; I don't want to mess around with finances. Everyone has to give back what they take from the network. Want sequences? Give frames, peer upload, etc.

* Give new users some free credit to get their first "n" sheep?

I'll think about that. It would be a jump start: you would have a preview of what this is about right away, instead of having to wait for your first sequences to arrive. Maybe enough to pull 3 sequences for newly registered accounts.

kochd avatar Sep 17 '19 09:09 kochd

I'll think about that. It would be a jump start: you would have a preview of what this is about right away, instead of having to wait for your first sequences to arrive. Maybe enough to pull 3 sequences for newly registered accounts.

At the start you are not trusted, and you are waiting and waiting; 3 as a gift will be OK.

Update:

|-Skipped due to gaps-|--------------n/a--------------|<----Served by server P2P----|
|-------Skipped due to gaps-------|-------------Served by user#1 P2P----------------|
|-Skipped due to gaps-|-------------Served by user#2 P2P------------|------n/a------|
[1] [3] [5] [7] [9] [11] [13] [14] [15] [16] [17] [18] <[19]>>> [20] [21] [22] [HEAD]

Now I have a lot of credit, but it is hard for me to catch the [HEAD]. I am at [244.00269], HEAD=[244.00300]. Maybe it would be better to start from [19]->>>[HEAD] first, and then, if no new sheep are available, ask for the older [1]<-[19]. Who will "care" about [1]? If you have played them 100 times, you want the new one :-)

PS: now I am getting the disk-full error once per second. Maybe the request on error could be retried after a random 0-60 s delay. PS: Yes, sorry, my mistake, I did not notice it was per minute; it is OK :-)

[13:34:28] WARN: Server response 908: Server out of disk space. It will cleanup hourly.
[13:34:28] ERROR: Error in season: Server out of disk space. It will cleanup hourly.
[13:35:29] WARN: Server response 908: Server out of disk space. It will cleanup hourly.
[13:35:29] ERROR: Error in season: Server out of disk space. It will cleanup hourly.

StanekO avatar Sep 17 '19 11:09 StanekO

PS: now I am getting the disk-full error once per second. Maybe the request on error could be retried after a random 0-60 s delay.

Strange... it should sleep for 60s: https://github.com/kochd/electricsheep-hd-client/blob/master/daemon#L351

My daemon sleeps
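For reference, the randomized retry StanekO suggests would look roughly like this. Purely illustrative; as noted above, the daemon already sleeps a fixed 60 s:

```ruby
# Jittered retry delay: on a 908 "out of disk space" response,
# sleep a random 0..max_seconds so that many clients don't all
# retry at the same instant. Hypothetical helper, not in the daemon.
def retry_delay(max_seconds = 60)
  rand(0..max_seconds)
end
```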

kochd avatar Sep 17 '19 11:09 kochd

I agree, we should better do

if (MY_HEAD + 1).exists?
  get(MY_HEAD + 1)
elsif (MY_TAIL - 1).exists?
  get(MY_TAIL - 1)
end

Prefer forward over backward syncing.
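A runnable version of that rule, with illustrative names (`local` = sequence numbers stored locally, `known` = sequence numbers known to the network; neither is the daemon's real data structure):

```ruby
# Pick the next sequence to fetch, preferring forward syncing:
# head + 1 if the network has it, otherwise tail - 1, otherwise nil.
def next_sequence_to_fetch(local, known)
  head = local.max
  tail = local.min
  if known.include?(head + 1)
    head + 1
  elsif known.include?(tail - 1)
    tail - 1
  end
end
```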

kochd avatar Sep 17 '19 15:09 kochd

this sounds all very promising. keep up the good work!

earsneyes avatar Sep 18 '19 18:09 earsneyes

Longterm archive replaced by dynamic voting/chain:

sheeps: [1] [1to2] [2] [2to3] [3] [3to4] [4] [4to5] [5] [5to6] [6]
voting: (1=0)      (2=1)      (3=-3)     (4=-2)     (5=2)      (6=-1)

(new [7]) -> (delete worst: [2to3] [3] [3to4]) + (new task [2to4]). The new, better chain:

sheeps: [1] [1to2] [2] [2to4] [4] [4to5] [5] [5to6] [6] [6to7] [7]
voting: (1=0)      (2=1)      (4=-2)     (5=2)      (6=-1)     (7=0)

In the end, you will save space and get a "better" chain of sheep? And bad or flam3-incompatible sheep will be deleted...
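A minimal sketch of this pruning step, assuming integer sheep IDs and a `votes` hash; the data layout is an assumption, not the project's actual schema:

```ruby
# Drop the worst-voted sheep (its two transitions would go with it)
# and return the new transition task that bridges its neighbours,
# e.g. deleting [3] out of [2] [3] [4] yields the task "2to4".
def prune_worst(votes)
  worst = votes.min_by { |_id, score| score }.first
  ids = votes.keys.sort
  i = ids.index(worst)
  prev_id = i > 0 ? ids[i - 1] : nil
  next_id = ids[i + 1]
  votes.delete(worst)
  prev_id && next_id ? "#{prev_id}to#{next_id}" : nil
end
```

With the votes from the example above ({1=>0, 2=>1, 3=>-3, 4=>-2, 5=>2, 6=>-1}), sheep [3] is removed and the new task is "2to4", matching the second diagram.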

StanekO avatar Oct 06 '19 23:10 StanekO

I like this idea, but I also wonder about a distributed server network: keeping clients somewhat "dumb", if you will, and maybe sourcing help from the community to host server-side processes.

I don't see the server-side implementation here and am not sure if you're keeping that under wraps, but depending on the environment, I could possibly offer up resources here in the US for this project.

Just a thought.

hdusten avatar Jan 06 '20 14:01 hdusten

The server source isn't open due to security considerations: keeping the users and the network safe from vandalism by not publicly revealing the mechanisms used. I would release it, or at least hand it over, if I ever abandon this project. Someone could upload inappropriate images instead of actual frames...

kochd avatar Jan 06 '20 16:01 kochd

Yeah, I had drawn the same conclusion. That's something I would do myself, and I respect it.

hdusten avatar Jan 07 '20 03:01 hdusten

Noticed we're down to 12 GB with no new work this morning. Are we tapped out on the server? I wish I were a Ruby developer and not a veteran .NET developer, because I would help build this distributed network lol.

What would it take to get more space on the server now until this idea becomes a reality?

hdusten avatar Jan 11 '20 15:01 hdusten

I am working on things but I have nothing that could fix this quick and dirty... Everything would require more work and sadly I am busy doing other things currently.

Also: the *.webm files take only about 1/10 of the disk space:

1,3G electricsheep.244.01436.13103_electricsheep.244.01437.06193/*.jpg
144M electricsheep.244.01436.13103_electricsheep.244.01437.06193.webm

Those JPEGs are way heavier. So we need faster VP9 encoding, because we can throw the JPEGs into the bin once the VP9 is uploaded back to the server. That's what I will put more time into soon.
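A sketch of that cleanup in Ruby: build a plausible ffmpeg VP9 command line (not the client's actual invocation), and delete the source JPEGs only after the .webm has been uploaded. Paths and frame rate are assumptions:

```ruby
# Assemble a constant-quality VP9 encode command for a directory of
# JPEG frames. Illustrative flags; the real client may differ.
def vp9_encode_command(frame_glob, out_webm, fps: 25)
  ["ffmpeg", "-framerate", fps.to_s,
   "-pattern_type", "glob", "-i", frame_glob,
   "-c:v", "libvpx-vp9", "-b:v", "0", "-crf", "32",
   out_webm]
end

# Once the .webm is safely on the server, reclaim the ~10x disk
# space by deleting the JPEG frames. Returns the number deleted.
def delete_frames(jpg_dir)
  files = Dir.glob(File.join(jpg_dir, "*.jpg"))
  files.each { |f| File.delete(f) }
  files.length
end
```

The key point is ordering: encode, upload, verify, and only then delete, so a failed upload never costs the network a sequence.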

kochd avatar Jan 12 '20 23:01 kochd

  1. Any updates?
  2. How much space does everything take now anyway?

EsEnZeT avatar Sep 30 '22 21:09 EsEnZeT

The oldest rendered sequence the server holds right now was rendered Jan 26 2022. This was achieved by other optimizations and should be plenty for now. This feature won't do much at the moment.

kochd avatar Oct 06 '22 11:10 kochd

The oldest rendered sequence the server holds right now was rendered Jan 26 2022.

What about older generated sequences - is it possible to download them atm?

EsEnZeT avatar Oct 07 '22 00:10 EsEnZeT