
Doesn't DAT Need an Official HTTPS Gateway?

Open ilyaigpetrov opened this issue 6 years ago • 39 comments

IPFS has an HTTPS gateway used as https://ipfs.io/ipfs/<hash>. The existence of such a gateway provides the following benefits:

  1. Search engine crawlers may access these pages.
  2. Users may access ipfs/dat pages without ipfs installed.
  3. If access to the pages is blocked by censors then:
    1. pages should still appear in search results.
    2. users may still install ipfs and the ipfs-companion browser extension to access these pages.

@RangerMauve has implemented an HTTP(S) gateway based on pfrazee/dat-gateway that works in a way similar to IPFS (repo, demo). It even redirects each page to a subdomain to insulate cookies and provide better security.
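
For illustration, the subdomain redirect could look roughly like this minimal Express sketch (not dat-gateway's actual code; the gateway host name is made up):

```js
// Minimal sketch of the subdomain redirect (not dat-gateway's actual code).
// Redirect /<key>/<path> to https://<key>.<gateway host>/<path> so every
// archive gets its own origin, isolating cookies and storage.
const express = require('express')

const GATEWAY_HOST = 'gateway.example.com' // hypothetical gateway host
const app = express()

app.get('/:key/*', (req, res) => {
  const rest = req.params[0] || ''
  res.redirect(301, `https://${req.params.key}.${GATEWAY_HOST}/${rest}`)
})

app.listen(8080)
```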

Concerning datbase.org: I couldn't get it to serve me an HTML page the right way (without the header, and maybe with absolute CSS paths, which is not critical).

I kindly ask the DAT team to take a look at the work of @RangerMauve and similar efforts and provide users with an official DAT HTTPS gateway, which we may use to build our future censorship-resistant websites.

ilyaigpetrov avatar May 16 '18 13:05 ilyaigpetrov

One thing I'd also like to note is that I'm working on having dat-gateway automatically inject a DatArchive polyfill so that sites that make use of it can work without any extensions.
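
For illustration only, here is a minimal sketch of what such injection could look like (this is not the actual dat-gateway implementation; the polyfill path is hypothetical):

```js
// Illustrative sketch only (not the actual dat-gateway code): when the
// gateway serves an HTML file from an archive, append a polyfill script tag
// so window.DatArchive exists even without a browser extension installed.
function injectDatArchivePolyfill (html) {
  const tag = '<script src="/dat-polyfill.js"></script>' // hypothetical path
  return html.includes('</body>')
    ? html.replace('</body>', tag + '</body>')
    : html + tag
}

// Inside the gateway's response handling, something like:
//   if (contentType.startsWith('text/html')) body = injectDatArchivePolyfill(body)
```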

RangerMauve avatar May 16 '18 14:05 RangerMauve

Hey! It's been really great to see the work @RangerMauve and others are doing on gateways and related projects. You are right about the benefits of having a gateway. While an official gateway could be helpful, it could also lead to copyright and legal issues beyond the resources of our nonprofit. We'd like to keep the scope of the Dat Project focused on the core technology to ensure sustainability in the long term, and using resources otherwise may detract from that.

It's really great to see the community efforts around this and we'll continue to support them however we can.

Concerning datbase.org...

Dat Base was intended as a registry, not necessarily a gateway, and I'm not sure we could cover all uses without doing subdomains.

joehand avatar May 16 '18 15:05 joehand

Hi, I'd like to hijack this issue to talk about gateways and the such.

There was a meeting with some people from the Dat community to talk about gateways and getting Dat working in the browser on Wednesday Feb 27. Here are the meeting notes (courtesy of @substack)

# dat in browsers notes

2019-02-27

# on the call

* diana: working on dat gateway
* franz: working on archipel
* mauve: working on webrtc/websockets/discovery-swarm-stream
* substack: working on peermaps

# franz: working on archipel

archipel is an electron app using an rpc layer over websockets

hyperstack module: hyperdrive and hypergraph, but probably extensible

same apis for using dat apis on the browser or the server

goal: allow the same backend code to run in a webworker in the browser

# webrtc/gateway/signaling

mauve: got webrtc working on signalhub

prioritize webrtc connections by setting a delay to start using websockets
establishes a websocket connection too after some seconds
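
A rough sketch of that prioritization (function names are hypothetical):

```js
// Rough sketch of the prioritization described above (function names are
// hypothetical): join over WebRTC immediately, and only bring up the
// websocket transport after a short delay so direct browser peers are preferred.
const WEBSOCKET_DELAY_MS = 5000

function joinSwarm (key) {
  joinWebRTCSwarm(key) // e.g. webrtc-swarm via a signalhub
  setTimeout(() => joinWebsocketSwarm(key), WEBSOCKET_DELAY_MS) // e.g. a gateway connection
}
```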

signalhub is great for a single swarm
swarm per key in a browser is very limiting

discovery-swarm-stream: rpc over websockets for the discovery-swarm api
mux several streams through websockets
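
As a rough illustration of the multiplexing idea (not discovery-swarm-stream's actual wire protocol), several hyperdrive replication streams could share a single websocket like this:

```js
// Rough illustration only: many hyperdrive replication streams share one
// websocket by giving each archive its own multiplexed channel.
const websocket = require('websocket-stream')
const multiplex = require('multiplex')
const pump = require('pump')

const ws = websocket('wss://gateway.example.com') // hypothetical gateway URL
const plex = multiplex()
pump(ws, plex, ws)

function replicateThroughGateway (archive) {
  // assumes the archive is ready, i.e. archive.key is set
  const channel = plex.createStream(archive.key.toString('hex'))
  pump(channel, archive.replicate({ live: true }), channel)
}
```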

with dat gateway,
dat key gets leaked to the gateway: privacy issue

applications shouldn't necessarily share data with 3rd party gateways

does discovery-swarm-stream make sense?
act as a signaling server over the same protocol?

franz: many use-cases for browser/server storage/processing splits
easy to use rpc api for easier frontend apps
browser vs server has different trade-offs for storage/network etc

substack: peermaps is browser-first p2p for maps
using iframes to handle cross-domain maps?

mauve: gozala is working on lunet using service workers paired with iframes
to serve ipfs or will proxy to local ipfs daemon

substack: how do custom extensions work with gateway/signaling/rpc work?

mauve: gateway probably wouldn't work with custom extensions, but custom
extensions over discovery-swarm-stream should work.

franz: with hyperstack, the backend has the different implementations
how to make it run in the browser? looking at lunet, different replication
possibilities for gateways etc

mauve: do we really need shared storage across origins? what about iframe post
messages

if you have different apps with different data, do you need to necessarily
coordinate on different origins? seems complicated

substack: running tools like lunet makes sense for seeding from the browser and
managing connections from a place/single interface

We also spoke about this on Thursday the 28th at the dat comm-comm call.

The gist of it is that it'd be nice to find a standard way of doing dat stuff in the browser.

The main pieces that I see (feel free to add more) are:

  • A standard and efficient way of working with webrtc-swarm (Something similar to signalhubws??)
  • A standard gateway protocol for reaching out to the Dat network (Maybe discovery-swarm-stream)
  • A standard HTTP interface for dealing with hyper-* data structures for when you don't want to create them client-side (@frando could elaborate)
  • A way to share resources like connections to the swarms / gateways and storage between origins. (Maybe something based on gozala/lunet)

I looked at some of this stuff a while ago when I was working on dat-polyfill and again recently when working on dat-js.

I propose meeting on Wednesday March the 6th at 20:00 GMT to discuss this stuff and maybe start putting parts together. We could use audio-only setups with https://talky.io/dat-in-browsers to talk about it.

Personally, I'd like to work on combining the signalhub / discovery-swarm-stream code so that we could support replicating multiple hyperdrives through both WebRTC and proxying to the discovery-swarm all with a single websocket connection. (Also integrating hyperswarm once that stabilizes)

Does that time work for you all? Is there a better date or time? Any other items that I could add to the list of stuff to talk about?

CC @garbados @substack @karissa @frando @dpaez @tinchoz49 @gozala

RangerMauve avatar Feb 28 '19 20:02 RangerMauve

Also, I'm making a calendar invite. Email me at [email protected] if you'd like to be added to the calendar.

RangerMauve avatar Feb 28 '19 21:02 RangerMauve

I think a good way of accomplishing this is to publish a set of capabilities to the peer table when joining a swarm. So for example if I am a browser I can communicate as a websocket-client and webrtc-peer. And if I'm an electron app I can communicate as a websocket-client, websocket-server (if I have a public IP or can hole-punch), tcp client, tcp server (if I have a public IP or can hole-punch), udp etc. Peers could also publish their connection preferences to the table. Then to make a peer connection, clients can consult this table along with their own heuristics to make the best connection possible according to some mutually acceptable preferences.
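
For illustration only, a hypothetical shape for such a capability record and a naive transport-selection heuristic (none of these field names exist in Dat today):

```js
// Hypothetical shape for the capability records described above (these field
// names are not an existing Dat API), plus a naive way to pick a transport.
const browserPeer = {
  id: 'peer-a',
  transports: { webrtc: { role: 'peer' }, websocket: { role: 'client' } },
  prefer: ['webrtc', 'websocket']
}

const electronPeer = {
  id: 'peer-b',
  transports: {
    webrtc: { role: 'peer' },
    websocket: { role: 'server' }, // only if it has a public IP or can hole-punch
    tcp: { role: 'server', port: 3282 },
    utp: { role: 'server', port: 3282 }
  },
  prefer: ['tcp', 'utp', 'webrtc', 'websocket']
}

// pick the first transport in the remote peer's preference list that both sides support
function chooseTransport (local, remote) {
  return remote.prefer.find(t => t in local.transports && t in remote.transports)
}

console.log(chooseTransport(browserPeer, electronPeer)) // 'webrtc'
```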

These kinds of hybrid swarms would be very useful for merging what would otherwise be fairly separate networks based on transport protocols.

Apologies if this is already the plan, although if so I guess this comment will help to disambiguate.

ghost avatar Mar 01 '19 00:03 ghost

Yeah, that's a great idea! One thing I was thinking of is that it'd be cool if these gateway servers published their existence to the discovery swarm under a known key. Then you could connect to one and discover more through it and potentially save them for later.
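
For example (the topic string below is made up and would need to be agreed upon):

```js
// Sketch of the "well-known key" idea; the topic string is hypothetical and
// would need to be agreed upon by gateway operators and clients.
const crypto = require('crypto')

const GATEWAY_DISCOVERY_KEY = crypto
  .createHash('sha256')
  .update('dat-gateway-discovery')
  .digest()

// gateway side: swarm.join(GATEWAY_DISCOVERY_KEY) to announce itself
// client side:  swarm.join(GATEWAY_DISCOVERY_KEY) and remember the gateways it meets
```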

RangerMauve avatar Mar 01 '19 03:03 RangerMauve

@pvh o/

okdistribute avatar Mar 01 '19 03:03 okdistribute

Re: the capabilities. We should discuss (on the call) how to do this stuff without reimplementing libp2p. 😅

RangerMauve avatar Mar 01 '19 03:03 RangerMauve

Re: the capabilities. We should discuss (on the call) how to do this stuff without reimplementing libp2p. 😅

Is there reason to not collaborate on libp2p itself?

Reading this thread I was feeling: oh, that's exactly the goal of libp2p, which also happens to have a Rust implementation, so you could in theory compile it to WASM.

Gozala avatar Mar 01 '19 05:03 Gozala

I think for the same reason it's healthy to have both KDE and Gnome, or Mozilla and Firefox, or Linux and FreeBSD, it's not wise to create a monoculture.

The dat community and the IPFS community have different ethos, technical goals, funding models, development methodology and values. I think both should inspire and be inspired by each other and drive one another to improve but it doesn't make much sense to me for the dat community to adopt the IPFS codebase.

pvh avatar Mar 01 '19 06:03 pvh

As for web gateways, if you've been following the work Ink & Switch has been doing, we've been discussing something morally along the lines of the DatArchive injection (though I think we envision a quite different actual implementation) to extend our system to non-Electron computers like iPhones and browsers.

Roughly, because a first-order goal for us is to support totally offline usage, we've discussed bridging hypermerge repositories over a websocket gateway but also wrapping all of that magic in a PWA that keeps the data in localStorage (or something) for improved durability.

pvh avatar Mar 01 '19 06:03 pvh

@pvh Would you be interested in attending the call?

Also ping @sammacbeth. He's using gateway stuff in https://github.com/cliqz-oss/dat-webext

RangerMauve avatar Mar 01 '19 13:03 RangerMauve

I think for the same reason it's healthy to have both KDE and Gnome, or Mozilla and Firefox, or Linux and FreeBSD, it's not wise to create a monoculture.

The dat community and the IPFS community have different ethos, technical goals, funding models, development methodology and values. I think both should inspire and be inspired by each other and drive one another to improve but it doesn't make much sense to me for the dat community to adopt the IPFS codebase.

I think there are a few caveats here that are worth considering:

  • Multiple implementations of the same standard are important (otherwise what's the point of a standard), but there is no compatibility at any layer here.
  • Adopting libp2p doesn't imply implementing Dat over IPFS. libp2p is a modular networking stack used by a community that stretches way beyond IPFS. It's actually really well structured, allowing one to plug in different transport implementations while keeping the application layer (like IPFS) the same. That is to say, this group could implement its own, even competing, transports, which would in fact benefit the larger community. For example, Manyverse got a Bluetooth transport that PL is also building for libp2p, and down the line this group will probably end up building one too. If that were a libp2p module instead, everyone would gain an additional transport rather than reinventing the wheel.
  • If you consider porting to different languages you face the same reinvention of the wheel. libp2p has a Rust implementation; both SSB and Dat have ongoing efforts to have one too.
  • If there were one networking stack it would be far easier to get wider adoption in both browsers and OSs. I can promise you that if there were more interop I would have been far more successful in getting it into the browser, but now every conversation derails into picking a winner.
  • A P2P network will be more resilient the more peers it has; by sharing DHTs, routing nodes, and client applications, everyone will be better off.

Please note that this does not imply that:

  • You have to share a code base; if you want to compete with an alternative but compatible implementation, everyone wins!
  • You have to buy into the ethos, technical goals, or funding model. The only thing you need to buy into is the architecture, and even there a good argument for some changes will get you pretty far (in my experience).

I apologize for derailing this conversation; it just saddens me that instead of making greater progress towards decentralization, communities across the board choose to keep reinventing the same wheel with slight technical differences. It could be that coordination across groups would have higher overhead than the value to be gained, but that's rarely the argument made.

Gozala avatar Mar 01 '19 17:03 Gozala

@Gozala you're right, we should discuss this in one of the many other channels we share :) @RangerMauve i'd be interested in joining the conversation, though mostly to listen since we haven't done too much here yet.

pvh avatar Mar 01 '19 17:03 pvh

@pvh Cool, feel free to join in, and send me an email if you'd like to be added to the calendar event.

RangerMauve avatar Mar 01 '19 18:03 RangerMauve

@RangerMauve i'd be interested in joining too. i'll mostly listen in and maybe fold in ideas that come to mind as the call progresses. I sent you an email :^)

cblgh avatar Mar 02 '19 10:03 cblgh

@gozala I've discussed this elsewhere but I think this:

The dat community and the IPFS community have different [...] development methodology and values

is a huge reason why there isn't more interop. I look at something like this code example and I see a wall of configuration that is written in an unfamiliar style and appears to have no practical purpose. It sets up a huge amount of boilerplate and then... you have a Node object? It doesn't explain what anything is for. I mostly see walls of text, tables, badges, org charts, and nothing means anything to me.

Compare this to something like webrtc-swarm. You set up the module with two pieces of information and then you can listen for 'peer' events, which give you a bidirectional stream. The module doesn't overload you with a manifesto first, it gets out of your way. I can easily see whether a module like webrtc-swarm will solve my problem or not, and it doesn't try to solve all the world's problems.
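
Roughly, using webrtc-swarm looks like this (the signalhub URL is a placeholder):

```js
// Roughly what webrtc-swarm usage looks like: a hub name plus a list of
// signalling servers is all the configuration, and every 'peer' event hands
// you a duplex stream.
const swarm = require('webrtc-swarm')
const signalhub = require('signalhub')

const hub = signalhub('my-app', ['https://signalhub.example.com'])
const sw = swarm(hub)

sw.on('peer', (peer, id) => {
  console.log('connected to peer', id)
  // peer is a duplex stream, e.g. pump(peer, archive.replicate({ live: true }), peer)
})

sw.on('disconnect', (peer, id) => {
  console.log('lost peer', id)
})
```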

The other development methodology for api design, technical communication, and setting scope leaves me unmotivated to even figure out if the given module will be suitable for what I'm trying to do. I also have no idea what libp2p is doing without reading a book's worth of content, but I can approximately guess how a module like webrtc-swarm works by glancing at its interface. Creating a mental model for the layers that sit below what you're working on is very important to design around the correct set of trade-offs, performance considerations, and failure cases. I also worry with tools that are too configurable about the tendency for those abstractions to leak upward in ways that push against encapsulation.

ghost avatar Mar 03 '19 01:03 ghost

Yeah, I like what libp2p are trying to do, but I don't think this would be the best place to try to integrate it with Dat. I think it'd be better to talk about that somewhere relating to the work in hyperswarm since that's where all the new networking stuff in Dat is going on.

My goal of bringing it up was to figure out a scope that we should focus on and avoid over-engineering.

If down the line there's more adoption of libp2p in the Dat ecosystem, then that will definitely affect the browser, but I'd rather start somewhere small so we can help people experiment with web applications that use Dat.

RangerMauve avatar Mar 03 '19 02:03 RangerMauve

Ping! The call should be starting in a minute or so. :D

RangerMauve avatar Mar 06 '19 19:03 RangerMauve

https://github.com/inkandswitch/discovery-cloud-client

pvh avatar Mar 06 '19 20:03 pvh

Thank you all for coming out to the call! I found it really helpful to learn about your different experiences with this stuff and the use cases that you're aiming for.

Here are the notes I took during the meeting, feel free to add comments on the post for anything I missed:

Participants

  • Diego from Geut, made a dat blog post using hyperdb for multiwriter
  • cblgh, Alex, working on Cabal, interested in getting Cabal on the web
  • Martin, tinchoz49, also Geut, discovery-swarm-webrtc
  • Gozala, Irakli, working on libdweb, prototyping lunet (IPFS in the browser using ServiceWorkers / iframes)
  • Kaotikus, Scot, Bits.coop, interested in learning more about the protocol
  • pvh, Peter, automerge project, using Dat and WebRTC for distributed applications, interested in getting non-Electron things working
  • substack (no audio)

Notes

  • discovery-swarm-stream is useful

  • discovery-swarm-cloud creates a local swarm on a server

  • random-access-idb is really slow, we need a new approach.

    • There's a proprietary IndexedDB extension in Firefox that could help
    • IPFS doesn't have the same performance issues as dat, it uses IDB for block storage
    • Might be useful to store individual ranges in IDB instead of files
  • webrtc-swarm has issues with reconnecting after going offline for a while; Diego and Martin are looking to do a PR if they find a fix

  • webrtc performance sucks with multiple connections; it's led to pvh giving up on using it. Chrome might be working on improving this internally, and @gozala says Mozilla are trying to optimize it

  • WebRTC doesn't work in workers, which means processing is done on the main thread and doesn't scale well. Gozala has used hacks for pumping data into workers

  • Latency when putting things into workers is bad, probably a result of IPC latency according to Chrome developers

  • pvh: Dat in browsers is a fallback, ideally focus on Electron apps first

  • Diego: There's a lot of potential and a lot of unknowns in browsers

  • Gozala: We shouldn't compete with different P2P protocols, we should all compete against the web together. Browsers are useful because links are enough to share content

  • pvh: Shouldn't push browser nodes too much because the experience is more complicated and not P2P

  • gozala: We should entice developers to use the tech so that Browsers will eventually support P2P APIs

  • martin: Could this discovery stuff go into hyperswarm? it's hard to track all the modules in the ecosystem

  • cblgh: What is the exact issue with DataChannel performance? Maybe contact feross for details about WebRTC in the browser

  • gozala: It'd be nice to have one app that handles all P2P protocols so that users wouldn't be asked to install it all the time

Here are some action items from the meeting:

  • Get discovery-swarm-stream along with a signalhub into a module, get it into dat-js, pvh interested in testing
  • We need a more optimized random-access-idb to improve dat persistence on the web (maybe based on ranges? see the sketch below)
  • Once a module is made for discovery in the network, see if it could be added to the hyperswarm repo so that all of Dat's networking is in one place
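
As a rough sketch of the range/block idea mentioned above (not a drop-in random-access-idb replacement; it assumes the `idb` IndexedDB wrapper and, to stay short, only handles block-aligned writes):

```js
// Rough sketch: store data as fixed-size blocks keyed by file name + block
// index, instead of whole files. A real implementation would also need
// read-modify-write for partial blocks at the edges of a written range.
import { openDB } from 'idb'

const BLOCK_SIZE = 4096
const dbPromise = openDB('dat-blocks', 1, {
  upgrade (db) { db.createObjectStore('blocks') }
})

async function writeBlocks (file, offset, data) {
  const db = await dbPromise
  let index = Math.floor(offset / BLOCK_SIZE)
  for (let pos = 0; pos < data.length; pos += BLOCK_SIZE, index++) {
    await db.put('blocks', data.subarray(pos, pos + BLOCK_SIZE), `${file}!${index}`)
  }
}

async function readRange (file, offset, size) {
  const db = await dbPromise
  const first = Math.floor(offset / BLOCK_SIZE)
  const last = Math.floor((offset + size - 1) / BLOCK_SIZE)
  const joined = new Uint8Array((last - first + 1) * BLOCK_SIZE)
  for (let i = first; i <= last; i++) {
    const block = (await db.get('blocks', `${file}!${i}`)) || new Uint8Array(BLOCK_SIZE)
    joined.set(block, (i - first) * BLOCK_SIZE)
  }
  const start = offset - first * BLOCK_SIZE
  return joined.subarray(start, start + size)
}
```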

I'm going to get started on the discovery-swarm-stream stuff mid next week with the goal of getting it integrated with dat-js and having someone test it outside of hyperdrive replication.

RangerMauve avatar Mar 06 '19 21:03 RangerMauve

Re: random-access-storage. Would WebSQL perform better than IDB? @pfrazee You're using sqlite for storing dat data, any opinions regarding using it as a backend for hyperdrive?

RangerMauve avatar Mar 06 '19 22:03 RangerMauve

Storage

We experimented with different random-access-* in the browser:

random-access-idb

It was the first idea, but we had problems with reading and writing a lot of blocks from hypercore.

random-access-key-value with level-js

It works great at the beginning but then starts to slow down once you have >= 50 blocks.

import raf from 'random-access-chrome-file' // alternative backend, see the next section
import randomAccessKeyValue from 'random-access-key-value'
import leveljs from 'level-js'
import levelup from 'levelup'

// store each file's data as key/value pairs in IndexedDB via level-js
const db = levelup(leveljs('dbname'))
const storage = file => randomAccessKeyValue(db, file)

random-access-chrome-file

It works really well and we didn't find performance issues, but we don't want to support only Chrome.

Network

We are using WebRTC through discovery-swarm-webrtc and running into various issues, like unexpected disconnections.

We are trying to stabilize the connection :disappointed: and one of the issues that we found is related to signalhubws.

If the ws client loses the connection it doesn't try to reconnect, so we forked signalhubws to use sockette and fix these kinds of issues: https://github.com/geut/signalhubws

We will probably open a PR to the original project and discuss the changes there.

tinchoz49 avatar Mar 06 '19 22:03 tinchoz49

@tinchoz49 Appreciate you sharing that research. I'm fairly sure that Chrome is pushing for their files APIs to become standard. It might be a good bet in the long run.

@RangerMauve It's worth taking a look at, to be sure.

pfrazee avatar Mar 06 '19 22:03 pfrazee

@tinchoz49 Have you tried out discovery-swarm-stream yet?

RangerMauve avatar Mar 06 '19 22:03 RangerMauve

The Chrome files API is really nice and fast; I also recommend using it despite its dependency on Chrome. I hope it becomes standard! We are using it with a map tile downloader for mapeo.

okdistribute avatar Mar 06 '19 23:03 okdistribute

My (somewhat limited) take on WebSQL is that it might make sense for metadata lookup, but could be heavy for file storage with lots of blocks.

okdistribute avatar Mar 06 '19 23:03 okdistribute

Ah yeah, I believe we funded random-access-chrome-file. Glad to hear it works in Chrome and doesn't require a Chrome App specific API. :)

pvh avatar Mar 06 '19 23:03 pvh

@tinchoz49 Have you tried out discovery-swarm-stream yet?

It's our first priority for tomorrow.

I'm fairly sure that Chrome is pushing for their files APIs to become standard. It might be a good bet in the long run.

The chrome files api is really nice and fast, I also recommend using it despite its dependency on Chrome. I hope it becomes standard! We are using it with a map tile downloader for mapeo.

That is really interesting, thanks for sharing your experience. Right now we are building a demo for the next EDCON and we need browser storage persistence working by that day, so I'm going to talk with the team tomorrow about using random-access-chrome-file.

tinchoz49 avatar Mar 07 '19 00:03 tinchoz49

Some thoughts from my side:

  • I have been using random-access-idb-mutable-file in dat-webext now for a while with no issues. I have not tested performance, but it feels subjectively better than random-access-idb.
  • I used DiscoverySwarmStream as a drop-in replacement for discovery-swarm recently, and it worked great. The only things I missed were lazy connecting (only connect to the server when a swarm is joined) and re-connecting when the connection drops. There is also an issue with un-connectable peers with this approach: as the server does not listen for connections and just tries to connect to peers, many dats become inaccessible because those peers' ports are not open. I see some movement on this, for example in Beaker, but at the moment this is a limitation for DiscoverySwarmStream.
  • WebRTC swarming would be nice to have, though I agree with @Gozala that we don't need to re-invent the wheel, and libp2p may make sense here. The main issue is that configuration should be standardised so that swarms are compatible (for example sharing signaling servers or having a shared federated signalling network).
  • This leads to one of the core issues: the official Dat networking stack is not web-compatible. Until this client can directly communicate with web peers, centralisation will be required to bridge web swarms with node ones. Does the current roadmap for Dat-node consider this issue? AFAIK hyperswarm is similarly tied to requiring direct TCP and UDP socket access.

sammacbeth avatar Mar 07 '19 09:03 sammacbeth