
question: how to build a dat-powered P2P Wikipedia?

Open derhuerst opened this issue 6 years ago • 3 comments

Hey!

I'm working on bringing Wikipedia into dat. For that, I have written two tools that work on top of hyperdrive:

  • build-wikipedia-feed's store-revisions fetches a revision of an article and writes it into the hyperdrive, with a custom mtime.
  • wikipedia-feed-ui takes a hyperdrive and serves an index of articles, an article itself and the history of an article (using archive.history) over HTTP.
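The store-revisions idea can be sketched roughly like this (a hedged sketch, not the actual tool: the path scheme and the shape of the `revision` object are made up, and `archive` is assumed to be an existing hyperdrive instance such as `dat.archive` from dat-node):

```javascript
// Hypothetical path scheme: one file per article, so each stored
// revision becomes a new hyperdrive version of that file.
function revisionPath (title) {
  return '/wiki/' + encodeURIComponent(title) + '.html'
}

// Write one revision into the archive. The thread discusses passing a
// custom mtime alongside writeFile so the file timestamp matches the
// revision; see the caveat referenced later in the thread (#163).
function storeRevision (archive, revision, cb) {
  archive.writeFile(revisionPath(revision.title), revision.html, cb)
}
```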

I want to build a "full node" that downloads all of wikipedia, feeds it into a hyperdrive/dat archive and seeds it over the network. The client/"light node" will be able to access Wikipedia in three ways:

  • download on demand: retrieve a revision of an article when the user wants to access it locally
  • sync the current state: retrieve the latest revision of all articles
  • full sync: retrieve all revisions of all articles
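The three modes above could map onto Dat's download options like this (a sketch: `sparse` and `latest` are real dat-node/hyperdrive options, but the mode names are just this thread's terminology, not any API):

```javascript
// Map the client's access mode to hyperdrive/dat-node options.
function optionsForMode (mode) {
  switch (mode) {
    case 'on-demand':   return {sparse: true, latest: true}   // fetch articles as requested
    case 'sync-latest': return {sparse: false, latest: true}  // newest revision of everything
    case 'full-sync':   return {sparse: false, latest: false} // all revisions of all articles
    default: throw new Error('unknown mode: ' + mode)
  }
}
```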

My question is a rather general one: Which way do you recommend to build the client/"light node"? I think it is crucial to have a good UX on this, as the average Wikipedia user won't be tech-savvy. It should be a one-click-installable solution with few moving parts. I can see multiple solutions:

  • Require users to install both dat and dat-wiki.
    • dat-wiki would tell dat to keep the relevant archive synced.
    • Is it possible to query the syncing status, history, etc. of an archive that some other process synced? It may interfere with other processes, right?
  • Build dat-wiki as a wrapper around dat-node.
    • Would it be possible to use the low-level archive.history and archive.writeFile (with mtime) then?
    • This would sacrifice all the tooling around dat, e.g. dat-cli.
  • another, smarter way?

derhuerst avatar Jul 31 '17 15:07 derhuerst

Awesome! We've wanted this to exist for a while =).

The two concepts that'll be important here are Dat's latest and sparse modes. By default, the CLI is always in latest mode:

  • latest: keep only the most recent copy of any given file. If page.html updates to version 2 then hyperdrive will download version 2 and delete page.html version 1.
  • sparse: only download the files specifically requested. This can be combined with latest too.
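Combining both modes, a "light node" could look something like this (a minimal sketch: `sparse` and `latest` are documented dat-node options and `joinNetwork`/`archive.readFile` are real calls, but the archive key, directory, and file path are placeholders):

```javascript
// Sketch of a sparse + latest "light node" on top of dat-node.
// Assumes `npm install dat-node`; not invoked here.
function startLightNode (key, dir) {
  var Dat = require('dat-node')
  Dat(dir, {
    key: key,     // the (hypothetical) Wikipedia archive key
    sparse: true, // only download what we explicitly ask for
    latest: true  // keep only the newest copy of each file
  }, function (err, dat) {
    if (err) throw err
    dat.joinNetwork() // start replicating with peers
    // in sparse mode, reading a file fetches just that file on demand
    dat.archive.readFile('/wiki/P2P.html', 'utf-8', function (err, html) {
      if (err) throw err
      console.log('got %d characters', html.length)
    })
  })
}
```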

I think it is crucial to have a good UX on this, as the average Wikipedia user won't be tech-savvy.

Ya, I'd agree! This should be the first question to answer. Once the UI is decided on, putting together the pieces underneath will be the easier part. A good example you may look at is Science Fair. The underlying data is two dat archives, one for metadata and one for the actual articles. Both of these can be viewed/downloaded via the CLI. But the way data is presented in the app makes it much easier to use and manage.

So a few points on compatibility:

  • All hyperdrives can be shared across any implementation. I don't need to know how you created an archive before deciding how I want to download it. It's all just files =).
  • Any application built on dat-node will be compatible with the CLI and Dat Desktop (both use dat-node underneath). Almost all of the tooling in dat-cli is inside dat-node. The CLI module is mostly the UI bits (though that is often the harder part =)).

Is it possible to query the syncing status, history, etc of an archive that someone other process synced? It may interfere with other processes, right?

As long as you are downloading files (not writing), several processes can share the same underlying archive. The syncing status, etc. is all built into the metadata.

Would it be possible to use the low-level archive.history and archive.writeFile (with mtime) then?

Yes, dat-node has access to all the archive APIs via dat.archive (see issue #163 for one caveat here).
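For example, the hyperdrive calls mentioned above are reachable through `dat.archive` (a sketch: `archive.history()` is the hyperdrive call the thread refers to, but the shape of the entries it emits varies by hyperdrive version, so this just logs whatever comes down the stream):

```javascript
// Sketch: inspect an archive's change log via dat-node's `dat.archive`.
// Assumes `dat` is an already-initialized dat-node instance; not invoked here.
function logArchiveHistory (dat) {
  var stream = dat.archive.history()
  stream.on('data', function (entry) {
    console.log(entry) // one metadata entry per change
  })
  stream.on('error', function (err) {
    console.error(err)
  })
}
```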

This sounds really cool and definitely a project we've wanted to see work! Let us know if you have any more questions.

ps. Have you seen the Beaker Browser? Specifically, its dat API may be a good way to prototype a UI for this. Additionally, if you can make a wikipedia dat that is browsable as regular webpages, you can access it like any other website in Beaker.
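A page served from the dat could use Beaker's DatArchive API along these lines (a sketch: `DatArchive` only exists inside Beaker, not in Node, and the path scheme here is made up):

```javascript
// Sketch: render an article inside a page loaded in Beaker.
// `DatArchive` is Beaker's in-browser dat API; not invoked here.
async function showArticle (archiveUrl, title) {
  var archive = new DatArchive(archiveUrl)
  var html = await archive.readFile('/wiki/' + encodeURIComponent(title) + '.html')
  document.body.innerHTML = html
}
```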

joehand avatar Jul 31 '17 16:07 joehand

Beaker would be really nice for this indeed!

ralphtheninja avatar Jul 31 '17 17:07 ralphtheninja

Yes, dat-node has access to all the archive APIs via dat.archive (see issue #xx for one caveat here).

I assume you mean #163 ?

derhuerst avatar Aug 07 '17 12:08 derhuerst