readdir not waiting for metadata to be downloaded before firing callback

Open poga opened this issue 7 years ago • 12 comments

drive.readFile() waits for its metadata to finish downloading before firing its callback. drive.readdir() does not.

Here's an example

const hyperdrive = require('hyperdrive')
const swarm = require('hyperdiscovery')
const ram = require('random-access-memory')

var a1 = hyperdrive(ram)
var a2
a1.ready(function () {
  swarm(a1)
  a2 = hyperdrive(ram, a1.key)
  a1.writeFile('/foo', 'bar', function () {
    a2.ready(function () {
      swarm(a2)

      // readdir and readFile have different behavior
      a2.readdir('/', function (err, data) {
        console.log('readdir', data) // === []
      })

      a2.readFile('/foo', function (err, data) {
        console.log('readfile', data) // buffer
      })

      // you can see metadata is still being downloaded when the readdir callback is fired
      a2.metadata.on('download', (idx, data) => console.log('download', idx, data))
    })
  })
})

Looks like readdir() and other similar methods need to wait for _ensureContent(), just like createReadStream()?

poga avatar Apr 27 '17 00:04 poga

@mafintosh mentioned that he's been using a trick where you use archive.metadata.update({ifAvailable: true}, () => archive.readdir()) so that it tries to download some available blocks before reading.

However, this doesn't work unless you have a peer connected.

At the moment I've been experimenting with doing something like the following:

  • use readdir('/')
  • If it isn't empty, we're good to go
  • else, wait for a peer to join, and use archive.metadata.update
  • have a timeout to account for peers never joining

It's pretty messed up tbh. 😅 Gonna keep iterating to see if I can simplify it.

RangerMauve avatar Jul 24 '19 15:07 RangerMauve

Actually, this code seems to be working okay-ish

const someArchive = Hyperdrive(SOME_URL)

reallyReady(someArchive, () => {
  someArchive.readdir('/', console.log)
})

function reallyReady (archive, cb) {
  let wasReady = false
  archive.metadata.once('sync', tryReady)
  archive.readdir('/', function (e, d) {
    if (e) return
    if (!d.length) return
    if (wasReady) return // 'sync' already fired the callback
    console.log('Already loaded metadata?')
    wasReady = true
    cb()
  })

  function tryReady () {
    if (wasReady) return
    console.log('Got a sync event so it must be loaded')
    wasReady = true
    cb()
  }
}

RangerMauve avatar Jul 24 '19 15:07 RangerMauve

This is something that interests me as well! Glad you're tracking it @RangerMauve

pfrazee avatar Jul 24 '19 15:07 pfrazee

Yeah, it's been a pain point for @serapath 's work with the SDK so I'm looking to see how to alleviate it. :)

RangerMauve avatar Jul 24 '19 15:07 RangerMauve

Waiting for append instead of sync is probably a lot faster. The optimal flow is to use the update method, though, as that makes it update beyond the first load, i.e. you’ll always get the latest update.

If you want it to wait forever, I’d suggest hooking up the peer-add event and retrying with ifAvailable then.
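
A rough sketch of that "wait forever" flow, for illustration only. The helper name `updateWhenPossible` is hypothetical, and the sketch assumes that `metadata.update({ifAvailable: true}, cb)` calls back with an error when nothing could be fetched from peers; that behaviour is not confirmed in this thread.

```javascript
// Hypothetical helper: keep retrying an ifAvailable update each time a peer joins.
// ASSUMPTION: update() passes an error to its callback when no update was available.
function updateWhenPossible (archive, cb) {
  archive.metadata.update({ ifAvailable: true }, function (err) {
    if (!err) return cb(null) // metadata was updated, safe to readdir now
    // Nothing available yet: wait for the next peer, then try again.
    archive.metadata.once('peer-add', function () {
      updateWhenPossible(archive, cb)
    })
  })
}
```

Usage would look like `updateWhenPossible(archive, () => archive.readdir('/', console.log))`. Note that this never calls back if no peer ever has an update, which is the "wait forever" trade-off.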

mafintosh avatar Jul 24 '19 16:07 mafintosh

K, check this out:

const someArchive = Hyperdrive(SOME_URL)

reallyReady(someArchive, () => {
  someArchive.readdir('/', console.log)
})

function reallyReady (archive, cb) {
  if (archive.metadata.peers.length) {
    archive.metadata.update({ifAvailable: true}, cb)
  } else {
    archive.metadata.once('peer-add', () => {
      archive.metadata.update({ifAvailable: true}, cb)
    })
  }
}

A timeout should be wrapped around the call by the caller, since choosing one is a bit more opinionated.
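
One possible way to wrap such a timeout, as a minimal sketch. The names `waitForMetadata` and `timeoutMs` are hypothetical, and the sketch only assumes that `archive.metadata` behaves like an event emitter with `peers`, `once`, `removeListener`, and `update(opts, cb)`, as used elsewhere in this thread.

```javascript
// Hypothetical wrapper: same peer-add / ifAvailable flow, but gives up after timeoutMs.
function waitForMetadata (archive, timeoutMs, cb) {
  let done = false

  function finish (err) {
    if (done) return // make sure cb fires at most once
    done = true
    clearTimeout(timer)
    archive.metadata.removeListener('peer-add', tryUpdate)
    cb(err)
  }

  function tryUpdate () {
    // Whatever update() reports (error or not) is passed through to cb.
    archive.metadata.update({ ifAvailable: true }, finish)
  }

  // Give up if no peer joins, or the update never returns, in time.
  const timer = setTimeout(function () {
    finish(new Error('Timed out waiting for metadata'))
  }, timeoutMs)

  if (archive.metadata.peers.length) tryUpdate()
  else archive.metadata.once('peer-add', tryUpdate)
}
```

A caller could then decide what a timeout means for it, for example `waitForMetadata(archive, 5000, (err) => archive.readdir('/', console.log))` to read whatever is local either way.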

Also, this shouldn't be invoked if the application somehow knows it's offline. (Does hyperswarm provide this?)

I'm going to have this in the SDK docs for now, maybe later on we can figure out if this is something we can integrate directly into append-tree or hypertrie.

RangerMauve avatar Jul 24 '19 16:07 RangerMauve

This pattern should be incorporated into the daemon, I think. Exposing enough stuff through the RPC API sounds like it'll be a PITA.

CC @andrewosh

RangerMauve avatar Jul 24 '19 17:07 RangerMauve

The daemon already does ifAvailable updates before returning any calls :)

mafintosh avatar Jul 24 '19 17:07 mafintosh

Perfect! :D

RangerMauve avatar Jul 24 '19 17:07 RangerMauve

And so does hyperdrive 10 in general through the trie btw

mafintosh avatar Jul 24 '19 17:07 mafintosh

How does hypertrie avoid the situation where you don't have any local peers to wait for updates from?

RangerMauve avatar Jul 24 '19 17:07 RangerMauve

It doesn’t, but it always does an ifAvailable update.

You are touching on an interesting point, though, as there is no perfect solution to the problem you describe, which is why the stack doesn’t magically do it for you, except for updating if available.

You have to “play the map”.

What are your requirements? Do you want to “block” until a peer appears? Are you in an offline environment? Do you want to fully sync? Do you want to wait for a bit and then return an old snapshot? It all depends on what you are trying to build.

We can expose primitives and options to help guide you, but at the end of the day the stack can’t solve this for everyone, so it only does the least opinionated thing it can - update ifAvailable.
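
As one illustration of such an application-level choice, here is a sketch of the "wait for a bit and then return an old snapshot" option. The helper `readdirWithGrace` and its `graceMs` parameter are hypothetical, not part of hyperdrive; the sketch only relies on `metadata.update({ifAvailable: true}, cb)` and `readdir(path, cb)` as used in this thread.

```javascript
// Hypothetical helper: give the network a short grace period, then list
// the directory from whatever metadata is available locally.
function readdirWithGrace (archive, path, graceMs, cb) {
  let started = false

  function read () {
    if (started) return // run readdir only once
    started = true
    clearTimeout(timer)
    archive.readdir(path, cb)
  }

  // Whichever happens first wins: the grace timer or the update attempt finishing.
  const timer = setTimeout(read, graceMs)
  archive.metadata.update({ ifAvailable: true }, read)
}
```

Other answers to the questions above (block until a peer appears, fully sync first) would need a different policy; this one simply trades freshness for a bounded wait.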

mafintosh avatar Jul 24 '19 17:07 mafintosh