NotFoundError when an indexed file is removed

Open rh0 opened this issue 7 years ago • 5 comments

After WebDB has indexed a file, if that file is then removed (outside of webdb), a NotFoundError is thrown. The record for that file seems to remain in the database.

This may be a "works as designed" situation, as I may not be thinking of all use cases. However, it seems like a good idea to handle this error and unindex the file when an indexed file is removed from the archive. This would probably be a catch (maybe emitting an 'index-file-missing' event or something) on archive.download() in the following line: archive.fileEvents.addEventListener('invalidated', ({path}) => archive.download(path)) https://github.com/beakerbrowser/webdb/blob/master/lib/indexer.js#L48
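To make the proposal concrete, here is a rough sketch of such a catch. The helper name downloadOrReportMissing and the onMissing callback are hypothetical and not part of WebDB, and it assumes the rejection for a deleted file carries the name 'NotFoundError'.

```javascript
// Hypothetical helper, not part of WebDB: wraps the download triggered by an
// 'invalidated' file event so that a missing file does not surface as an
// unhandled rejection.
async function downloadOrReportMissing (archive, path, onMissing) {
  try {
    await archive.download(path)
    return true
  } catch (err) {
    // Assumption: a deleted file rejects with an error named 'NotFoundError'.
    if (err && err.name === 'NotFoundError') {
      onMissing(path) // e.g. emit an 'index-file-missing' event here
      return false
    }
    throw err // anything else is a genuine download failure
  }
}

// Possible wiring, mirroring the listener quoted above:
// archive.fileEvents.addEventListener('invalidated', ({path}) =>
//   downloadOrReportMissing(archive, path, p => console.warn('missing', p)))
```

Only the missing-file case is swallowed here; other errors are rethrown so that connectivity problems stay visible.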

I'm happy to submit a PR, but thought I'd open an issue to be sure I'm not missing something.

rh0 avatar Feb 18 '18 17:02 rh0

Hmm! A file deletion is intended to be interpreted as a deletion of the record and the data should be unindexed. See this code. Maybe this is a bug?

pfrazee avatar Feb 18 '18 18:02 pfrazee

So it looks like deletion is indeed happening. When I initially looked, there was apparently some lingering cruft in the database. I cleared the site data, and things have been running smoothly since. Further, I added some debug output to the location you linked, ran some tests, and verified that unindexFile does fire and the record is deleted. However, the NotFoundError is still thrown on this line. I'll continue testing to try and understand what's happening here.

rh0 avatar Feb 20 '18 01:02 rh0

Ok sounds like two bugs:

  1. Indexed data got out of sync with the source data
  2. Calling .download() on invalidate is triggering an error when the file is deleted

Not sure if we should bother fixing 2, or just suppress the error, but 1 is something I've noticed myself when there's a connectivity issue. I think the issue is that WebDB doesn't properly account for download failures. I'm not sure why deleting data wouldn't get handled, though, so I'll need to poke at it a bit.
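One way to picture "accounting for download failures" is to remember which paths failed and retry them later. This is only an illustrative sketch and not WebDB's implementation: createDownloadTracker, attempt, and retryPending are hypothetical names, and it assumes a deleted file rejects with an error named 'NotFoundError'.

```javascript
// Hypothetical sketch, not WebDB's implementation: remember which paths
// failed to download so a later pass can retry them.
function createDownloadTracker (archive) {
  const pending = new Set() // paths whose last download attempt failed

  // Try one download; returns true on success, false otherwise.
  async function attempt (path) {
    try {
      await archive.download(path)
      pending.delete(path)
      return true
    } catch (err) {
      // Assumption: a deleted file rejects with name 'NotFoundError'.
      // There is nothing to retry in that case.
      if (err && err.name === 'NotFoundError') {
        pending.delete(path)
        return false
      }
      pending.add(path) // e.g. a connectivity problem; retry later
      return false
    }
  }

  // Retry every path that is still pending; returns the paths that remain.
  async function retryPending () {
    for (const path of [...pending]) await attempt(path)
    return [...pending]
  }

  return { attempt, retryPending, pending }
}
```

With something like this, the 'invalidated' listener would call attempt(path), and retryPending() could run when connectivity returns.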

pfrazee avatar Feb 20 '18 16:02 pfrazee

So, at least in my case, I think the index got out of sync as I was developing the source archive. Data changed a little bit, and I did a lot of starting and stopping of the archive. I have a feeling that's where things diverged. After clearing the db in beaker two days ago I've yet to see lingering records re-emerge. I'll keep my eye out to see if I can catch it happening, if it ever happens again.

I've looked into issue 2 with a variety of setups, and it seems to happen every time. The error does not seem to hinder any function, and is mostly an annoyance. Would it be best to catch the error right there at the invalidate .download() call? If so I can take a stab at a PR.

rh0 avatar Feb 20 '18 17:02 rh0

Thanks, but I'll write some tests and figure out how to handle download failures first.

pfrazee avatar Feb 20 '18 17:02 pfrazee