zoekt icon indicating copy to clipboard operation
zoekt copied to clipboard

indexserver: add debug endpoint for deleting repository shards

Open ggilmore opened this issue 2 years ago • 6 comments

This PR adds a new debug endpoint /debug/delete?id=$REPO_ID. When a user calls this URL, all the shards associated with this ID will be deleted (or tombstoned if they're in compound shards).

Given the frequent memory-map incidents that we had this this week, this PR should make it easier for users to recover when they're in this state.

They should be able:

  1. Grab a list of all IDs / repositories that are indexed on the instance:
    wget -q -O - http://localhost:6072/debug/indexed                                                                                                                                              (base) 14:51:36
    ID              Name
    736075603       wut
    1309754292      whatever
    
  2. Extract some subset of repo ids via some method like awk | tail/head
    wget -q -O - http://localhost:6072/debug/indexed | awk '{print $1}' | tail -n +2 | head -n 2                                                                                                  (base) 15:01:56
    736075603
    1309754292
    
  3. Pipe their chosen ids to this new debug route:
    wget -q -O - http://localhost:6072/debug/indexed | awk '{print $1}' | tail -n +2 | head -n 2 | xargs -I {} -- wget -q -O -  "http://localhost:6072/debug/delete?id={}"
    

This should also be useful for customers that want to try incremental indexing (and need a faster way to revert other than waiting for a new build).


(notes)

  • This PR still isn't final, I still need to add basic tests for deleteShards.

  • I decided to put this logic entirely in main.go. This logic is quite similar to the logic used in build/builder.go, but

    • I didn't want to export this as a public function, I can't see anyone else needing to re-use this shard deletion logic (this debug endpoint should be the only user)
    • Builder has its own custom logic around shard deletion that's battle-tested (deleting shards only when it successfully finishes a new build).
  • I think the use of build.Options here isn't great. That object is so huge, and we ultimately only need the indexDir, repoID, and repoName out of it - everything else is irrelevant for this use case. I tried experimenting with refactoring some of those interfaces, but I ended up just creating a bunch of pass-through-methods instead (from https://www.amazon.com/Philosophy-Software-Design-John-Ousterhout/dp/1732102201). What are your thoughts on this? Is it it worth the refactor? Can we create a smaller "options" object that doesn't have so many extraneous fields?

ggilmore avatar Nov 17 '22 23:11 ggilmore

I think the use of build.Options here isn't great. That object is so huge, and we ultimately only need the indexDir, repoID, and repoName out of it - everything else is irrelevant for this use case. I tried experimenting with refactoring some of those interfaces, but I ended up just creating a bunch of pass-through-methods instead (from https://www.amazon.com/Philosophy-Software-Design-John-Ousterhout/dp/1732102201). What are your thoughts on this? Is it it worth the refactor? Can we create a smaller "options" object that doesn't have so many extraneous fields?

I would leave it as it. It is very "Zoekt" to pass around options structs. I believe you could do with IndexOptions but you need FindAllShards() which doesn't really have to implemented on top of build.Options and should maybe be a function instead.

However, I think at some point, Zoekt would profit from a unified concept of a shard IE operations (find(id), delete(id), ...) to abstract from compound shards, simple shards, and delta shards.

stefanhengl avatar Nov 18 '22 10:11 stefanhengl

You might as well add your nice recipe "How to delete a repo" from the PR description to the documentation of the debug endpoint.

stefanhengl avatar Nov 18 '22 11:11 stefanhengl

@sourcegraph/search-core PTAL

ggilmore avatar Nov 19 '22 01:11 ggilmore

@sourcegraph/search-core PTAL

I have:

  1. moved the logic from main.go into cleanup.go
  2. changed the deletion logic to use zoekt.FilePaths instead of stat-ing directly
  3. added a new sub-routine to cleanup() that ensures that we delete any metadata files that don't have corresponding shards (this is necessary after the change introduced in "2" since there is no longer an enforced order for how we delete files)
  4. changed the logic for deleteShards to scan the entire indexDir instead of using build.Options.FindAllShards() (following the style of other functions in cleanup)
  5. changed the debugHandler to take out a global indexDir lock
  6. wired up the shardLogger to the deleteShards implementation

ggilmore avatar Nov 29 '22 20:11 ggilmore

@stefanhengl PTAL at my latest commit which tries to factor out the logic.

To be honest, I don't like the logic of this shared interface much at all (indexDir and shards are passed (and there is no contract that indexDir needs to contain those shards, indexDir is only passed for logging purposes, etc). LMK what you think.

ggilmore avatar Nov 30 '22 23:11 ggilmore

What is the status of this PR? What can I do to help unblock/land?

keegancsmith avatar Jan 09 '23 07:01 keegancsmith