Auto-generate shuttle datadirs corresponding to their handles

Open • neelvirdy opened this issue 2 years ago • 7 comments

Instead of requiring node operators to manually manage a datadir for each shuttle, we can auto-generate the folder as pinQueue-<HANDLE>. This also communicates that the datadir needs to stay associated with the right shuttle, since Estuary tracks data location by shuttle handle in the DB. In other words, it becomes less likely that someone will create pinQueue-shuttle-1/, try to reuse it on a different shuttle, and end up in an unpredictable state.

cc @snissn @en0ma

neelvirdy • Dec 08 '22 16:12

@neelvirdy I do not think that approach is a good idea. The ideal way would be either:

  • Since Estuary and shuttles are supposed to run on different boxes, the doc that explains how to run a shuttle should state that running a shuttle on the same box as Estuary requires specifying a different datadir.
  • Or, following a common approach in IPFS land, have Estuary and the shuttle each use their own repo that stores artifacts like config and data.

en0ma • Dec 08 '22 16:12

Clarifying question: even if a shuttle is running on its own box, its handle needs to stay consistent with the datadir it's using, right? If it has something in its datadir and is restarted with a different handle but the same datadir, doesn't that cause issues with where the DB thinks the data lives (the old handle)? Wouldn't it be safer to tie the datadir to the handle, so that if someone switched the handle their shuttle node was using, a new datadir explicitly labeled with the new handle would be created? They could then notice the mismatch and later restore things by matching the datadirs to their correct handles.

To be clear, I don't think this is a big deal, but ideally node operators don't even need to know this concept exists: they just run ./estuary and ./estuary-shuttle with a minimal set of flags and are guaranteed to stay healthy, or at least to be recoverable to healthy operation.

neelvirdy • Dec 08 '22 17:12

even if a shuttle is running on its own box, its handle needs to stay consistent with the datadir it's using right?

no

If it has something in its datadir and is restarted with a different handle but same datadir, doesn't that cause issues with where the DB thinks the data lives (the old handle)?

Handles are the shuttle's identity; it uses the handle for communication and auth. The peer key and other artifacts, like blocks, reside in the datadir (repo).

A shuttle can re-init and get another handle, but the data would remain unchanged.

en0ma • Dec 08 '22 17:12

Right, the data would remain but the handle would have changed. Which means if we needed to ask a shuttle for data - our DB would have the wrong handle for where the data is living now, no?

neelvirdy • Dec 08 '22 17:12

Yeah, if a shuttle were to change its handle (which is not supported), it would have to update the location of all its contents; otherwise the API wouldn't even be able to associate contents with it. It's about the API, not the shuttles: the API's shuttle content will be orphaned if a shuttle changes its handle without reclaiming its contents on the API side.

en0ma • Dec 08 '22 17:12

Gotcha. My goals with this are:

  1. to make it harder for someone to corrupt or mix up their datadirs by reusing them on different shuttle handles accidentally, or by switching datadirs without switching handles. Basically, to avoid giving operators unnecessary flexibility that just adds risk
  2. if you did end up mixing up your shuttle handles and datadirs, this FR would mean at least your datadirs are by default tied to the handle so you have a very simple remediation path of just moving the datadirs into the right place and restarting your shuttles with the matching handle
  3. to not require operators to even think about datadirs

neelvirdy • Dec 08 '22 17:12

to make it harder for someone to corrupt or mix up their datadirs by reusing them on different shuttle handles accidentally

You can't mix up or corrupt your data simply by changing the handle on a shuttle. A mixup is most likely to happen only when you run multiple shuttles on the same box with the same datadir (which you couldn't do, the last time I tried).

Or when you run a shuttle and an API node on the same box without specifying a datadir for each. For the most part, the most important artifacts are namespaced with a suffix, but things that were not namespaced, like pinmgr, can clash. My point is that scoping by handle will not solve the issue and is not the ideal solution.

if you did end up mixing up your shuttle handles and datadirs, this FR would mean at least your datadirs are by default tied to the handle so you have a very simple remediation path of just moving the datadirs into the right place and restarting your shuttles with the matching handle

Operators should read the runbook for both. Also, handles have no association with node artifacts.

to not require operators to even think about datadirs

We should use the repo approach used in IPFS land.

en0ma • Dec 08 '22 17:12