Improve `ephemeral_disk` documentation

Open ocharles opened this issue 3 years ago • 3 comments

Proposal

Improve the documentation for ephemeral_disk. Currently https://www.nomadproject.io/docs/job-specification/ephemeral_disk is sparse, and doesn't allow me to answer the following questions:

  • migrate explains what happens when sticky is true - but what does it do if it's false?
  • sticky = true makes sense, but again - what does sticky = false mean?
  • There are two references to "best-effort", but I don't think this is precise enough to understand the implications. Under what circumstances will these "best-efforts" fail? What is the alternative? Can I lose data?
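For reference, the parameters these questions concern can be written out as follows. This is only a sketch: the group name "example" is hypothetical, the values are the defaults as I understand them from the current docs, and the comments restate the open questions rather than answer them.

```hcl
# Hypothetical group name; the fragment belongs inside a job block.
group "example" {
  ephemeral_disk {
    # Assumed default. What does "false" mean for data placement on update?
    sticky = false

    # Assumed default. Documented in terms of sticky = true; the behavior
    # when sticky is false is the first question above.
    migrate = false

    # Size in MB; 300 is the assumed default.
    size = 300
  }
}
```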

Attempted Solutions

As I don't know the answers to these questions, I'm unable to provide a solution.

ocharles avatar Oct 07 '21 07:10 ocharles

Hi @ocharles and thanks for raising this. This certainly seems like an area of documentation that can be improved.

jrasell avatar Oct 07 '21 07:10 jrasell

I am new to the Nomad game, but I want to get into using Nomad because I dislike Kubernetes (a lot, even though some good things came out of it).

After reading #5343, I am no longer sure what ephemeral_disk is even for. That issue says that no size limits are enforced, which makes me wonder what the intended purpose was. In particular, the migration section says that anything below /alloc/«alloc_id»/data gets migrated, so why is that different from /alloc/«alloc_id»/local? Yes, the stuff in /alloc/«alloc_id»/local probably stays on the node itself, but does that mean allocations distributed across different nodes share /alloc/«alloc_id»/data but not /alloc/«alloc_id»/local?

In general, "persistent file storage" is quite a big documentation hole. Filling it would make even more people jump on the Nomad wagon (which in my mind is better than the Kubernetes approach due to fewer abstractions).

Is there a plan to improve the documentation around this? Links to "tutorials" aren't always that helpful.

FibreFoX avatar Oct 03 '22 16:10 FibreFoX

Echoing the thoughts of @FibreFoX here—

Our containers are stateless and do not write anything to disk. With that in mind, and reading the current documentation as-is, we don't need an ephemeral_disk at all. Unfortunately, omitting the stanza results in a 300 MB disk being allocated.

Under most circumstances this would be fine, but I ended up here after debugging some placement issues for one of our jobs. We have a configuration where a node can run 50+ containers of the same image. We ended up getting a placement failure because of disk exhaustion.
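To put rough numbers on this, assuming each container runs in its own allocation: 50 allocations at the default 300 MB each reserve 50 × 300 MB = 15,000 MB of node disk for scheduling purposes, even if nothing is ever written. One possible workaround, offered as an assumption rather than confirmed guidance, is to declare a smaller size explicitly. The group name and the value 10 below are hypothetical.

```hcl
# Hypothetical group for a stateless service; belongs inside a job block.
group "stateless-app" {
  ephemeral_disk {
    # Size in MB. Illustrative value only: 50 allocations would then
    # reserve 50 x 10 MB = 500 MB instead of 15,000 MB.
    size = 10
  }
}
```

Whether a value this small leaves enough room for task logs is exactly the kind of detail the documentation does not currently spell out.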

To be clear, there could absolutely be something that I'm missing here! Also @jrasell I'd be happy to contribute some doc updates to help clear this up!

Cbeck527 avatar Oct 03 '22 17:10 Cbeck527

I'm also confused after reading the documentation.

I'm looking for a way to deploy stateful apps that can freely travel across the cluster without data loss. The data is small, is not accessed concurrently, and is allowed to die with the node if that happens.

So, ephemeral_disk looks like a perfect fit, right? But then the questions arise:

  • Why are the logs written to the alloc dir? Is the storage "mine" then, or does it rather exist for some system purposes?
  • What are the best efforts?
  • Can I control the reaction to a migration failure?
  • Why do we need other ways to persist data if such a great thing exists?

dangoodman avatar Aug 15 '23 10:08 dangoodman