Improve `ephemeral_disk` documentation
Proposal
Improve the documentation for `ephemeral_disk`. Currently https://www.nomadproject.io/docs/job-specification/ephemeral_disk is sparse and doesn't allow me to answer the following questions:
- `migrate` explains what happens when `sticky` is `true`, but what does it do if it's `false`?
- `sticky = true` makes sense, but again, what does `sticky = false` mean?
- There are two references to "best-effort", but I don't think this is precise enough to understand the implications. Under what circumstances will these "best efforts" fail? What is the alternative? Can I lose data?
Attempted Solutions
As I don't know the answers to these questions, I'm unable to provide a solution.
Hi @ocharles and thanks for raising this. This certainly seems like an area of documentation that can be improved.
I'm new to the Nomad game, but I want to start using Nomad because I dislike Kubernetes (a lot, even though some good things came out of it).
After reading #5343, I'm no longer sure what `ephemeral_disk` is even for. It says that size limits aren't enforced, which makes me wonder what the intended purpose was. The migration part confuses me further: anything below `/alloc/«alloc_id»/data` gets migrated, so why is it different from `/alloc/«alloc_id»/local`? Yes, the contents of `/alloc/«alloc_id»/local` probably stay on the node itself, but does that mean allocations distributed across different nodes share `/alloc/«alloc_id»/data` but not `/alloc/«alloc_id»/local`?
In general, "persistent file storage" is quite a big documentation gap, and filling it would bring even more people onto the Nomad wagon (which in my mind is better than the Kubernetes approach because it has fewer abstractions).
Is there a plan to improve the documentation around this? Links to "tutorials" aren't always that helpful.
Echoing the thoughts of @FibreFoX here:
Our containers are stateless and do not write anything to disk. With that in mind, and reading the current documentation as-is, we don't need an `ephemeral_disk` at all. Unfortunately, omitting the stanza results in a 300 MB disk being allocated.
Under most circumstances this would be fine, but I ended up here after debugging some placement issues for one of our jobs. We have a configuration where a node can run 50+ containers of the same image. We ended up getting a placement failure because of disk exhaustion.
To be clear, there could absolutely be something that I'm missing here! Also @jrasell I'd be happy to contribute some doc updates to help clear this up!
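To make the placement failure above concrete: the `ephemeral_disk` docs give a default size of 300 MB and state that, although the size isn't enforced, it is used during job placement. So every allocation reserves that amount on the node's disk, whether or not the task writes anything. The Python sketch below just does that arithmetic. The 10,000 MB of node disk and the 50 MB override are hypothetical values chosen for illustration, not figures from this thread.

```python
# Default ephemeral_disk size in MB when the stanza is omitted (per the Nomad job spec docs).
DEFAULT_EPHEMERAL_DISK_MB = 300


def disk_reserved_mb(allocations: int, disk_mb: int = DEFAULT_EPHEMERAL_DISK_MB) -> int:
    """Total disk the scheduler accounts for across `allocations` allocations."""
    return allocations * disk_mb


def max_allocations(node_free_disk_mb: int, disk_mb: int = DEFAULT_EPHEMERAL_DISK_MB) -> int:
    """How many allocations fit on a node if disk is the binding constraint."""
    return node_free_disk_mb // disk_mb


if __name__ == "__main__":
    # Hypothetical node with 10,000 MB of disk available to Nomad.
    node_free_disk_mb = 10_000
    print(disk_reserved_mb(50))                            # 15000
    print(max_allocations(node_free_disk_mb))              # 33
    print(max_allocations(node_free_disk_mb, disk_mb=50))  # 200 with a hypothetical 50 MB override
```

Under these assumptions, 50 allocations of a stateless image reserve 15,000 MB even though they write nothing, which is enough to exhaust the hypothetical node. Lowering `size` in the stanza raises how many allocations fit.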
I'm also confused after reading the documentation.
I'm looking for a way to deploy stateful apps that can freely travel across the cluster without data loss. The data is small, is not accessed concurrently, and is allowed to die with the node if that happens.
So, `ephemeral_disk` looks like a perfect fit, right? But then the questions arise:
– Why are the logs written to the alloc dir? Is the storage "mine", or does it exist for system purposes?
– What exactly are the "best efforts"?
– Can I control how Nomad reacts to a migration failure?
– Why do we need other ways to persist data if such a great thing exists?