GC limits > 3 days are in effect infinite b/c of FSM timetable limit
Nomad version
1.5.0 and anything prior.
Operating system and Environment details
Unix.
Issue
When a garbage collection threshold is set to a value larger than 3 days, the Nomad scheduler never garbage collects the corresponding objects. This leads to unbounded accumulation of state (in effect a memory and disk leak) and of related resources such as CSI volumes. The affected settings include at least:
- https://developer.hashicorp.com/nomad/docs/configuration/server#eval_gc_threshold
- https://developer.hashicorp.com/nomad/docs/configuration/server#batch_eval_gc_threshold
- https://developer.hashicorp.com/nomad/docs/configuration/server#deployment_gc_threshold
- https://developer.hashicorp.com/nomad/docs/configuration/server#job_gc_threshold
- https://developer.hashicorp.com/nomad/docs/configuration/server#acl_token_gc_threshold
- https://developer.hashicorp.com/nomad/docs/configuration/server#csi_plugin_gc_threshold
- https://developer.hashicorp.com/nomad/docs/configuration/server#csi_volume_claim_gc_interval
The expected behavior is that garbage collection thresholds can be set well above 3 days, to allow history to build up and to make debugging easier.
The details of the bug.
At garbage collection time, Nomad derives an approximate Raft index that serves as the watermark for garbage collection. The mapping from a time to such an index is handled uniformly via:
https://github.com/hashicorp/nomad/blob/v1.5.0/nomad/core_sched.go#L1133-L1143
This relies on the FSM and the TimeTable that is initialized within it. To be precise, the table is initialized here:
https://github.com/hashicorp/nomad/blob/v1.5.0/nomad/fsm.go#L170
With a hard-coded maximal time table limit:
https://github.com/hashicorp/nomad/blob/v1.5.0/nomad/fsm.go#L27-L29
If the requested time falls outside that limit, the lookup finds no entry and the index defaults to zero:
https://github.com/hashicorp/nomad/blob/v1.5.0/nomad/timetable.go#L93-L106
Hence thresholdIndex = 0, so any check of the form X.modifyIndex > thresholdIndex evaluates to true and the object is never garbage collected. For instance, for evals:
https://github.com/hashicorp/nomad/blob/v1.5.0/nomad/core_sched.go#L282-L288
Repro.
The simplest way to reproduce this behavior is to change the hard-coded maximal time table limit of the FSM to something small and observe that evaluations which should be garbage collected are not. A unit test of the FSM or of garbage collection can also be used to confirm the behavior.