GC limits > 3 days are in effect infinite b/c of FSM timetable limit

Open stswidwinski opened this issue 2 years ago • 2 comments

Nomad version

1.5.0 and anything prior.

Operating system and Environment details

Unix.

Issue

When a garbage collection threshold is set to a value larger than 3 days, the Nomad scheduler never garbage collects the affected objects. This leads to unbounded accumulation of data (and therefore an unbounded memory and disk leak) and of related resources (such as CSI volumes). The affected GC settings include at least:

  1. https://developer.hashicorp.com/nomad/docs/configuration/server#eval_gc_threshold
  2. https://developer.hashicorp.com/nomad/docs/configuration/server#batch_eval_gc_threshold
  3. https://developer.hashicorp.com/nomad/docs/configuration/server#deployment_gc_threshold
  4. https://developer.hashicorp.com/nomad/docs/configuration/server#job_gc_threshold
  5. https://developer.hashicorp.com/nomad/docs/configuration/server#acl_token_gc_threshold
  6. https://developer.hashicorp.com/nomad/docs/configuration/server#csi_plugin_gc_threshold
  7. https://developer.hashicorp.com/nomad/docs/configuration/server#csi_volume_claim_gc_interval

The expected behavior is that garbage collection thresholds can be set to values much larger than 3 days, to allow history to build up and to make debugging easier.

The details of the bug.

At the time of garbage collection, Nomad derives an approximate Raft index which is used as a watermark for garbage collection. The mapping from a time to such an index is handled uniformly via:

https://github.com/hashicorp/nomad/blob/v1.5.0/nomad/core_sched.go#L1133-L1143

This relies on the fsm and the TimeTable which is initialized within it. To be precise, the table is initialized here:

https://github.com/hashicorp/nomad/blob/v1.5.0/nomad/fsm.go#L170

With a hard-coded maximum time table limit of 3 days:

https://github.com/hashicorp/nomad/blob/v1.5.0/nomad/fsm.go#L27-L29

If the requested time is older than anything the table retains, that is, the limit is breached, the index lookup defaults to zero:

https://github.com/hashicorp/nomad/blob/v1.5.0/nomad/timetable.go#L93-L106

Hence, thresholdIndex = 0, which causes any check of the form X.modifyIndex > thresholdIndex to evaluate to true, so nothing is garbage collected. For instance, for evals:

https://github.com/hashicorp/nomad/blob/v1.5.0/nomad/core_sched.go#L282-L288

Repro.

The simplest way to reproduce this behavior is to modify the code so that the maximum time table limit of the fsm is something small, then observe that no GC occurs for evaluations which should be GCed. A unit test of the fsm or of garbage collection may also be used to confirm the behavior.

stswidwinski, Mar 07 '23 10:03