Allow multiple PVCs in combination with affinity assistant

Open flo-02-mu opened this issue 3 years ago • 6 comments

Feature request

It would be nice to be able to use more than one PVC in a pipeline and still use the affinity assistant to ensure all pods are scheduled to the same node.

Use case

In our pipelines we have three or more tasks running at the beginning of the pipeline in parallel that are making use of a workspace, e.g.:

  • Checkout the workspace
  • Download cache from storage account
  • Download deployment repo

With the current restriction of only one PVC per PipelineRun when the affinity assistant is enabled (we work around it with different subPaths), we see the following behavior: in the above example, 4 pods (the three task pods plus the affinity assistant) send the volume attach request. Depending on the timing, this either works fine or all 3 task pods get stuck in the init phase. The pipeline recovers, but it leads to task execution times of up to 2 minutes for a simple git checkout task, while the best case is ~20s.
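
For illustration, a minimal sketch of the kind of setup described above: one PVC created from a volumeClaimTemplate and shared by the three parallel tasks through different subPaths. All names (git-clone, fetch-cache, the workspace and subPath names, the storage size) are placeholders, not details from the actual pipelines.

```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: example-run-
spec:
  pipelineSpec:
    workspaces:
      - name: shared
    tasks:
      - name: checkout
        taskRef:
          name: git-clone              # illustrative Task reference
        workspaces:
          - name: output
            workspace: shared
            subPath: source
      - name: download-cache
        taskRef:
          name: fetch-cache            # illustrative Task reference
        workspaces:
          - name: output
            workspace: shared
            subPath: cache
      - name: download-deployment-repo
        taskRef:
          name: git-clone              # illustrative Task reference
        workspaces:
          - name: output
            workspace: shared
            subPath: deployment
  workspaces:
    - name: shared
      volumeClaimTemplate:             # the single PVC for this PipelineRun
        spec:
          accessModes: [ReadWriteOnce]
          resources:
            requests:
              storage: 1Gi
```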

Although this sounds more like a k8s issue (we are running on Azure Kubernetes Service 1.21 with the new CSI driver), the behavior is better when using one PVC per workspace (we tried this with the affinity assistant disabled). Permanently running the pipelines with the affinity assistant disabled is not an option, since the time lost to possible volume re-attachments caused by node switching is too high.
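
For reference, "affinity assistant disabled" here refers to the cluster-wide toggle in Tekton's feature-flags ConfigMap, roughly as sketched below (shown only for context):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
  namespace: tekton-pipelines
data:
  disable-affinity-assistant: "true"   # default is "false" (assistant enabled)
```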

For the above reasons we wonder whether the restriction of not allowing more than 1 PVC together with the affinity assistant can either be removed or be disabled by a feature flag. Additional question: Couldn't the behavior of attaching the volume to the affinity assistant be removed as well? The first task using a volume triggers the attachment to the node, and since all subsequent pods are scheduled to the same node, they should be fine.

flo-02-mu avatar Jan 21 '22 07:01 flo-02-mu

Thanks for the feature request @flo-02-mu.

In your use case with e.g. "code checkout" and "download cache", it sounds like you want a Task in your Pipeline with access to both workspaces, if I understand your case correctly?

If those Tasks used two different PVCs with two different Affinity Assistants, they would likely end up on different Nodes (unless you run a single-node cluster), and in that case the pipeline would be deadlocked.

Example Pipeline with 3 Tasks:

(code checkout) ----\
                      (use both workspaces)
(download cache) ---/

The Task (use both workspaces) can only run if the PVCs that it is using are mounted on the same Node.
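
The same shape expressed as a Pipeline sketch (all Task and workspace names are made up for illustration): the final Task declares both workspaces, so it can only be scheduled on a Node where both backing PVCs can be mounted.

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: fan-in-example
spec:
  workspaces:
    - name: source
    - name: cache
  tasks:
    - name: code-checkout
      taskRef:
        name: git-clone               # illustrative
      workspaces:
        - name: output
          workspace: source
    - name: download-cache
      taskRef:
        name: fetch-cache             # illustrative
      workspaces:
        - name: output
          workspace: cache
    - name: use-both-workspaces
      runAfter: [code-checkout, download-cache]
      taskRef:
        name: build                   # illustrative
      workspaces:
        - name: source
          workspace: source
        - name: cache
          workspace: cache
```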

jlpettersson avatar Jan 21 '22 22:01 jlpettersson

Yes, the situation is as you describe it (one task using both volumes from the previous tasks). My assumption was that there would be just one affinity assistant with both PVCs because they belong to the same pipelinerun? Or is it TaskRun-based, with the first task that uses the workspace instantiating the affinity assistant?

flo-02-mu avatar Jan 25 '22 07:01 flo-02-mu

My assumption was that there would be just one affinity assistant with both PVCs because they belong to the same pipelinerun?

Ah, now I understand. Yes, that is a fair proposal.

A feature request with similar goals but a different suggestion is https://github.com/tektoncd/pipeline/issues/3440

Additional question: Couldn't the behavior of attaching the volume to the affinity assistant be removed as well?

That was how it was first implemented, but it did not work when the storage class had volumeBindingMode: Immediate and the cluster spanned multiple AZs: the volume and the Task pod could end up in different AZs and be deadlocked. See e.g. https://github.com/tektoncd/pipeline/pull/2630#issuecomment-631146587
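
For context, a plain Kubernetes sketch of the storage class situation being described (the provisioner is just an example matching the AKS CSI driver mentioned above): with Immediate binding the volume is provisioned, and pinned to a zone, as soon as the PVC is created, before any pod is scheduled.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-azure-disk
provisioner: disk.csi.azure.com        # example provisioner (AKS CSI disk driver)
volumeBindingMode: Immediate           # volume is bound/zoned at PVC creation time
# volumeBindingMode: WaitForFirstConsumer would defer binding until a pod uses the PVC
```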

jlpettersson avatar Jan 25 '22 09:01 jlpettersson

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot avatar Apr 25 '22 09:04 tekton-robot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten with a justification. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot avatar May 25 '22 09:05 tekton-robot

@flo-02-mu we've implemented a new alpha feature (per-pipelinerun affinity assistant) that I think should address your use case. Please see these docs for more detail. I'm going to close out this issue so that feedback on this work is only tracked in one place, but we'd still love your feedback! Please feel free to weigh in on https://github.com/tektoncd/pipeline/issues/6990, or reopen this issue if your use case is not addressed.
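
For anyone landing here later: at the time of writing, the linked documentation describes enabling the per-PipelineRun affinity assistant through a coschedule feature flag, roughly as sketched below; please check the docs for the current flag names, values, and how they interact with the older disable-affinity-assistant flag.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
  namespace: tekton-pipelines
data:
  coschedule: "pipelineruns"   # co-schedule all pods (and PVCs) of a PipelineRun onto one node
```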

lbernick avatar Jul 27 '23 19:07 lbernick

I'm a colleague of @flo-02-mu and I'll try the new feature in the coming days and will report back! THX!

msglueck avatar Sep 18 '23 09:09 msglueck