Allow multiple PVCs in combination with affinity assistant
Feature request
It would be nice to be able to use more than one PVC in a pipeline and still use the affinity assistant to ensure all pods are scheduled to the same node.
Use case
In our pipelines we have three or more tasks running at the beginning of the pipeline in parallel that are making use of a workspace, e.g.:
- Checkout the workspace
- Download cache from storage account
- Download deployment repo
With the current restriction of one PVC per PipelineRun when the affinity assistant is enabled (we use different subPaths on that PVC), we see the following behavior: in the example above, four pods (the three task pods plus the affinity assistant) send a volume attach request. Depending on the timing, this either works fine or leaves all three task pods stuck in the init phase. The pipeline recovers, but a simple git checkout task can then take up to 2 min, compared with about 20 s in the best case.
Although this sounds more like a Kubernetes issue (we are running Azure Kubernetes Service 1.21 with the new CSI driver), the behavior looks better when using one PVC per workspace (we tried this with the affinity assistant disabled). Running the pipelines with the affinity assistant disabled is not an option, since the time lost to possible volume re-attachments after a node switch is too high.
For this reason, we wonder whether the restriction of at most one PVC together with the affinity assistant can either be removed or be disabled by a feature flag. Additional question: couldn't the behavior of attaching the volume to the affinity assistant be removed as well? The first task using a volume triggers the attachment to the node, and since all subsequent pods are scheduled to the same node, they should be fine?
Thanks for the feature request @flo-02-mu.
In your use case with e.g. "code checkout" and "download cache", it sounds like you want a Task in your pipeline that has access to both workspaces, if I understand your case correctly?
If those Tasks used two different PVCs with different Affinity Assistants, they would likely end up on different Nodes (unless you run a single-node cluster), and in that case the pipeline would be deadlocked.
Example Pipeline with 3 Tasks:
(code checkout) ----\
                     (use both workspaces)
(download cache) ---/
The Task (use both workspaces) can only run if the PVCs that it is using are mounted on the same Node.
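The constraint described above can be sketched as a small model. This is only an illustration, not Tekton's scheduling code: the function name `schedulable_node`, the PVC names, and the node names are hypothetical, and it assumes each PVC is pinned to exactly one Node (as with ReadWriteOnce volumes).

```python
from typing import Dict, List, Optional


def schedulable_node(task_pvcs: List[str], pvc_node: Dict[str, str]) -> Optional[str]:
    """Return the Node a Task pod could run on, or None if its PVCs
    are pinned to different Nodes (hypothetical helper)."""
    nodes = {pvc_node[pvc] for pvc in task_pvcs}
    # The pod is only schedulable if every PVC it mounts lives on one Node.
    return nodes.pop() if len(nodes) == 1 else None


# One Affinity Assistant per PVC: each assistant may pick a different Node.
per_pvc = {"source-pvc": "node-a", "cache-pvc": "node-b"}
# One Affinity Assistant for the whole PipelineRun: both PVCs follow one Node.
per_run = {"source-pvc": "node-a", "cache-pvc": "node-a"}

uses_both = ["source-pvc", "cache-pvc"]
print(schedulable_node(uses_both, per_pvc))  # None: the Task can never start
print(schedulable_node(uses_both, per_run))  # node-a
```

In this model the first two Tasks run fine in either layout; only the Task that mounts both PVCs is affected.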
Yes, the situation is as you describe it (one task using both volumes from the previous tasks). My assumption was that there would be just one affinity assistant with both PVCs because they belong to the same pipelinerun? Or is it TaskRun based and the first task that uses the workspace instantiates the affinity assistant?
> My assumption was that there would be just one affinity assistant with both PVCs because they belong to the same pipelinerun?
Ah, now I understand. Yes, that is a fair proposal.
A feature request with similar goals but a different suggestion is https://github.com/tektoncd/pipeline/issues/3440
> Additional question: Couldn't the behavior of attaching the volume to the affinity assistant be removed as well?
That was how it was first implemented, but it did not work when the storage class had volumeBindingMode: Immediate and the cluster spanned multiple AZs: the volume and the Task pod could end up in different AZs and be deadlocked. See e.g. https://github.com/tektoncd/pipeline/pull/2630#issuecomment-631146587
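One way to read the multi-AZ problem is sketched below. It is a simplified model under stated assumptions, not the actual scheduler logic: with `volumeBindingMode: Immediate` the volume's zone is fixed before any pod is scheduled, and a Task pod must be co-located with the affinity assistant while also being in the volume's zone. The function names and zone names are hypothetical.

```python
from typing import Optional


def place_assistant(mounts_volume: bool, volume_zone: str, scheduler_pick: str) -> str:
    """Zone the affinity assistant lands in (hypothetical model).

    If the assistant mounts the PVC, it has to follow the volume's zone;
    otherwise the scheduler is free to pick any zone."""
    return volume_zone if mounts_volume else scheduler_pick


def task_pod_zone(volume_zone: str, assistant_zone: str) -> Optional[str]:
    """Zone for a Task pod that must sit next to the assistant AND reach
    the volume, or None if both constraints cannot be met (deadlock)."""
    return volume_zone if volume_zone == assistant_zone else None


volume_zone = "zone-1"      # assumed: fixed up front by Immediate binding
scheduler_pick = "zone-2"   # assumed: where an unconstrained pod would land

without_mount = place_assistant(False, volume_zone, scheduler_pick)
with_mount = place_assistant(True, volume_zone, scheduler_pick)

print(task_pod_zone(volume_zone, without_mount))  # None: Task pod is stuck
print(task_pod_zone(volume_zone, with_mount))     # zone-1
```

In this reading, having the assistant mount the volume is what ties its placement to the volume's zone, so later Task pods can satisfy both constraints at once.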
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/lifecycle stale
Send feedback to tektoncd/plumbing.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/lifecycle rotten
Send feedback to tektoncd/plumbing.
@flo-02-mu we've implemented a new alpha feature (per-pipelinerun affinity assistant) that I think should address your use case. Please see these docs for more detail. I'm going to close out this issue so that feedback on this work is only tracked in one place, but we'd still love your feedback! Please feel free to weigh in on https://github.com/tektoncd/pipeline/issues/6990, or reopen this issue if your use case is not addressed.
I'm a colleague of @flo-02-mu and I'll try the new feature in the coming days and will report back! THX!