Local Ephemeral Storage Capacity Isolation

Open jingxu97 opened this issue 6 years ago • 82 comments

Feature Description

  • One-line feature description (can be used as a release note): Add support for capacity isolation of shared partitions for pods and containers.
  • Primary contact (assignee): @jingxu97 @vishh
  • Responsible SIGs: @kubernetes/sig-storage-feature-requests
  • Design proposal link (community repo): kubernetes/community#306
  • Reviewer(s) - (for LGTM) recommend having 2+ reviewers (at least one from code-area OWNERS file) agreed to review. Reviewers from multiple companies preferred: @derekwaynecarr, @vishh, @dashpole
  • Approver (likely from SIG/area to which feature belongs): @thockin, @vishh, @derekwaynecarr
  • Feature target (which target equals to which milestone):
    • Alpha release target (x.y): 1.7
    • Beta release target (x.y): 1.10
    • Stable release target (x.y): 1.25

jingxu97 avatar Jul 26 '17 16:07 jingxu97

@jingxu97 @kubernetes/sig-storage-feature-requests any updates for 1.8? Is this feature still on track for the release?

idvoretskyi avatar Sep 05 '17 14:09 idvoretskyi

This feature is on track for 1.8. Details are in #43607.

jingxu97 avatar Sep 05 '17 19:09 jingxu97

@jingxu97 please, update the features tracking board with the relevant data.

idvoretskyi avatar Sep 12 '17 14:09 idvoretskyi

We intend to move local ephemeral storage to beta in 1.10.

saad-ali avatar Jan 23 '18 04:01 saad-ali

@jingxu97 it looks as though docs need updating for 1.10. Can you please submit a docs PR as soon as possible (it's now officially late), and update the 1.10 feature tracking spreadsheet? Thanks!

Bradamant3 avatar Mar 02 '18 20:03 Bradamant3

Hi Jennifer,

I submitted the PR https://github.com/kubernetes/website/pull/7614, but I could not edit the spreadsheet. Could you please help me check it? Thanks!

Best, Jing

jingxu97 avatar Mar 03 '18 01:03 jingxu97

Hi @jingxu97 -- Thanks for the docs PR. The spreadsheet is updated. Please note that you need to rebase your docs PR against the 1.10 docs branch -- we branch docs differently from the code repos. Thanks again! Jennifer

Bradamant3 avatar Mar 05 '18 15:03 Bradamant3

Hi @jingxu97 @saad-ali, the local ephemeral storage management only applies to the root partition in release-1.9 (alpha). Does it support the runtime partition in release-1.10 (beta)?

warmchang avatar Mar 07 '18 07:03 warmchang

@warmchang, the beta version will be the same as alpha, which only applies to the root partition. We currently don't plan to support a separate runtime partition due to the complexity. Could you please let me know what use case you have for different partitions? Thanks!

jingxu97 avatar Mar 07 '18 07:03 jingxu97

@jingxu97 I checked the original proposal, local-storage-overview, and it includes the "Runtime Partition" description.

One scenario: when K8S is deployed on an IaaS platform (OpenStack or VMware), the node VMs mount a cloud disk as the "Docker Root Dir" instead of using the VMs' system root partitions, based on considerations such as disk capacity. How can the ephemeral storage of the containers running on those nodes be managed in that case? Thanks!

warmchang avatar Mar 07 '18 09:03 warmchang

@warmchang the runtime partition still has the same support it has had in the past. The kubelet will monitor the runtime partition, and perform evictions if space runs low based on the highest consumers of the runtime partition.

In your example, I'm not sure why using a cloud disk requires you to split the kubelet's and the runtime's partitions.

dashpole avatar Mar 07 '18 17:03 dashpole

@dashpole Before this Local Ephemeral Storage feature, unlimited writes of temporary files (such as logs) to the container writable layer could fill the disk and hang the operating system. To prevent this, we mounted a separate partition for the Docker Root Dir.

We tried the feature in this scenario and found that it cannot limit the capacity of a container.

From a technical point of view, what is the difference between the capacity limits of the runtime partition and the root partition? Thanks!

warmchang avatar Mar 08 '18 02:03 warmchang

We tried the feature in this scenario and found that it cannot limit the capacity of a container.

The behavior you describe should work regardless of this feature. Make sure you have --root-dir set correctly. Docker reports its root directory to the kubelet, so as long as your images are stored on the same partition that contains /var/lib/docker (or whatever your docker root dir is), this should work correctly.
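
A quick way to check the partition layout described above is to compare the device IDs of the two directories. This is only a minimal sketch in Python: `/var/lib/docker` is the path named in this thread, while `/var/lib/kubelet` is assumed here as the kubelet root directory (its usual default), and `same_filesystem` is a hypothetical helper, not part of any Kubernetes tooling.

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True when both paths are stored on the same device."""
    # st_dev is the ID of the device that holds each path.
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Assumed locations; adjust them to the directories used on the node.
    docker_root = "/var/lib/docker"
    kubelet_root = "/var/lib/kubelet"
    if os.path.exists(docker_root) and os.path.exists(kubelet_root):
        print("same partition:", same_filesystem(docker_root, kubelet_root))
    else:
        print("one of the assumed directories does not exist on this machine")
```

If the two directories report different devices, the node has a separate runtime partition, which is the case being discussed here.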

dashpole avatar Mar 12 '18 16:03 dashpole

@dashpole Very useful skill!

After verification (ping @zhangxiaoyu-zidif ), the expected effect can be achieved. 👏👏

[root@k8s-master-controller:/]$ kubectl get rs
NAME                             DESIRED   CURRENT   READY     AGE
busybox-apps-v1beta1-7f8dd8d89   1         1         1         21m
[root@k8s-master-controller:/]$ kubectl get pod --show-all
NAME                                   READY     STATUS    RESTARTS   AGE
busybox-apps-v1beta1-7f8dd8d89-kh6xc   1/1       Running   0          19m
busybox-apps-v1beta1-7f8dd8d89-mg7ls   0/1       Evicted   0          21m
[root@k8s-master-controller:/]$ kubectl describe pod busybox-apps-v1beta1-7f8dd8d89-mg7ls
Name:           busybox-apps-v1beta1-7f8dd8d89-mg7ls
Namespace:      default
Node:           172.160.134.17/
Start Time:     Mon, 23 Apr 2018 09:27:02 +0800
Labels:         app=busybox-apps-v1beta1
                pod-template-hash=394884845
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"busybox-apps-v1beta1-7f8dd8d89","uid":"6c817aea-4695-11e8-9103-f...
Status:         Failed
Reason:         Evicted
Message:        The node was low on resource: ephemeral-storage.
IP:
Created By:     ReplicaSet/busybox-apps-v1beta1-7f8dd8d89
Controlled By:  ReplicaSet/busybox-apps-v1beta1-7f8dd8d89
Containers:
  busybox:
    Image:  busybox
    Port:   <none>
    Command:
      sleep
      3600
    Limits:
      ephemeral-storage:  50Mi
    Requests:
      ephemeral-storage:  50Mi
    Environment:          <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-7tchh (ro)
Volumes:
  default-token-7tchh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-7tchh
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason                 Age   From                     Message
  ----     ------                 ----  ----                     -------
  Normal   Scheduled              22m   default-scheduler        Successfully assigned busybox-apps-v1beta1-7f8dd8d89-mg7ls to 172.160.134.17
  Normal   SuccessfulMountVolume  22m   kubelet, 172.160.134.17  MountVolume.SetUp succeeded for volume "default-token-7tchh"
  Normal   Pulled                 22m   kubelet, 172.160.134.17  Container image "busybox" already present on machine
  Normal   Created                22m   kubelet, 172.160.134.17  Created container
  Normal   Started                22m   kubelet, 172.160.134.17  Started container
  Warning  Evicted                19m   kubelet, 172.160.134.17  pod ephemeral local storage usage exceeds the total limit of containers {{52428800 0} {<nil>} 50Mi BinarySI}
  Normal   Killing                19m   kubelet, 172.160.134.17  Killing container with id docker://busybox:Need to kill Pod
[root@k8s-master-controller:/]$
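
The eviction message in the output above compares the pod's usage with the total limit of its containers (50Mi, printed as 52428800 bytes). The comparison can be sketched roughly as follows. This is an illustration in Python, not the kubelet's code: `parse_quantity` and `exceeds_pod_limit` are hypothetical helpers, only a subset of the quantity suffixes is handled, and every container is assumed to declare a limit.

```python
from typing import Iterable

# Multipliers for the suffixes this sketch understands (a subset only).
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_quantity(text: str) -> int:
    """Convert a quantity such as '50Mi' into a number of bytes."""
    for suffix, factor in _SUFFIXES.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    # No recognised suffix: treat the value as plain bytes.
    return int(text)


def exceeds_pod_limit(usage_bytes: int, container_limits: Iterable[str]) -> bool:
    """Return True when pod usage is above the sum of the container limits."""
    total_limit = sum(parse_quantity(limit) for limit in container_limits)
    return usage_bytes > total_limit
```

With the single 50Mi limit from the pod above, `parse_quantity("50Mi")` gives 52428800, the same byte count that appears in the eviction event.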

warmchang avatar Mar 16 '18 07:03 warmchang

that's great for us. thanks for your help =) @dashpole

zhangxiaoyu-zidif avatar Mar 16 '18 08:03 zhangxiaoyu-zidif

@jingxu97 @vishh Any plans for this in 1.11?

If so, can you please ensure the feature is up-to-date with the appropriate:

  • Description
  • Milestone
  • Assignee(s)
  • Labels:
    • stage/{alpha,beta,stable}
    • sig/*
    • kind/feature

cc @idvoretskyi

justaugustus avatar Apr 17 '18 02:04 justaugustus

This feature currently has no milestone, so we'd like to check in and see if there are any plans for this in Kubernetes 1.12.

If so, please ensure that this issue is up-to-date with ALL of the following information:

  • One-line feature description (can be used as a release note):
  • Primary contact (assignee):
  • Responsible SIGs:
  • Design proposal link (community repo):
  • Link to e2e and/or unit tests:
  • Reviewer(s) - (for LGTM) recommend having 2+ reviewers (at least one from code-area OWNERS file) agreed to review. Reviewers from multiple companies preferred:
  • Approver (likely from SIG/area to which feature belongs):
  • Feature target (which target equals to which milestone):
    • Alpha release target (x.y)
    • Beta release target (x.y)
    • Stable release target (x.y)

Set the following:

  • Description
  • Assignee(s)
  • Labels:
    • stage/{alpha,beta,stable}
    • sig/*
    • kind/feature

Once this feature is appropriately updated, please explicitly ping @justaugustus, @kacole2, @robertsandoval, @rajendar38 to note that it is ready to be included in the Features Tracking Spreadsheet for Kubernetes 1.12.


Please note that Features Freeze is tomorrow, July 31st, after which any incomplete Feature issues will require an Exception request to be accepted into the milestone.

In addition, please be aware of the following relevant deadlines:

  • Docs deadline (open placeholder PRs): 8/21
  • Test case freeze: 8/28

Please make sure all PRs for features have relevant release notes included as well.

Happy shipping!

P.S. This was sent via automation

justaugustus avatar Jul 30 '18 22:07 justaugustus

Hi, this enhancement has been tracked before, so we'd like to check in and see if there are any plans for this to graduate stages in Kubernetes 1.13. This release is targeted to be more ‘stable’ and will have an aggressive timeline. Please only include this enhancement if there is a high level of confidence it will meet the following deadlines:

  • Docs (open placeholder PRs): 11/8
  • Code Slush: 11/9
  • Code Freeze Begins: 11/15
  • Docs Complete and Reviewed: 11/27

Please take a moment to update the milestones on your original post for future tracking and ping @kacole2 if it needs to be included in the 1.13 Enhancements Tracking Sheet

Thanks!

kacole2 avatar Oct 08 '18 16:10 kacole2

I mentioned this in the meeting today, but I wanted to add it here too. I think that apps must have a way to identify what quotas exist and adapt to them. That would be a blocker for enabling enforcement later, because if an app keeps exceeding its quota and is killed repeatedly, that's bad. At the very least, can we get this visible in the downward API if existing filesystem mechanisms won't work to discover the quota?

From a Kubernetes API standpoint, we need to be careful not to require an OS or cloud provider specific implementation. On Windows filesystems, quotas are not available so we would probably use a loopback volume to implement this. That preserves the Windows API behavior where if an app queries for the free space, it gets the space within that loopback volume.
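
To make the "adapt to the quota" idea concrete, here is a minimal sketch in Python of what an app could do if the limit were handed to it as an environment variable. Everything named here is an assumption for illustration: the variable `EPHEMERAL_STORAGE_LIMIT_BYTES`, the helpers, and the 90% headroom are hypothetical, and the pod spec would have to populate the variable itself.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; nothing sets it automatically.
LIMIT_ENV = "EPHEMERAL_STORAGE_LIMIT_BYTES"


def read_limit(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the advertised limit in bytes, or None when it is not set."""
    raw = env.get(LIMIT_ENV)
    return int(raw) if raw else None


def directory_usage(root: str) -> int:
    """Sum the sizes of all files below root (apparent size, not blocks)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # the file disappeared while walking
    return total


def should_stop_writing(used: int, limit: Optional[int], headroom: float = 0.9) -> bool:
    """Return True once usage reaches the chosen fraction of the limit."""
    if limit is None:
        return False  # no advertised limit, nothing to adapt to
    return used >= limit * headroom
```

An app could call `should_stop_writing(directory_usage(scratch_dir), read_limit())` before writing more temporary data and rotate or delete files instead of being evicted; `scratch_dir` stands for whatever directory the app writes to.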

PatrickLang avatar Dec 04 '18 18:12 PatrickLang

@jingxu97 @vishh Hello - I'm the Enhancements Lead for 1.14 and I'm checking in on this issue to see what work (if any) is being planned for the 1.14 release. Enhancements freeze is Jan 29th and I want to remind you that all enhancements must have a KEP.

claurence avatar Jan 16 '19 15:01 claurence

Hello @jingxu97 @vishh, I'm the Enhancement Lead for 1.15. Is this feature going to be graduating alpha/beta/stable stages in 1.15? Please let me know so it can be tracked properly and added to the spreadsheet. This will also require a KEP for inclusion.

Once coding begins, please list all relevant k/k PRs in this issue so they can be tracked properly.

kacole2 avatar Apr 12 '19 18:04 kacole2

We tried the feature in this scenario and found that it cannot limit the capacity of a container.

The behavior you describe should work regardless of this feature. Make sure you have --root-dir set correctly. Docker reports its root directory to the kubelet, so as long as your images are stored on the same partition that contains /var/lib/docker (or whatever your docker root dir is), this should work correctly.

Hello,

I would like to check whether there is any way to restrict pods' usage of ephemeral storage (/var/lib/docker), regardless of whether /var/lib/docker is on the node's root filesystem or on a separate filesystem.

The pods' runtime writable layers and logs keep growing and fill up the /var/lib/docker filesystem, which stops other pods from running.

It would be great if we could restrict pods to a limited amount of ephemeral storage cluster-wide. For example, set a 20G quota so that each pod can use only 20GB of ephemeral storage; a pod that needs more space should use a PV. Is there any way to do that?

arunbpt7 avatar Jun 10 '19 14:06 arunbpt7

Yes, you can do this with ephemeral storage. See the documentation. Make sure you have eviction enabled for both the "imagefs" and the "nodefs" (documentation).

dashpole avatar Jun 10 '19 15:06 dashpole

Thanks for the update. I have defined the ephemeral-storage request and limit in resources (spec.hard.requests.ephemeral-storage, spec.hard.limits.ephemeral-storage) on the deployment and verified that evictionHard is enabled for "imagefs" and "nodefs" on the node. However, when the pod is deployed, it is not restricted to the defined ephemeral storage: when I create a large file inside the container, the file can still grow beyond the ephemeral-storage request and limit.

evictionHard:
  imagefs.available: 15%
  memory.available: 100Mi
  nodefs.available: 10%
  nodefs.inodesFree: 5%

containers:
- name: busybox
  image:
  resources:
    requests:
      ephemeral-storage: "500Mi"
    limits:
      ephemeral-storage: "500Mi"
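
For reference, the evictionHard entries above are node-level thresholds: a signal fires when the available amount drops below the configured value, which is separate from the per-container limit. A rough sketch of that comparison in Python follows. The helper names are hypothetical, only percentages and a few binary suffixes are handled, and this is not the kubelet's implementation.

```python
# Binary suffixes understood by this sketch (a subset only).
_BINARY = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def threshold_value(threshold: str, capacity: int) -> float:
    """Turn a threshold such as '10%' or '100Mi' into an absolute value."""
    if threshold.endswith("%"):
        # Percentages are taken relative to the total capacity.
        return capacity * float(threshold[:-1]) / 100.0
    for suffix, factor in _BINARY.items():
        if threshold.endswith(suffix):
            return float(threshold[: -len(suffix)]) * factor
    return float(threshold)


def signal_triggered(threshold: str, capacity: int, available: int) -> bool:
    """Return True when the available amount is below the threshold."""
    return available < threshold_value(threshold, capacity)
```

For example, with `nodefs.available: 10%` on a 100Gi filesystem, `signal_triggered("10%", 100 * 1024**3, 9 * 1024**3)` is true while 20Gi free is not. The same comparison applies to `nodefs.inodesFree` when inode counts are passed instead of bytes.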

arunbpt7 avatar Jun 10 '19 17:06 arunbpt7

Sounds like a bug. Feel free to open a separate issue and cc me, as this is for feature tracking.

dashpole avatar Jun 10 '19 17:06 dashpole

Thank you. I have opened a new issue (local ephemeral Storage limitation for pods in the cluster #1094).

arunbpt7 avatar Jun 10 '19 17:06 arunbpt7

Hi @arunbpt7 @jingxu97 @vishh, I'm the 1.16 Enhancement Lead/Shadow. Is this feature going to be graduating alpha/beta/stable stages in 1.16? Please let me know so it can be added to the 1.16 Tracking Spreadsheet. If it's not graduating, I will remove it from the milestone and change the tracked label.

Once coding begins or if it already has, please list all relevant k/k PRs in this issue so they can be tracked properly.

As a reminder, every enhancement requires a KEP in an implementable state with Graduation Criteria explaining each alpha/beta/stable stages requirements.

Milestone dates are Enhancement Freeze 7/30 and Code Freeze 8/29.

Thank you.

kacole2 avatar Jul 09 '19 15:07 kacole2

Hey there @arunbpt7 @jingxu97 @vishh -- 1.17 Enhancements shadow here 👋 . I wanted to check in and see if you think this Enhancement will be graduating to alpha/beta/stable in 1.17?

The current release schedule is:

  • Monday, September 23 - Release Cycle Begins
  • Tuesday, October 15, EOD PST - Enhancements Freeze
  • Thursday, November 14, EOD PST - Code Freeze
  • Tuesday, November 19 - Docs must be completed and reviewed
  • Monday, December 9 - Kubernetes 1.17.0 Released

If you do, I'll add it to the 1.17 tracking sheet (https://bit.ly/k8s117-enhancements). Once coding begins please list all relevant k/k PRs in this issue so they can be tracked properly. 👍

We'll also need to convert the design proposal into a KEP. To be accepted in the release, all enhancements MUST have a KEP, the KEP MUST be merged, in an implementable state, and have both graduation criteria/test plan.

Thanks!

jeremyrickard avatar Oct 01 '19 18:10 jeremyrickard

Hey there @arunbpt7 @jingxu97 @vishh -- 1.18 Enhancements shadow here. I wanted to check in and see if you think this Enhancement will be graduating to alpha/beta/stable in 1.18 or having a major change in its current level?

The current release schedule is:

  • Monday, January 6th - Release Cycle Begins
  • Tuesday, January 28th EOD PST - Enhancements Freeze
  • Thursday, March 5th, EOD PST - Code Freeze
  • Monday, March 16th - Docs must be completed and reviewed
  • Tuesday, March 24th - Kubernetes 1.18.0 Released

To be included in the release,

  1. The KEP PR must be merged
  2. The KEP must be in an implementable state
  3. The KEP must have test plans and graduation criteria.

If you would like to include this enhancement, once coding begins please list all relevant k/k PRs in this issue so they can be tracked properly. 👍

We'll be tracking enhancements here: http://bit.ly/k8s-1-18-enhancements

Thanks! :)

palnabarun avatar Jan 13 '20 08:01 palnabarun

@arunbpt7 @jingxu97 @vishh Just a friendly reminder, we are just 7 days away from the Enhancement Freeze (Tuesday, January 28th).

palnabarun avatar Jan 22 '20 09:01 palnabarun