Alluxio and K8s have different interpretations of size

Open jiacheliu3 opened this issue 4 years ago • 7 comments

Alluxio Version: 2.4

Describe the bug In K8s context, g or GB means 1000^3 and gi or GiB means 1024^3. https://stackoverflow.com/questions/50804915/kubernetes-size-definitions-whats-the-difference-of-gi-and-g

In Alluxio context, g or GB means 1024^3. https://github.com/Alluxio/alluxio/blob/a5265484f6f0cfcb7745dec31e61030056b49240/core/base/src/main/java/alluxio/Constants.java#L23

So when we use g and pass the same quota to both Alluxio and K8s, K8s grants 1000^3 bytes per unit but Alluxio tries to use 1024^3. For example, if the volume is an emptyDir, the pod using the emptyDir will be killed for overusing resources.
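
To make the gap concrete, here is a minimal sketch of the two conventions described above. It is not either project's actual parser; the names `K8S_UNITS`, `ALLUXIO_G`, `k8s_bytes` and `alluxio_bytes` are hypothetical and exist only for illustration.

```python
# Hypothetical illustration of the two conventions; not real parser code.
K8S_UNITS = {"G": 1000**3, "Gi": 1024**3}  # Kubernetes: G is decimal, Gi is binary
ALLUXIO_G = 1024**3                        # Alluxio: G / GB is binary (see Constants.java)


def k8s_bytes(count: int, unit: str) -> int:
    """Bytes Kubernetes grants for a quantity such as 4G or 4Gi."""
    return count * K8S_UNITS[unit]


def alluxio_bytes(count: int) -> int:
    """Bytes Alluxio expects to be able to use for a quota such as 4G."""
    return count * ALLUXIO_G


granted = k8s_bytes(4, "G")    # what the volume actually provides
wanted = alluxio_bytes(4)      # what Alluxio believes it may fill
overshoot = wanted - granted   # bytes Alluxio may write beyond the limit
print(granted, wanted, overshoot)
```

For a quota of 4G the overshoot is about 295 MB (decimal), which is why the pod can exceed its emptyDir limit.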

Expected behavior Alluxio and K8s should have a consistent understanding of sizes, especially since they are parsing the same configuration field.

jiacheliu3 avatar Oct 14 '20 02:10 jiacheliu3

There are a few options:

  1. Workaround: Use bytes throughout the Helm chart configuration to avoid conversion. This can only be a workaround, as users will still hit the issue when they use expressions like 4G.

  2. Change Alluxio conversion to be G = 1000^3 and Gi = 1024^3. This breaks backward compatibility but will guarantee consistent (and correct?) behavior.

  3. Change Alluxio conversion to be G = Gi = 1024 ^ 3. This gives backward compatibility but might be a confusing fix.

jiacheliu3 avatar Oct 14 '20 02:10 jiacheliu3

@madanadit What do you think about this?

jiacheliu3 avatar Oct 14 '20 02:10 jiacheliu3

@jiacheliu3 I would suggest:

  1. Alluxio helm chart uses the K8s convention (G = 1000^3) (for conformity)
  2. We do not change the convention for Alluxio servers either (for backwards compatibility)
  3. When we set a helm config in G or Gi, we transform the value to bytes and set Alluxio config in B. Similar for other units as well.
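
A rough sketch of the conversion in step 3, written in Python only to show the arithmetic (a real Helm chart would do this in its templates). The function name `to_alluxio_bytes` is hypothetical, only whole-number quantities are handled, and the assumption that Alluxio accepts a plain byte count with a `B` suffix is taken from the suggestion above.

```python
import re

# Kubernetes quantity suffixes: decimal (k, M, G, T) and binary (Ki, Mi, Gi, Ti).
K8S_SUFFIXES = {
    "": 1,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
}


def to_alluxio_bytes(quantity: str) -> str:
    """Convert a K8s-style quantity such as '5G' into a byte value such as '5000000000B'."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", quantity.strip())
    if match is None or match.group(2) not in K8S_SUFFIXES:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    count, suffix = match.groups()
    # Emitting raw bytes removes any ambiguity about what G means downstream.
    return f"{int(count) * K8S_SUFFIXES[suffix]}B"


print(to_alluxio_bytes("5G"))   # decimal interpretation
print(to_alluxio_bytes("5Gi"))  # binary interpretation
```

Because both sides then see the same byte count, neither convention has to change.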

madanadit avatar Oct 14 '20 02:10 madanadit

@jiacheliu3 I would suggest:

  1. Alluxio helm chart uses the K8s convention (G = 1000^3) (for conformity)
  2. We do not change the convention for Alluxio servers either (for backwards compatibility)
  3. When we set a helm config in G or Gi, we transform the value to bytes and set Alluxio config in B. Similar for other units as well.

This will only work for the Helm chart, right? When the user uses kubectl and manually gives G to both K8s and Alluxio, there will still be a confusing resource mismatch.

jiacheliu3 avatar Oct 14 '20 02:10 jiacheliu3

@madanadit @jiacheliu3 Since this issue seems to be popping up again, let's revive this discussion.

As Jiacheng mentioned, there will be no way to ensure correct behaviour across all k8s tools (present and future) unless this is addressed properly. If we are not willing to break backwards compatibility in Alluxio (which is understandable), we can introduce the distinction between Gi and G in Alluxio as an opt-in rolling upgrade, and at some point in the future (e.g. 3.0.0) make it a fully committed change.

If this solution is undesirable then the only option we are left with is to concede that K8s and Alluxio have mismatching interpretations of G and document this extensively with corresponding workarounds.

  • 1 KiB = 1.024 KB, or conversely 0.9765625 KiB = 1 KB
  • 1 MiB ~= 1.049 MB, or conversely 0.953 MiB <= 1 MB
  • 1 GiB ~= 1.074 GB, or conversely 0.931 GiB <= 1 GB
  • 1 TiB ~= 1.1 TB, or conversely 0.909 TiB <= 1 TB
  • Therefore, for any Alluxio worker cache capacity backed by a k8s volume that is specified using powers-of-10 definitions (i.e. KB, MB, GB), we document that a high watermark <= 0.90 is required. A single value of 0.90 is easy to remember and holds for every unit up to and including TiB.
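
The ratios in the list above can be checked with a few lines of arithmetic. This sketch also includes the peta prefix, which is not part of the list, to show why a 0.90 watermark stops being safe beyond TiB.

```python
PREFIXES = ["K", "M", "G", "T", "P"]

# Fraction of a binary unit that the matching decimal unit actually provides.
ratios = {p: 1000**n / 1024**n for n, p in enumerate(PREFIXES, start=1)}

for prefix, ratio in ratios.items():
    print(f"1 {prefix}B = {ratio:.4f} {prefix}iB")

WATERMARK = 0.90
# Prefixes for which filling up to the watermark stays inside the decimal size.
covered = [p for p, r in ratios.items() if r >= WATERMARK]
print(covered)  # ['K', 'M', 'G', 'T']
```

At the peta scale the ratio drops to roughly 0.888, so a lower watermark would be needed there.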

Implementation ideas

If alluxio.storage.compatibility.kubernetes.enabled=true then we enable definitions for KiB, MiB, GiB and we change the following definitions to their k8s counterparts: https://github.com/Alluxio/alluxio/blob/a5265484f6f0cfcb7745dec31e61030056b49240/core/base/src/main/java/alluxio/Constants.java#L21-L25

In truth I have no idea what ramifications there are for attempting to change the raw byte definitions for Alluxio from powers-of-two to powers-of-ten, but I imagine they would be widespread and cause incompatibility in other areas. So instead we may wish to limit the scope to specifically the properties in Alluxio which handle storage capacities.
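
As a sketch of the opt-in idea, limited to the parsing of storage sizes: the unit table is chosen by a flag that stands in for the proposed alluxio.storage.compatibility.kubernetes.enabled property. The function `parse_size`, its `k8s_compat` parameter and the case-insensitive matching are assumptions made for illustration, not Alluxio's actual code.

```python
import re

BINARY = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}   # current behaviour
DECIMAL = {"KB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4}  # k8s counterparts
IEC = {"KIB": 1024, "MIB": 1024**2, "GIB": 1024**3, "TIB": 1024**4}  # new definitions


def parse_size(text: str, k8s_compat: bool = False) -> int:
    """Parse a size such as '5GB' into bytes; k8s_compat mimics the proposed opt-in flag."""
    # Without the flag nothing changes: binary GB, and the KiB family is unknown.
    units = {**DECIMAL, **IEC} if k8s_compat else BINARY
    match = re.fullmatch(r"(\d+)\s*([A-Za-z]+)", text.strip())
    if match is None:
        raise ValueError(f"cannot parse size: {text!r}")
    count, unit = int(match.group(1)), match.group(2).upper()
    if unit not in units:
        raise ValueError(f"unknown unit in size: {text!r}")
    return count * units[unit]


print(parse_size("5GB"))                    # existing deployments are unaffected
print(parse_size("5GB", k8s_compat=True))   # decimal once the flag is enabled
print(parse_size("5GiB", k8s_compat=True))  # binary is still available explicitly
```

Keeping the switch inside the size parser, rather than changing the shared constants, is one way to limit the scope to the capacity properties.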

ZhuTopher avatar Mar 03 '22 19:03 ZhuTopher

Discussed with @madanadit, here's a compromise that should be minimally impactful:

  1. Add the KiB/MiB/GiB/TiB suffix to Alluxio's parsing and map them to be identical to the existing KB/MB/GB/TB mappings.
  2. Specify that all users of Alluxio in a K8s environment (Helm or otherwise) should use the base-2 notation (Ki/Mi/Gi/Ti) for any storage sizes, as this is the only interpretation Alluxio supports.

This requires zero end-user changes for normal Alluxio deployments, and a small change to our Helm chart & corresponding documentation. Otherwise, if end users attempt to use the base-10 notation in k8s with Alluxio it is at their own risk.

ZhuTopher avatar Mar 03 '22 22:03 ZhuTopher

Just for posterity I'm going to list the steps of how the Helm chart tieredstore.levels[0].quota=5G value results in the following Alluxio error message:

tmpfs is smaller than the configured size: tmpfs size: 500002816, configured size: 524288000

  1. The Helm chart value gets set as alluxio.worker.tieredstore.level.0.dirs.quota=5G in the ConfigMap for ALLUXIO_WORKER_JAVA_OPTS.

  2. The ConfigMap sets the Alluxio property key via the Worker container's env var ALLUXIO_WORKER_JAVA_OPTS.

  3. When the worker initializes its tiered cache, it parses alluxio.worker.tieredstore.level.0.dirs.quota=5G into bytes here:
     • https://github.com/Alluxio/alluxio/blob/73af38d0ef4135d43efe158f67dea2898b1ffebf/core/server/worker/src/main/java/alluxio/worker/block/meta/DefaultStorageTier.java#L96

  4. It then checks that the size of the path configured for the ramdisk satisfies the configured capacity, and throws an error if it doesn't:
     • https://github.com/Alluxio/alluxio/blob/73af38d0ef4135d43efe158f67dea2898b1ffebf/core/server/worker/src/main/java/alluxio/worker/block/meta/DefaultStorageTier.java#L117-L119
     • https://github.com/Alluxio/alluxio/blob/73af38d0ef4135d43efe158f67dea2898b1ffebf/core/server/worker/src/main/java/alluxio/worker/block/meta/DefaultStorageTier.java#L162-L166
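
The final check can be mimicked with a small sketch. The helper names are hypothetical, and rounding the tmpfs up to whole 4096-byte pages is an assumption. Under that assumption, the two figures in the quoted message match a quota of 500M (500 × 1000^2 rounded up to pages versus 500 × 1024^2); a 5G quota fails the same way with larger numbers.

```python
PAGE_SIZE = 4096  # assumption: tmpfs sizes are rounded up to whole 4 KiB pages


def tmpfs_size(requested_bytes: int) -> int:
    """Size of a tmpfs created for the requested byte count, rounded up to pages."""
    return -(-requested_bytes // PAGE_SIZE) * PAGE_SIZE


def check_capacity(tmpfs_bytes: int, configured_bytes: int) -> None:
    """Mimic the worker's check: fail if the ramdisk is smaller than the configured size."""
    if tmpfs_bytes < configured_bytes:
        raise ValueError(
            "tmpfs is smaller than the configured size: "
            f"tmpfs size: {tmpfs_bytes}, configured size: {configured_bytes}"
        )


granted = tmpfs_size(500 * 1000**2)  # K8s reads 500M as decimal
configured = 500 * 1024**2           # Alluxio reads 500M as binary

try:
    check_capacity(granted, configured)
except ValueError as err:
    print(err)
```

Specifying the quota with a binary suffix on the K8s side, or in raw bytes on both sides, makes the two values agree and the check pass.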

ZhuTopher avatar Jul 12 '22 23:07 ZhuTopher