Alluxio and K8s have different interpretation of size
Alluxio Version: 2.4
Describe the bug
In K8s context, g or GB means 1000^3 and gi or GiB means 1024^3.
https://stackoverflow.com/questions/50804915/kubernetes-size-definitions-whats-the-difference-of-gi-and-g
In Alluxio context, g or GB means 1024^3.
https://github.com/Alluxio/alluxio/blob/a5265484f6f0cfcb7745dec31e61030056b49240/core/base/src/main/java/alluxio/Constants.java#L23
So when we use g and pass the same quota to both Alluxio and K8s, K8s grants 1000^3 but Alluxio tries to utilize 1024^3. For example, if the volume is an emptyDir, then the pod using the emptyDir will be killed for overusing resources.
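To put a number on the gap, here is a minimal sketch of the arithmetic for a quota written as 4G. The variable names are only illustrative; the two constants are the interpretations described above.

```python
# Same quota string, two interpretations (illustrative only).
K8S_G = 1000 ** 3      # Kubernetes reading of G: decimal gigabyte
ALLUXIO_G = 1024 ** 3  # Alluxio reading of GB: 1024^3 (see Constants.java)

quota = 4  # a quota written as "4G"
k8s_bytes = quota * K8S_G          # what K8s grants
alluxio_bytes = quota * ALLUXIO_G  # what Alluxio tries to use
overshoot = alluxio_bytes - k8s_bytes

print(f"K8s grants      {k8s_bytes} bytes")
print(f"Alluxio expects {alluxio_bytes} bytes")
print(f"Overshoot       {overshoot} bytes")
```

For 4G the overshoot is 294967296 bytes, roughly 7% more than K8s has granted.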
Expected behavior Alluxio and K8s should have a consistent understanding of sizes, especially since they are parsing the same configuration field.
Additional context
There are a few options:
- Workaround: Use bytes all across the helm chart configuration to avoid conversion. This can only be a workaround as users will face the issue when they use expressions like 4G.
- Change Alluxio conversion to be G = 1000^3 and Gi = 1024^3. This breaks the backward compatibility but will guarantee consistent (and correct?) behavior.
- Change Alluxio conversion to be G = Gi = 1024^3. This gives backward compatibility but might be a confusing fix.
@madanadit What do you think about this?
@jiacheliu3 I would suggest:
- Alluxio helm chart uses the K8s convention (G = 1000^3) (for conformity)
- We do not change the convention for Alluxio servers either (for backwards compatibility)
- When we set a helm config in G or Gi, we transform the value to bytes and set Alluxio config in B. Similar for other units as well.
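A rough sketch of the conversion step in the last bullet, written in Python rather than Helm templating just to show the logic. The function names are hypothetical, only integer quantities are handled, and the suffix table covers a subset of the K8s quantity suffixes (note that K8s spells decimal kilo as a lowercase k).

```python
import re

# Subset of Kubernetes quantity suffixes: decimal and binary.
K8S_SUFFIXES = {
    "": 1,
    "k": 1000, "M": 1000 ** 2, "G": 1000 ** 3, "T": 1000 ** 4,
    "Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3, "Ti": 1024 ** 4,
}

def k8s_quantity_to_bytes(quantity: str) -> int:
    """Interpret an integer quantity string using the K8s convention."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", quantity.strip())
    if match is None or match.group(2) not in K8S_SUFFIXES:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(match.group(1)) * K8S_SUFFIXES[match.group(2)]

def to_alluxio_value(quantity: str) -> str:
    """Render the quantity as a plain byte count so no unit is left to interpret."""
    return f"{k8s_quantity_to_bytes(quantity)}B"

print(to_alluxio_value("5G"))   # decimal input
print(to_alluxio_value("5Gi"))  # binary input
```

Because the value handed to Alluxio carries only a byte count, both sides end up with the same number regardless of how each one reads G.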
This will only work for the helm chart, right? When the user uses kubectl and manually gives G to both K8s and Alluxio, there will still be a confusing resource mismatch.
@madanadit @jiacheliu3 Since this issue seems to be popping up again, let's revive this discussion.
As Jiacheng mentioned, there will be no way to ensure correct behaviour across all k8s tools (present and future) unless this is addressed properly. If we are not willing to break backwards compatibility in Alluxio (which is understandable), we can make it an opt-in rolling upgrade to enable the distinction in Alluxio between Gi and G, and at some point in the future (e.g. 3.0.0) we make it a fully committed change.
If this solution is undesirable then the only option we are left with is to concede that K8s and Alluxio have mismatching interpretations of G and document this extensively with corresponding workarounds.
- 1 KiB = 1.024 KB, or conversely 0.9765625 KiB = 1 KB
- 1 MiB ~= 1.049 MB, or conversely 0.953 MiB <= 1 MB
- 1 GiB ~= 1.074 GB, or conversely 0.931 GiB <= 1 GB
- 1 TiB ~= 1.1 TB, or conversely 0.909 TiB <= 1 TB
- Therefore we document that, for any Alluxio worker cache capacity backed by a k8s volume specified using powers-of-10 definitions (i.e. KB, MB, GB), we require a high watermark <= 0.90. A single value is easier to digest, and it holds up to and including TiB.
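The ratios above, and the claim that 0.90 is a safe bound through TiB, can be checked with a few lines of arithmetic. This is only a sanity check of the numbers, not Alluxio code.

```python
# Fraction of a binary unit that the same-named decimal unit actually provides.
ratios = {
    unit: (1000 ** power) / (1024 ** power)
    for power, unit in enumerate(["K", "M", "G", "T", "P"], start=1)
}

for unit, ratio in ratios.items():
    print(f"1 {unit}B = {ratio:.7f} {unit}iB")

# Smallest ratio among the units the proposal covers (K through T).
worst_up_to_tera = min(ratios[u] for u in ["K", "M", "G", "T"])
print("0.90 watermark safe up to TiB:", worst_up_to_tera >= 0.90)
print("0.90 watermark safe at PiB:   ", ratios["P"] >= 0.90)
```

The ratio shrinks with each unit step, and at the peta level it falls to about 0.888, so the 0.90 bound would no longer be sufficient there.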
Implementation ideas
If alluxio.storage.compatibility.kubernetes.enabled=true then we enable definitions for KiB, MiB, GiB and we change the following definitions to their k8s counterparts:
https://github.com/Alluxio/alluxio/blob/a5265484f6f0cfcb7745dec31e61030056b49240/core/base/src/main/java/alluxio/Constants.java#L21-L25
In truth I have no idea what ramifications there are for attempting to change the raw byte definitions for Alluxio from powers-of-two to powers-of-ten, but I imagine they would be widespread and cause incompatibility in other areas. So instead we may wish to limit the scope to specifically the properties in Alluxio which handle storage capacities.
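A minimal sketch of what the opt-in switch could look like, assuming it is scoped to storage-size parsing only. The function `unit_table` and its boolean argument are hypothetical stand-ins for the proposed property; nothing like this exists in Alluxio today.

```python
def unit_table(k8s_compat: bool) -> dict:
    """Return suffix -> bytes, depending on the (proposed) compatibility flag."""
    table = {"B": 1}
    for power, prefix in enumerate("KMGT", start=1):
        binary = 1024 ** power
        decimal = 1000 ** power
        if k8s_compat:
            # Flag on: binary suffixes are introduced, decimal ones follow K8s.
            table[prefix + "iB"] = binary
            table[prefix + "B"] = decimal
        else:
            # Flag off: current behaviour, where GB and friends are powers of two.
            table[prefix + "B"] = binary
    return table

legacy = unit_table(k8s_compat=False)
compat = unit_table(k8s_compat=True)
print(legacy["GB"], compat["GB"], compat["GiB"])
```

Keeping the switch inside one lookup table is what would let the change stay limited to capacity properties instead of touching the global constants.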
Discussed with @madanadit, here's a compromise that should be minimally impactful:
- Add the KiB/MiB/GiB/TiB suffixes to Alluxio's parsing and map them to be identical to the existing KB/MB/GB/TB mappings.
- Specify that all users of Alluxio in a K8s environment (Helm or otherwise) should use the base-2 notation (Ki, Mi, Gi, Ti) for any storage sizes, as this is the only interpretation Alluxio supports.
This requires zero end-user changes for normal Alluxio deployments, and a small change to our Helm chart & corresponding documentation. Otherwise, if end users attempt to use the base-10 notation in k8s with Alluxio it is at their own risk.
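As a sketch of the first bullet of that compromise: the new suffixes become aliases of the existing ones, so every spelling of a unit resolves to the same power of two. The function name `parse_size_proposed` is hypothetical and this is not Alluxio's actual parsing code; it only handles integer values.

```python
import re

# Build suffix -> bytes. Existing spellings (G, GB) and the proposed alias (GiB)
# all share the same base-2 value.
UNITS = {"B": 1}
for power, prefix in enumerate("KMGT", start=1):
    size = 1024 ** power
    for suffix in (prefix, prefix + "B", prefix + "IB"):
        UNITS[suffix] = size

def parse_size_proposed(text: str) -> int:
    """Parse strings such as '5G', '5GB' or '5GiB' into bytes (case-insensitive)."""
    match = re.fullmatch(r"(\d+)\s*([A-Za-z]*)", text.strip())
    if match is None:
        raise ValueError(f"cannot parse size: {text!r}")
    number, suffix = match.groups()
    suffix = suffix.upper() or "B"
    if suffix not in UNITS:
        raise ValueError(f"unknown unit in size: {text!r}")
    return int(number) * UNITS[suffix]

print(parse_size_proposed("5GiB") == parse_size_proposed("5GB"))
```

With this in place a user can write 5Gi in the K8s field and 5GiB in the Alluxio property and both resolve to 5 * 1024^3 bytes.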
Just for posterity I'm going to list the steps of how the Helm chart tieredstore.levels[0].quota=5G value results in the following Alluxio error message:
tmpfs is smaller than the configured size: tmpfs size: 500002816, configured size: 524288000
- The Helm chart value gets set as alluxio.worker.tieredstore.level.0.dirs.quota=5G in the ConfigMap for ALLUXIO_WORKER_JAVA_OPTS.
- The ConfigMap sets the Alluxio property key via the Worker container's env var ALLUXIO_WORKER_JAVA_OPTS.
- When the worker initializes its tiered cache, it parses alluxio.worker.tieredstore.level.0.dirs.quota=5G into bytes here.
  - https://github.com/Alluxio/alluxio/blob/73af38d0ef4135d43efe158f67dea2898b1ffebf/core/server/worker/src/main/java/alluxio/worker/block/meta/DefaultStorageTier.java#L96
- It checks that the size of the path configured for the ramdisk satisfies the configured capacity, and throws an error if it doesn't.
  - https://github.com/Alluxio/alluxio/blob/73af38d0ef4135d43efe158f67dea2898b1ffebf/core/server/worker/src/main/java/alluxio/worker/block/meta/DefaultStorageTier.java#L117-L119
  - https://github.com/Alluxio/alluxio/blob/73af38d0ef4135d43efe158f67dea2898b1ffebf/core/server/worker/src/main/java/alluxio/worker/block/meta/DefaultStorageTier.java#L162-L166
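The final check can be sketched as below. The two byte counts in the quoted message are reproducible by arithmetic: 524288000 is 500 * 1024^2, and 500002816 is 500 * 1000^2 rounded up to a multiple of 4096. The sketch therefore uses a quota of 500 in mega units, which is what those numbers correspond to, and assumes a 4096-byte page size for the tmpfs rounding. The function names are hypothetical; this is not the code in DefaultStorageTier.

```python
PAGE_SIZE = 4096  # assumption: tmpfs size is rounded up to whole 4 KiB pages

def round_up_to_page(size: int, page: int = PAGE_SIZE) -> int:
    """Round size up to the next multiple of page."""
    return -(-size // page) * page

def check_capacity(tmpfs_size: int, configured_size: int) -> None:
    """Mimic the worker's sanity check: the mount must be at least as large as the quota."""
    if tmpfs_size < configured_size:
        raise ValueError(
            "tmpfs is smaller than the configured size: "
            f"tmpfs size: {tmpfs_size}, configured size: {configured_size}"
        )

quota = 500
tmpfs_size = round_up_to_page(quota * 1000 ** 2)  # K8s decimal reading of the quota
configured_size = quota * 1024 ** 2               # Alluxio base-2 reading of the quota

try:
    check_capacity(tmpfs_size, configured_size)
except ValueError as err:
    print(err)
```

The same check passes if the K8s side is given the quota in a binary unit (Mi or Gi), since both readings then agree.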