kubectl icon indicating copy to clipboard operation
kubectl copied to clipboard

`kubectl --sort-by=.metadata.uid` should not use natural sort

Open aams-eam opened this issue 4 months ago • 12 comments

What happened?

After doing kubectl get pod -A --sort-by=.metadata.uid I realized the pods were not ordered by uid

What did you expect to happen?

All the pods of the cluster being ordered by uid

How can we reproduce it (as minimally and precisely as possible)?

The following screenshot shows the result of the commands trying to order the pods: kubectl--sort-by

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2

$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.2", GitCommit:"4b8e819355d791d96b7e9d9efe4cbafae2311c88", GitTreeState:"clean", BuildDate:"2024-02-14T10:39:04Z", GoVersion:"go1.21.7", Compiler:"gc", Platform:"linux/amd64"}

$ kubelet --version
Kubernetes v1.29.2

Cloud provider

No cloud provider. Installation done in two virtual machines (i.e., control plane and worker) using virtualbox

OS version

# On Linux:
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

$ uname -a
Linux control-plane-server 5.4.0-173-generic kubernetes/kubernetes#191-Ubuntu SMP Fri Feb 2 13:55:07 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Install tools

Container runtime (CRI) and version (if applicable)

containerd containerd.io 1.6.28 ae07eda36dd25f8a1b98dfbf587313b99c0190bb

Related plugins (CNI, CSI, ...) and versions (if applicable)

aams-eam avatar Mar 09 '24 18:03 aams-eam

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 09 '24 18:03 k8s-ci-robot

/sig CLI

pranav-pandey0804 avatar Mar 10 '24 09:03 pranav-pandey0804

/transfer kubectl

ardaguclu avatar Mar 14 '24 15:03 ardaguclu

/assign @brianpursley

eddiezane avatar Mar 27 '24 16:03 eddiezane

I did some research on this and found something interesting.

Kubectl uses something called "Natural Sorting" to sort strings.

https://github.com/kubernetes/kubernetes/blob/f4e246bc93ffb68b33ed67c7896c379efa4207e7/staging/src/k8s.io/kubectl/pkg/cmd/get/sorter.go#L272-L284

And this results in some unexpected results when sorting strings that contain both digits and alpha characters.

In 2021, a change was made to the code to sort Quantity strings (100m, 8Gi, etc), but this is handling only a special case.

The general problem is that strings are sorted in an unexpected way, even for things like name, so it is not limited to just UID... it's just more obvious due to the alphanumeric format of UIDs.

Example:

Create two pods, named a12 and a9a:

kubectl apply -f - << EOF
apiVersion: v1
kind: Namespace
metadata:
  name: sorttest
---
apiVersion: v1
kind: Pod
metadata:
  name: a12
  namespace: sorttest
spec:
  containers:
  - image: registry.k8s.io/pause 
    name: pause
---
apiVersion: v1
kind: Pod
metadata:
  name: a9a
  namespace: sorttest
spec:
  containers:
  - image: registry.k8s.io/pause
    name: pause
EOF

Then get the pods, sorted by name:

kubectl get pod -n sorttest --sort-by={.metadata.name}
NAME   READY   STATUS    RESTARTS   AGE
a9a    1/1     Running   0          101s
a12    1/1     Running   0          101s

Why is a9a sorted before a12?

I believe it is because "Natural Sorting" is attempting to group all the digits together for comparison separately from alpha characters, so it is comparing 9 to 12, not 9 to 1.

I don't know the use case for wanting to sort strings in this way.

I can't imagine this would be the way most people would expect the output to be sorted, and when it does, it just makes it seem like a bug, whereas clearly this is intentional, according to the code.

@eddiezane @mpuckett159 (and anyone else): What would the impact be of changing the method kubectl uses for sorting to use a simple string comparison instead of "Natural Sorting"? I guess it would be considered a breaking change, but is there a way we could work toward changing the sort via some kind of flag and deprecation period?

Related issues:

  • https://github.com/kubernetes/kubectl/issues/1517 where the pod names get misinterpreted as Quantity in some cases (if it is parsed successfully as a quantity).

brianpursley avatar Mar 27 '24 20:03 brianpursley

Hi @brianpursley

That is a very detailed explanation. Thank you very much!

aams-eam avatar Mar 27 '24 22:03 aams-eam

My two cents, this is a bug and should be fixed. I am sure that people will say that because this changes user observable behavior we should not implement a fix because it could break downstream tooling, however I can think of no reasonable use case where this would be considered "expected behavior" and would definitely like to push forward a more correct sorting method for strings. I'm not sure the best method to differentiate between fields that would benefit from natural sorting and which it would detract from, but we should really figure out a way to handle that. This has long been something I haven't really understood but noticed also. Glad to finally have an explanation for this.

mpuckett159 avatar Mar 27 '24 22:03 mpuckett159

My initial investigation which is linked in @brianpursley 's post above definitely suggests that values are all sorted using the natural sort found in the apiMachinery section. I think it would be quite a big change likely involving some sort of internal catalog that indicates which fields are plain strings requiring regular string sort, and which are numbers (potentially with SI units making JSON regard them as strings), which would require natural sort.

But I'd really like to see it addressed! The issues I raised both here and in apiMachinery are going stale!

fireflycons avatar Mar 28 '24 22:03 fireflycons

/retitle kubectl --sort-by=.metadata.uid should not use natural sort /kind feature /remove-kind bug

sftim avatar Apr 23 '24 20:04 sftim

To be clear on this, only properties that contain some kind of numeric quantity (all resources, including custom) should pass through the natural sort algorithm - else it should be a plain string sort.

fireflycons avatar Apr 24 '24 05:04 fireflycons

OK, it's a bug /kind bug /remove-kind feature

sftim avatar Apr 24 '24 08:04 sftim

/triage accepted

mpuckett159 avatar Apr 24 '24 16:04 mpuckett159

I think there is another one as a duplicate of this https://github.com/kubernetes/kubectl/issues/1517

ardaguclu avatar May 07 '24 11:05 ardaguclu

I think there is another one as a duplicate of this https://github.com/kubernetes/kubectl/issues/1517

ardaguclu avatar May 07 '24 11:05 ardaguclu