Fixed large resourceversion and limit for storages
What type of PR is this?
/kind bug
What this PR does / why we need it:
Currently, when you do try to query the StorageClassList with a very high resourceVersion and a high limit value, you'll get a response, like this: NOTE: Youll not only get a error message from etcdserver, you will also get a 500 error code.
curl -v -X GET "https://localhost:6443/apis/storage.k8s.io/v1/storageclasses?resourceVersion=276&timeoutSeconds=1&gracePeriodSeconds=103" -H "Authorization: Bearer $TOKEN" --insecure
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "etcdserver: mvcc: required revision is a future revision",
"code": 500
The issues #132358 suggested, that we might want to return a more graceful error message and a error code of 504, like this:
"causes": [
{
"reason": "ResourceVersionTooLarge",
"message": "Too large resource version"
}
],
"retryAfterSeconds": 1
},
"code": 504
After making some slight adjustments on the error handling, we are able to query the StorageClassList with a very high resourceVersion and a high limit value, and we will receive the desired message:
curl -v -X GET "https://localhost:6443/apis/storage.k8s.io/v1/storageclasses?resourceVersion=44232320&limit=38&timeoutSeconds=1&gracePeriodSeconds=103" -H "Authorization: Bearer $TOKEN" --insecure
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "Timeout: Too large resource version: 2902323232, current: 300",
"reason": "Timeout",
"details": {
"causes": [
{
"reason": "ResourceVersionTooLarge",
"message": "Too large resource version"
}
],
"retryAfterSeconds": 1
},
"code": 504
Which issue(s) this PR is related to:
Fixes #132358
Special notes for your reviewer:
I am quite not sure, if the current change is what we really want. This was a challenge for me 😄
We will check for the error message from etcd:
if err == etcdrpc.ErrFutureRev, which is infact this
ErrGRPCFutureRev = status.Error(codes.OutOfRange, "etcdserver: mvcc: required revision is a future revision")
As I am not sure, if this is a good practice, I looked at the interpretListError function, which will be called within the GetList function.
And yes, we already doing it this way:
func interpretListError(err error, paging bool, continueKey, keyPrefix string) error {
switch {
case err == etcdrpc.ErrCompacted:
if paging {
return handleCompactedErrorForPaging(continueKey, keyPrefix)
}
return errors.NewResourceExpired(expired)
}
return err
}
But I'd be more than happy for some suggestions, if this is not the right approach 👍
Does this PR introduce a user-facing change?
Fixed API response for StorageClassList queries and returns a graceful error message, if the provided ResourceVersion is too large.
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
NONE
This issue is currently awaiting triage.
If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Hi @PatrickLaabs. Thanks for your PR.
I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Could you please add a unit test for it?
Thanks for the response and of course. I was waiting for a response on this, before I invest more time.
etcdserver: mvcc: required revision is a future revision is error from etcd, Too large resource version is error from cache.
Reason why you get different errors is due to &limit=38 argument, it changes LIST semantic from NotOlderThan served from cache, to legacy Exact served from etcd.
I'm not against unifying the semantic, I think it's even benefitial, but we definitely need a test for that. Suggest adding a scenario to RunTestList.
After reviewing the unit test, I have found something:
if tt.expectRVTooLarge {
// TODO: Clasify etcd future revision error as TooLargeResourceVersion
if err == nil || !(storage.IsTooLargeResourceVersion(err) || strings.Contains(err.Error(), "etcdserver: mvcc: required revision is a future revision")) {
t.Fatalf("expecting resource version too high error, but get: %q", err)
}
return
}
If I am not completely wrong here.. thats exact the point we are looking for in our unit tests, right?
With my update Code, I had commented in the reviews, I made these adjustments for the unit test:
if tt.expectRVTooLarge {
if !storage.IsTooLargeResourceVersion(err) {
t.Fatalf("expecting resource version too high error, but get: %v", err)
}
return
}
What do you thing? Or shall we extend the testings?
If I am not completely wrong here.. thats exact the point we are looking for in our unit tests, right?
yes
/ok-to-test
/lgtm
PTAL @wojtek-t
LGTM label has been added.
/lgtm /approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: PatrickLaabs, wojtek-t
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~staging/src/k8s.io/apiserver/pkg/storage/OWNERS~~ [wojtek-t]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Follow-Up issue to improve returned error message:
#132526