Linkerd gives 200 or 400 responses for the same unencoded URL request depending on the situation
What is the issue?
Hello!
We found that linkerd behaves differently for unencoded URLs depending on the situation:
- busy (when there is other traffic) => 400 error response
[s-m-sas-864c7cf5fc-h2nt5/linkerd-proxy] [ 193.755245s] INFO ThreadId(04) inbound:accept{client.addr=172.24.39.62:59404}: linkerd_app_core::serve: Connection closed error=invalid URI client.addr=172.24.39.62:59404
[s-m-sas-864c7cf5fc-h2nt5/linkerd-proxy] [ 193.755208s] DEBUG ThreadId(04) inbound:accept{client.addr=172.24.39.62:59404}:server{port=8080}:http: linkerd_proxy_http::server: The client is shutting down the connection res=Err(hyper::Error(Parse(Uri)))
[s-m-sas-864c7cf5fc-h2nt5/linkerd-proxy] [ 193.755192s] DEBUG ThreadId(04) inbound:accept{client.addr=172.24.39.62:59404}:server{port=8080}:http: hyper::proto::h1::io: flushed 84 bytes
[s-m-sas-864c7cf5fc-h2nt5/linkerd-proxy] [ 193.755174s] DEBUG ThreadId(04) inbound:accept{client.addr=172.24.39.62:59404}:server{port=8080}:http: hyper::proto::h1::role: sending automatic response (400 Bad Request) for parse error
[s-m-sas-864c7cf5fc-h2nt5/linkerd-proxy] [ 193.755168s] DEBUG ThreadId(04) inbound:accept{client.addr=172.24.39.62:59404}:server{port=8080}:http: hyper::proto::h1::conn: parse error (invalid URI) with 698 bytes
- not busy => 200 success response
It seems that linkerd returns either 200 or 400 for the same request depending on the situation.
I think this is an issue that needs to be resolved.
We rolled this out to production after confirming that it handled unencoded URLs successfully in the test environment, but in production some requests suddenly started failing.
How can it be reproduced?
URL examples
- Encoded URL example
http://gateway.io.jrpark.com/cafe/sas-m/search?version=1.0.0&pr=ssea&st=article.public&sm=all.basic&q=%EB%B8%8C%EB%A6%AC%ED%8A%B8%EB%8B%88%EC%8A%A4%ED%94%BC%EC%96%B4%EC%8A%A4&q_enc=utf-8&rp=rmdup.withpsg&so=rel.dsc&start=1&display=10&ic=basic&hl=titlebody.%3Cstrong%3E.%3C%2Fstrong%3E&r_enc=utf-8&r_format=xml
- Unencoded URL example (identical except that the q and hl values are not percent-encoded; see the sketch below these examples)
http://gateway.io.jrpark.com/cafe/sas-m/search?version=1.0.0&pr=ssea&st=article.public&sm=all.basic&q=브리트니스피어스&q_enc=utf-8&rp=rmdup.withpsg&so=rel.dsc&start=1&display=10&ic=basic&hl=titlebody.<strong>.</strong>&r_enc=utf-8&r_format=xml
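For reference, the two examples differ only in that the q and hl values of the first one are percent-encoded. A quick sketch that reproduces those encoded values, using the percent-encoding crate purely for illustration (it is not part of the stack described here):
use percent_encoding::{utf8_percent_encode, AsciiSet, CONTROLS};

// utf8_percent_encode always encodes non-ASCII bytes; we additionally escape
// '<', '>' and '/' so the output matches the encoded example above.
const QUERY_VALUE: &AsciiSet = &CONTROLS.add(b'<').add(b'>').add(b'/');

fn main() {
    println!("q={}", utf8_percent_encode("브리트니스피어스", QUERY_VALUE));
    println!("hl={}", utf8_percent_encode("titlebody.<strong>.</strong>", QUERY_VALUE));
    // q=%EB%B8%8C%EB%A6%AC%ED%8A%B8%EB%8B%88%EC%8A%A4%ED%94%BC%EC%96%B4%EC%8A%A4
    // hl=titlebody.%3Cstrong%3E.%3C%2Fstrong%3E
}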
Steps
- Request with the unencoded URL; it returns 200.
$ curl -I 'http://gateway.io.jrpark.com/cafe/sas-m/search?version=1.0.0&pr=ssea&st=article.public&sm=all.basic&q=브리트니스피어스&q_enc=utf-8&rp=rmdup.withpsg&so=rel.dsc&start=1&display=10&ic=basic&hl=titlebody.<strong>.</strong>&r_enc=utf-8&r_format=xml'
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Date: Thu, 13 Jun 2024 01:04:17 GMT
Server: Apache
X-Kong-Upstream-Latency: 44
X-Kong-Proxy-Latency: 1
Via: kong/3.3.1
- Make linkerd busy with other traffic. I generated 300 qps of traffic with the encoded URL using the Job below.
apiVersion: batch/v1
kind: Job
metadata:
  name: request-to-cafe
  namespace: clous-jrpark
spec:
  completions: 1
  parallelism: 1
  template:
    spec:
      restartPolicy: Never
      containers:
      - args:
        - run
        - /scripts/loadtest.js
        command:
        - k6
        env:
        - name: ENDPOINT
          value: http://gateway.io.jrpark.com/cafe/sas-m/search?version=1.0.0&pr=ssea&st=article.public&sm=all.basic&q=%EB%B8%8C%EB%A6%AC%ED%8A%B8%EB%8B%88%EC%8A%A4%ED%94%BC%EC%96%B4%EC%8A%A4&q_enc=utf-8&rp=rmdup.withpsg&so=rel.dsc&start=1&display=10&ic=basic&hl=titlebody.%3Cstrong%3E.%3C%2Fstrong%3E&r_enc=utf-8&r_format=xml
        image: k6:v0.43.1
        imagePullPolicy: IfNotPresent
        name: k6
        resources:
          limits:
            cpu: 8
            memory: 10Gi
        securityContext:
          runAsUser: 0
        volumeMounts:
        - mountPath: /scripts
          name: scripts
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: loadtest.js
            path: loadtest.js
          name: k6-scripts
        name: scripts
---
apiVersion: v1
data:
  loadtest.js: |
    import http from "k6/http";
    import { Rate } from "k6/metrics";
    export const options = {
      scenarios: {
        constant_load: {
          executor: "constant-arrival-rate",
          rate: 300,
          timeUnit: "1s",
          duration: "60m",
          preAllocatedVUs: 100,
          maxVUs: 1000,
        },
      },
    };
    const endpoint = __ENV.ENDPOINT;
    const requestRate = new Rate("request_rate");
    export default function () {
      const res = http.get(endpoint);
      requestRate.add(res.status == 200);
    }
kind: ConfigMap
metadata:
  name: k6-scripts
  namespace: clous-jrpark
$ k apply -f request-cafe-job.yaml
job.batch/request-to-cafe created
configmap/k6-scripts created
- While this traffic is running, a request to linkerd with the unencoded URL returns a 400 error.
$ curl -I 'http://gateway.io.jrpark.com/cafe/sas-m/search?version=1.0.0&pr=ssea&st=article.public&sm=all.basic&q=브리트니스피어스&q_enc=utf-8&rp=rmdup.withpsg&so=rel.dsc&start=1&display=10&ic=basic&hl=titlebody.<strong>.</strong>&r_enc=utf-8&r_format=xml'
HTTP/1.1 400 Bad Request
Content-Length: 0
Connection: keep-alive
date: Thu, 13 Jun 2024 01:09:55 GMT
X-Kong-Upstream-Latency: 0
X-Kong-Proxy-Latency: 1
Via: kong/3.3.1
- Stop the existing traffic test.
$ k delete -f request-cafe-job.yaml
job.batch "request-to-cafe" deleted
configmap "k6-scripts" deleted
- After that, the unencoded URL succeeds again.
$ curl -I 'http://gateway.io.jrpark.com/cafe/sas-m/search?version=1.0.0&pr=ssea&st=article.public&sm=all.basic&q=브리트니스피어스&q_enc=utf-8&rp=rmdup.withpsg&so=rel.dsc&start=1&display=10&ic=basic&hl=titlebody.<strong>.</strong>&r_enc=utf-8&r_format=xml'
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Date: Thu, 13 Jun 2024 01:11:15 GMT
Server: Apache
X-Kong-Upstream-Latency: 1
X-Kong-Proxy-Latency: 1
Via: kong/3.3.1
Logs, error output, etc
See the logs in the description above.
output of linkerd check -o short
I don't think this is related:
linkerd check -o short
linkerd-config
--------------
× control plane ClusterRoleBindings exist
clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:clous-users:clous-developer" cannot list resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
see https://linkerd.io/2/checks/#l5d-existence-crb for hints
Status check results are ×
Environment
- Kubernetes: v1.23.15
- Cluster Environment: Internal dedicated cluster
- Host OS: Linux 8.7
- Linkerd version: stable-2.13.5
Possible solution
No response
Additional context
https://linkerd.slack.com/archives/C89RTCWJF/p1718189489328629
=> I asked about this in Slack.
Would you like to work on fixing this bug?
yes
Thanks for the detailed info. However, I couldn't reproduce your issue. As the server, I used an httpbin service with this manifest:
apiVersion: v1
kind: Service
metadata:
  name: httpbin
  labels:
    app: httpbin
    service: httpbin
spec:
  ports:
  - name: http
    port: 8000
    targetPort: 8080
  selector:
    app: httpbin
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin
spec:
  selector:
    matchLabels:
      app: httpbin
  template:
    metadata:
      labels:
        app: httpbin
    spec:
      containers:
      - image: docker.io/kong/httpbin
        name: httpbin
        command:
        - gunicorn
        - -b
        - 0.0.0.0:8080
        - httpbin:app
        - -k
        - gevent
        env:
        - name: WORKON_HOME
          value: /tmp
        ports:
        - containerPort: 8080
On the client side, I can't get "invalid URI" using cURL, but I do with nc:
$ nc httpbin.default.svc.cluster.local 8000
GET %2Fhello%2Fworld HTTP/1.1
HTTP/1.1 400 Bad Request
content-length: 0
date: Tue, 25 Jun 2024 10:53:41 GMT
Yet the behavior is the same whether the server is under heavy load or not. Can you try reproducing your issue using this setup? Also, it seems you have a Kong Gateway at play, so I would try removing that from the mix to see if it's affecting things.
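One more thought: to rule out curl (or anything else in the path) re-encoding the request-target before it reaches the proxy, you could send the exact raw bytes yourself. A rough sketch of such a client; the address and Host value are placeholders, so point it at whatever meshed endpoint you're testing:
use std::io::{Read, Write};
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    // Placeholder address: the meshed httpbin Service from the manifest above,
    // or whatever endpoint you want to exercise.
    let mut stream = TcpStream::connect("httpbin.default.svc.cluster.local:8000")?;

    // Request-target with the raw, un-percent-encoded Hangul and angle brackets,
    // exactly as a misbehaving client would send it (query shortened for brevity).
    let target = "/cafe/sas-m/search?q=브리트니스피어스&hl=titlebody.<strong>.</strong>";
    let request = format!(
        "GET {target} HTTP/1.1\r\nHost: gateway.io.jrpark.com\r\nConnection: close\r\n\r\n"
    );
    stream.write_all(request.as_bytes())?;

    // Print whatever comes back (either a 200 from the app or the
    // automatic 400 the proxy sends for a parse error).
    let mut response = Vec::new();
    stream.read_to_end(&mut response)?;
    println!("{}", String::from_utf8_lossy(&response));
    Ok(())
}
Running that with and without the k6 load should show whether the proxy treats the same bytes differently under load, independent of Kong.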
@alpeb
Thanks for the support!
I've tested it again, and it seems the problem only occurs when both Kong and linkerd are present in the request path.
When only Kong or only linkerd is in the path, there is no problem.
=> As you say, the problem seems to come from the combination of the two.
It looks like under load Kong or linkerd behaves differently, and linkerd ends up rejecting the request as an invalid URI. I'm not sure why linkerd judges it invalid; is there any way to find out why?
Even with the log level set to trace, there was no detailed log explaining the invalid URI.
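For reference, one thing I tried locally was checking which request targets the http crate's Uri parser accepts, since (as far as I understand, this may not be the proxy's exact code path) hyper turns the request target into an http::Uri:
use http::Uri; // http = "0.2" in Cargo.toml; only a local check, not the proxy's actual code

fn main() {
    // Percent-encoded form of the query, as in the "encoded" example.
    let encoded = "/cafe/sas-m/search?q=%EB%B8%8C%EB%A6%AC%ED%8A%B8%EB%8B%88%EC%8A%A4%ED%94%BC%EC%96%B4%EC%8A%A4&hl=titlebody.%3Cstrong%3E.%3C%2Fstrong%3E";
    // Raw form with unencoded Hangul and angle brackets, as in the "unencoded" example.
    let raw = "/cafe/sas-m/search?q=브리트니스피어스&hl=titlebody.<strong>.</strong>";

    // The encoded target parses, while the raw one is rejected as an invalid URI,
    // which matches the error the proxy logs.
    println!("encoded: {:?}", encoded.parse::<Uri>().map(|_| "ok"));
    println!("raw:     {:?}", raw.parse::<Uri>().map(|_| "ok"));
}
So a request that reaches the proxy with those raw bytes would be rejected; what I still don't understand is why it only happens under load.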
One quick question: what version of Kong are you using?
@kflynn
- kong:3.3.1
- kong-ingress-controller:v2.11.1
Hi @parkjeongryul, circling back on this, are you still having the same issues even after upgrading the components of your stack (k8s, kong, linkerd)? If so, could you provide us with an end-to-end repro, including kong's config?
Closing this, as we don't have enough information to reproduce this problem. Please let us know if this problem persists.