[BUG] Micro-services use gRPC-Gateway: when the client sends an HTTP request that is converted to gRPC, the parsed status of the gRPC request is unknown
Search before asking
- [X] I have searched the issues and found no similar feature requirement.
DeepFlow Component
Agent
What you expected to happen
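Background: in a k8s environment, some micro-services use gRPC-Gateway to send HTTP requests that are converted internally into gRPC requests.
Conclusion: DeepFlow can capture the content of the HTTP requests, but it cannot capture and parse the gRPC requests; their response status is unknown.
Component used: https://github.com/grpc-ecosystem/grpc-gateway
How the component works:
Request topology:
Code implementation:
Define a gRPC interface (proto) file: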
syntax = "proto3";
option go_package=".;sys_time";
package sys_time;
import "google/api/annotations.proto";
message GetCurrSysTimeReq {
}
message GetCurrSysTimeRsp {
  message Data {
    int64 timeStamp = 1;
  }
  uint32 code = 1;
  string msg = 2;
  Data data = 3;
}
service SysTime {
  // Get the current system time
  rpc GetCurrSysTime(GetCurrSysTimeReq) returns (GetCurrSysTimeRsp) {
    option (google.api.http) = {
      get: "/api/v1/currSysTime",
    };
  }
}
Go implementation:
// ...
func (sysTimeServer *SysTimeServer) GetCurrSysTime(ctx context.Context, req *sys_time.GetCurrSysTimeReq) (*sys_time.GetCurrSysTimeRsp, error) {
	currentTime := time.Now()
	milliSeconds := currentTime.UnixMilli()
	return &sys_time.GetCurrSysTimeRsp{
		Code: APISuccess.HttpCode,
		Msg:  APISuccess.Code,
		Data: &sys_time.GetCurrSysTimeRsp_Data{
			TimeStamp: milliSeconds,
		},
	}, nil
}
// ...
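DeepFlow observability results:
Front-end gateway -> micro-service: HTTP requests are captured correctly.
micro-service -> gRPC gateway -> gRPC server: gRPC requests are not captured correctly.
Every gRPC request captured through the gRPC gateway has status unknown and cannot be parsed.
Testing shows that after the service is restarted, requests are reported as successful for a while, and then all of them become unknown again.
In addition: with uprobe enabled, the request status is obtained correctly. However, when uprobe is combined with extra-log-fields to capture custom headers, the custom headers are captured only for a short time after it is enabled, during which the number of unknown results drops sharply; after about 30 seconds the custom headers are no longer captured. We added logging at the point in the agent's Rust source code where this data is processed, rebuilt and redeployed the agent, and found that after roughly 30 seconds the custom header data is already missing by the time the data collected by eBPF in the kernel reaches the Rust user-space processing stage. A separate issue will be opened to track this.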
@sharang @yinjiping
How to reproduce
No response
DeepFlow version
kubectl exec -it -n deepflow deploy/deepflow-server -- deepflow-server -v
Name: deepflow-server community edition
Branch: v6.5
CommitID: 9cefc731a5577fdbf67ec1196cef037b28abbe88
RevCount: 10866
Compiler: go version go1.21.13 linux/amd64
CompileTime: 2024-08-30 09:40:53
kubectl exec -it -n deepflow ds/deepflow-agent -- deepflow-agent -v
Defaulted container "deepflow-agent" out of: deepflow-agent, configure-sysctl (init)
10841-e0a10484155463453b61b20bd5fcd6222d59f829
Name: deepflow-agent community edition
Branch: v6.5
CommitId: e0a10484155463453b61b20bd5fcd6222d59f829
RevCount: 10841
Compiler: rustc 1.77.1 (7cf61ebde 2024-03-27)
CompileTime: 2024-08-18 14:17:18
DeepFlow agent list
No response
Kubernetes CNI
No response
Operation-System/Kernel version
4.18.0-372.32.1.90.po1.x86_64
Anything else
No response
Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
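@Fancyki1 For long-lived flows such as http2/grpc, the Agent's flow aggregation component has a timeout mechanism: when a flow carries no traffic for a long time, the agent assumes the flow has ended and deletes all resources associated with it (to keep memory usage down), including state such as the HTTP/2 compression table. If the flow has not actually ended, parsing of subsequent http2/grpc traffic on that flow breaks: without the compression table, grpc-status cannot be decoded, which produces the behavior you are seeing. There is no good fix for this yet; for now, capturing http2/grpc via uprobe works better.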