
[BUG] Microservice uses gRPC-Gateway: the client sends HTTP requests that are converted to gRPC, and the parsed status of the gRPC requests is unknown

Open Fancyki1 opened this issue 1 year ago • 1 comments

Search before asking

  • [X] I had searched in the issues and found no similar feature requirement.

DeepFlow Component

Agent

What you expected to happen

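Background: in a Kubernetes environment, some microservices use gRPC-Gateway, which accepts HTTP requests and converts them to gRPC requests internally.

Conclusion: DeepFlow captures the HTTP request content, but it cannot capture and parse the gRPC requests; their response status is unknown.

Component used: https://github.com/grpc-ecosystem/grpc-gateway

How the component works:

image

Request topology:

image

Code implementation: define a gRPC interface file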

syntax = "proto3";

option go_package=".;sys_time";

package sys_time;

import "google/api/annotations.proto";

message GetCurrSysTimeReq {

}

message GetCurrSysTimeRsp {
    message Data {
        int64 timeStamp = 1;
    }
    uint32 code = 1;
    string msg = 2;
    Data data = 3;
}

service SysTime {
    // Get the current system time
    rpc GetCurrSysTime(GetCurrSysTimeReq) returns (GetCurrSysTimeRsp) {
        option (google.api.http) = {
            get: "/api/v1/currSysTime",
        };
    }
}

Go implementation

// ...
func (sysTimeServer *SysTimeServer) GetCurrSysTime(ctx context.Context, req *sys_time.GetCurrSysTimeReq) (*sys_time.GetCurrSysTimeRsp, error) {
	currentTime := time.Now()
	milliSeconds := currentTime.UnixMilli()

	return &sys_time.GetCurrSysTimeRsp{
		Code: APISuccess.HttpCode,
		Msg:  APISuccess.Code,
		Data: &sys_time.GetCurrSysTimeRsp_Data{
			TimeStamp: milliSeconds,
		},
	}, nil
}
// ...
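
For reference, the route defined above can be exercised with a plain HTTP GET. The following is a minimal sketch, not part of the original report. It assumes the gateway uses gRPC-Gateway's default protobuf JSON mapping, in which field names keep the lowerCamelCase form (`timeStamp`) and `int64` values are encoded as JSON strings. The base URL `http://localhost:8080` is a placeholder, and `currSysTimeRsp`, `parseCurrSysTime`, and `fetchCurrSysTime` are hypothetical names introduced only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// currSysTimeRsp mirrors GetCurrSysTimeRsp as it is assumed to appear in JSON.
// Assumption: int64 is serialized as a JSON string, hence the ",string" option.
type currSysTimeRsp struct {
	Code uint32 `json:"code"`
	Msg  string `json:"msg"`
	Data struct {
		TimeStamp int64 `json:"timeStamp,string"`
	} `json:"data"`
}

// parseCurrSysTime decodes a response body and converts the millisecond
// timestamp into a time.Time.
func parseCurrSysTime(body []byte) (time.Time, error) {
	var rsp currSysTimeRsp
	if err := json.Unmarshal(body, &rsp); err != nil {
		return time.Time{}, fmt.Errorf("decode response: %w", err)
	}
	return time.UnixMilli(rsp.Data.TimeStamp), nil
}

// fetchCurrSysTime sends GET /api/v1/currSysTime to the gateway at baseURL.
func fetchCurrSysTime(baseURL string) (time.Time, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/api/v1/currSysTime")
	if err != nil {
		return time.Time{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return time.Time{}, fmt.Errorf("unexpected HTTP status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return time.Time{}, err
	}
	return parseCurrSysTime(body)
}

func main() {
	// Placeholder address; replace with the real gateway endpoint.
	t, err := fetchCurrSysTime("http://localhost:8080")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("server time:", t)
}
```

Sending this request repeatedly, with idle gaps in between, is one way to generate the HTTP -> gRPC traffic pattern described in this report.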

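DeepFlow observability results:

image

Frontend gateway -> micro-service: HTTP requests are captured correctly.
micro-service -> gRPC gateway -> gRPC server: gRPC request capture is abnormal. Every gRPC request that passes through the gRPC gateway has status unknown and cannot be parsed.

Testing showed that after the service is restarted, captured requests are reported as successful for a while, and then they all become unknown again.

In addition: with uprobe enabled, the request status is obtained correctly. However, when uprobe is combined with extra-log-fields to capture custom headers, the custom headers are captured only for a short time after enabling, during which the number of unknown statuses drops noticeably; after about 30 seconds the custom headers are no longer captured. We added logging at the point in the agent's Rust source code where this data is processed, rebuilt the agent, and deployed it for testing. After roughly 30 seconds, the custom header data was already missing by the time the data collected by eBPF in the kernel reached the Rust user-space processing stage. This will be tracked in a separate issue.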

@sharang @yinjiping

How to reproduce

No response

DeepFlow version

kubectl exec -it -n deepflow deploy/deepflow-server -- deepflow-server -v

Name: deepflow-server community edition
Branch: v6.5
CommitID: 9cefc731a5577fdbf67ec1196cef037b28abbe88
RevCount: 10866
Compiler: go version go1.21.13 linux/amd64
CompileTime: 2024-08-30 09:40:53

kubectl exec -it -n deepflow ds/deepflow-agent -- deepflow-agent -v

Defaulted container "deepflow-agent" out of: deepflow-agent, configure-sysctl (init)
10841-e0a10484155463453b61b20bd5fcd6222d59f829
Name: deepflow-agent community edition
Branch: v6.5
CommitId: e0a10484155463453b61b20bd5fcd6222d59f829
RevCount: 10841
Compiler: rustc 1.77.1 (7cf61ebde 2024-03-27)
CompileTime: 2024-08-18 14:17:18

DeepFlow agent list

No response

Kubernetes CNI

No response

Operation-System/Kernel version

4.18.0-372.32.1.90.po1.x86_64

Anything else

No response

Are you willing to submit a PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

Fancyki1, Sep 19 '24 15:09

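@Fancyki1 For long-lived flows such as HTTP/2 and gRPC, the Agent's flow aggregation component has a timeout mechanism. If a flow carries no traffic for a long time, the agent considers the flow finished and deletes all resources associated with it (to keep memory usage down), including state such as the HTTP/2 header compression table. If the flow has not actually ended, parsing of subsequent HTTP/2/gRPC traffic on it goes wrong: without the compression table, grpc-status cannot be decoded, which produces this symptom. There is no good solution for this yet; for now, capturing HTTP/2/gRPC via uprobe looks like the better option.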

yuanchaoa, Apr 30 '25 06:04