deepflow [FR] Enhance self-learning collection of http2/gRPC header key values

Search before asking

[X] I had searched in the issues and found no similar feature requirement.

Description

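Requirement: deepflow v6.4 implemented high-performance decoding of HTTP2 compressed headers with eBPF kprobe, automatically learning the compression dictionaries of both communicating peers. In practice, however, collected custom headers suffer from loss, disorder, and overwriting. I would like to solve the custom-header matching problem by collecting only the values. Source article: https://www.deepflow.io/blog/zh/053-high-performance-decoding-of-http2-compressed-headers-using-ebpf-kprobe/

Limitations: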

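For HTTP2 long-lived connections that already existed before deepflow-agent started, the dynamic table entries created earlier cannot be decoded
When cBPF is used, packet loss, retransmission, reordering and similar network effects can make the reconstruction of compressed headers inaccurate (eBPF kprobe does not have this limitation)
Testing v6.5 in practice shows that the compression dictionary may get out of order, so collected keys and values no longer match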

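Problem description: because the compression dictionary may be out of order, collected keys and values can be mismatched. Test setup and observed results: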

static_config:
   l7-protocol-advanced-features:
     extra-log-fields:
       http2:
       -  field-name: "x-custom-code"
       -  field-name: "x-custom-msg"
       -  field-name: "x-custom-data"

Send an HTTP2/gRPC request:

:authority: www.xxxx.com
:method: POST
:path: /list?aid=6383&sdk_version=5.1.18_zip&device_platform=web&zip=1
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br, zstd
accept-language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
content-encoding: gzip
content-length: 5368
content-type: application/json; charset=utf-8
origin: https://www.xxxx.com
priority: u=1, i
referer: https://www.xxxx.com/
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36 Edg/129.0.0.0
x-custom-code: 200
x-custom-msg: success
x-custom-data: {"test": "data"}

Technical background: https://kiosk007.top/post/http-2-0-header-compression/ (the HTTP2 index table consists of the static table defined in RFC 7541 and the dynamic table)
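The key/value mismatch can be reproduced in miniature. The following Python sketch models HPACK (RFC 7541) dynamic-table name lookup under one assumed mechanism: the sender refers to header names by dynamic-table index while sending the values as literals, and the observer missed one table insertion (for example because the connection predates the agent, or a segment was lost). Table size limits and eviction are ignored, and DynamicTable is a hypothetical helper, not deepflow code.

```python
# Hypothetical sketch: HPACK name lookup with an out-of-sync dynamic table.
# Size limits and eviction are ignored; only header names are tracked.

STATIC_TABLE_SIZE = 61  # RFC 7541 Appendix A defines 61 static entries


class DynamicTable:
    def __init__(self):
        # RFC 7541 section 2.3.3: the most recently added entry has the lowest index
        self.names = []

    def add(self, name):
        self.names.insert(0, name)

    def index_of(self, name):
        # Dynamic-table indices start right after the static table, i.e. at 62
        return STATIC_TABLE_SIZE + 1 + self.names.index(name)

    def name_at(self, index):
        pos = index - STATIC_TABLE_SIZE - 1
        if 0 <= pos < len(self.names):
            return self.names[pos]
        return "unknown"


encoder = DynamicTable()   # the sender's view of the table
observer = DynamicTable()  # the capturing agent's view of the table
for name in ("x-custom-code", "x-custom-msg", "x-custom-data"):
    encoder.add(name)
    if name != "x-custom-data":  # assumption: the agent missed this insertion
        observer.add(name)

fields = ("x-custom-code", "x-custom-msg", "x-custom-data")
values = ["200", "success", '{"test": "data"}']
indices = [encoder.index_of(n) for n in fields]  # the sender encodes by its own indices
decoded = list(zip([observer.name_at(i) for i in indices], values))
print(decoded)
```

The values arrive intact because they are literals; only the name lookup shifts by one slot, which yields key/value mismatches of the kind reported in this issue.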

Where the server writes the data to storage:

deepflow\server\ingester\flow_log\log_data\l7_flow_log.go

// AttributeNames and AttributeValues are parallel slices;
// the mapping is one-to-one by position: AttributeNames[i] => AttributeValues[i]
h.AttributeNames = append(h.AttributeNames, l.ExtInfo.AttributeNames...)
h.AttributeValues = append(h.AttributeValues, l.ExtInfo.AttributeValues...)
h.MetricsNames = append(h.MetricsNames, l.ExtInfo.MetricsNames...)
h.MetricsValues = append(h.MetricsValues, l.ExtInfo.MetricsValues...)

Examples of stored results:

# Case 1: correct (a minority of records)
AttributeNames = ["rpc_services","x-custom-code","x-custom-msg","x-custom-data"]
AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]
# Case 2: incorrect (the majority of records)
# x-custom-msg is overwritten by x-custom-code: the index table was decoded out of order
AttributeNames = ["rpc_services","x-custom-code","x-custom-code","x-custom-data"]
AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]
# x-custom-code is overwritten by x-custom-data: the index table was decoded out of order
AttributeNames = ["rpc_services","x-custom-data","x-custom-msg","x-custom-data"]
AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]
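Because AttributeNames and AttributeValues are matched purely by position, a mis-decoded name is stored without any error. A minimal Python sketch of a check that could flag such rows, assuming that a repeated attribute name within one log entry indicates mis-decoding (pair_attributes is a hypothetical name, not part of deepflow):

```python
from collections import Counter


def pair_attributes(names, values):
    """Zip parallel name/value arrays and report names that occur more than once."""
    if len(names) != len(values):
        raise ValueError("AttributeNames and AttributeValues must have equal length")
    pairs = list(zip(names, values))
    duplicates = sorted(n for n, count in Counter(names).items() if count > 1)
    return pairs, duplicates


names = ["rpc_services", "x-custom-code", "x-custom-code", "x-custom-data"]
values = ["xxx", "200", "success", '{"test": "data"}']
pairs, duplicates = pair_attributes(names, values)
print(duplicates)  # ['x-custom-code']
```

A duplicate check only catches part of the problem: a shift that produces distinct but wrong names passes unnoticed.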

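Proposed solution: since self-learning of the HTTP2 header index table still has shortcomings, approach the problem from distinctive values instead and complete the fields through configuration.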

Start with a simple, general scenario, delimiter-based splitting. Define one header:

# Header key x-custom-content: the name itself carries no meaning; when Wireshark or deepflow cannot learn it, it shows up as unknown
# Delimiter string: !#!
x-custom-content: "200!#!success!#!{\"test\": \"data\"}"
# The actual protocol decoding may yield: unknown: "200!#!success!#!{\"test\": \"data\"}"
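On the sending side, the combined value could be built with a small helper like the following sketch. pack_custom_content is a hypothetical name; it rejects values that already contain the delimiter, since such a value would shift every later field index.

```python
DELIMITER = "!#!"


def pack_custom_content(values, delimiter=DELIMITER):
    """Join field values, in field-value-index order, into one header value."""
    for value in values:
        if delimiter in value:
            raise ValueError(f"value {value!r} contains the delimiter {delimiter!r}")
    return delimiter.join(values)


header_value = pack_custom_content(["200", "success", '{"test": "data"}'])
print(header_value)  # 200!#!success!#!{"test": "data"}
```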

Add a configuration. Several designs are possible; the following one was chosen after testing:

static_config:
   l7-protocol-advanced-features:
     extra-log-fields:
       http2:
       -  field-name: "x-custom-code"
          match-value-rule: "!#!"
          field-value-index: 0
       -  field-name: "x-custom-msg"
          match-value-rule: "!#!"
          field-value-index: 1
       -  field-name: "x-custom-data"
          match-value-rule: "!#!"
          field-value-index: 2

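Because values containing such a special delimiter are rare, when headers are parsed, any value that the delimiter splits into two or more parts is completed with the predefined keys according to the matching rules.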

The completed result is identical to that of correctly self-learned headers, and the behavior is stable:

AttributeNames = ["rpc_services","x-custom-code","x-custom-msg","x-custom-data"]
AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]
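The delimiter rule can be sketched in Python as follows, assuming the rules mirror the proposed configuration (field-name, match-value-rule, field-value-index), the decoded header name is ignored, and a value qualifies only when it splits into at least two parts. RULES and complete_by_delimiter are hypothetical names, not deepflow code.

```python
# Hypothetical rules mirroring the proposed extra-log-fields configuration
RULES = [
    {"field-name": "x-custom-code", "match-value-rule": "!#!", "field-value-index": 0},
    {"field-name": "x-custom-msg", "match-value-rule": "!#!", "field-value-index": 1},
    {"field-name": "x-custom-data", "match-value-rule": "!#!", "field-value-index": 2},
]


def complete_by_delimiter(headers, rules):
    """Fill configured fields from any header value that splits into >= 2 parts."""
    names, values = [], []
    for _decoded_name, value in headers:  # the decoded name may be "unknown"; it is ignored
        for rule in rules:
            parts = value.split(rule["match-value-rule"])
            idx = rule["field-value-index"]
            if len(parts) >= 2 and idx < len(parts):
                names.append(rule["field-name"])
                values.append(parts[idx])
    return names, values


headers = [
    ("unknown", '200!#!success!#!{"test": "data"}'),
    ("user-agent", "Mozilla/5.0"),  # no delimiter, so no rule applies
]
names, values = complete_by_delimiter(headers, RULES)
```

Since the decoded header name is never consulted, the result does not depend on the dynamic table being in sync; the trade-off is that any unrelated value that happens to contain the delimiter would also be picked up.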

Additional scenario: regex matching (redundant-field approach)

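# Header keys x-custom-code / x-custom-msg / x-custom-data: under the HTTP2 standard these are dynamic-table entries, so the decoded key may be meaningless
# Each value redundantly repeats its own key as a prefix: <key>:<value>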
x-custom-code: "x-custom-code:200"
x-custom-msg: "x-custom-msg:success"
x-custom-data: "x-custom-data:{\"test\": \"data\"}"
# The actual protocol decoding may yield:
# unknown: "x-custom-code:200"
# unknown: "x-custom-msg:success"
# unknown: "x-custom-data:{\"test\": \"data\"}"

Add a configuration:

static_config:
   l7-protocol-advanced-features:
     extra-log-fields:
       http2:
       -  field-name: "x-custom-code"
          match-value-rule: "^x-custom-code:(.*)"
          field-value-index: 0
       -  field-name: "x-custom-msg"
          match-value-rule: "^x-custom-msg:(.*)"
          field-value-index: 0
       -  field-name: "x-custom-data"
          match-value-rule: "^x-custom-data:(.*)"
          field-value-index: 0

Example processing (Python):

import re

input_string = "x-custom-msg:success"
pattern = r"^x-custom-msg:(.*)"

match = re.match(pattern, input_string)

if match:
    result = match.group(1)  # first capture group: the part after the key prefix
    print("Match succeeded!")
    print("Extracted content:", result)  # success
else:
    print("Match failed")

Result after matching:

# x-custom-code: "200"
# x-custom-msg: "success"
# x-custom-data: "{\"test\": \"data\"}"
AttributeNames = ["rpc_services","x-custom-code","x-custom-msg","x-custom-data"]
AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]
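Applying all three regex rules to the decoded headers could look like the following sketch. It assumes that field-value-index is a 0-based index into the capture groups (so 0 means match.group(1); in Python's re, group(0) is the whole match) and that headers matching no rule are skipped. complete_by_regex is a hypothetical name.

```python
import re

# Hypothetical rules mirroring the regex configuration above
RULES = [
    ("x-custom-code", r"^x-custom-code:(.*)", 0),
    ("x-custom-msg", r"^x-custom-msg:(.*)", 0),
    ("x-custom-data", r"^x-custom-data:(.*)", 0),
]
COMPILED = [(name, re.compile(pattern), idx) for name, pattern, idx in RULES]


def complete_by_regex(headers):
    """Recover (field-name, value) pairs from values, ignoring decoded header names."""
    names, values = [], []
    for _decoded_name, value in headers:
        for field_name, pattern, idx in COMPILED:
            match = pattern.match(value)
            if match:
                names.append(field_name)
                values.append(match.group(idx + 1))  # index 0 -> first capture group
                break  # first matching rule wins
    return names, values


headers = [
    ("unknown", "x-custom-code:200"),
    ("unknown", "x-custom-msg:success"),
    ("unknown", 'x-custom-data:{"test": "data"}'),
    ("user-agent", "Mozilla/5.0"),
]
names, values = complete_by_regex(headers)
```

Compiling the patterns once avoids recompiling them for every header, and stopping at the first matching rule keeps one value from being attributed to several fields.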

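Note: when the HTTP2 static-table fields user-agent and server are used instead, deepflow's collection is much more stable, but the corresponding server code has to be modified, and repurposing static-table fields does not conform to the protocol standard and is insecure. Could the dynamic-table handling be made compatible, so that custom HTTP2 headers are supported? @sharang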
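A likely reason the static-table fields are more stable: names in the RFC 7541 static table are fixed by the specification, so resolving them needs no per-connection state, whereas dynamic-table names depend on the observer having seen every insertion. A minimal Python sketch, assuming an observer that joined mid-connection with an empty dynamic table; only two of the 61 static entries are listed, and resolve_name is a hypothetical helper.

```python
# Excerpt of the RFC 7541 Appendix A static table (index -> header name)
STATIC_NAMES = {54: "server", 58: "user-agent"}
STATIC_TABLE_SIZE = 61


def resolve_name(index, dynamic_names):
    """Resolve an HPACK name index via the static table first, then the dynamic table."""
    if 1 <= index <= STATIC_TABLE_SIZE:
        return STATIC_NAMES.get(index, "<not in this excerpt>")
    pos = index - STATIC_TABLE_SIZE - 1
    return dynamic_names[pos] if 0 <= pos < len(dynamic_names) else "unknown"


late_observer_table = []  # entries inserted before capture started are missing
print(resolve_name(58, late_observer_table))  # user-agent
print(resolve_name(62, late_observer_table))  # unknown
```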

Use case

No response

Related issues

No response

Are you willing to submit a PR?

[x] Yes I am willing to submit a PR!

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Sep 28 '24 15:09 Fancyki1

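@Fancyki1 The approach you describe is quite good. It amounts to defining a specification for HTTP/gRPC header injection: by relying on a distinctive value format, all the content that needs to be injected is placed into a single value.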

Let's think about how to push this practice forward at the specification level.

Oct 15 '24 00:10 sharang

Does this problem also occur in versions earlier than 6.4?

Nov 11 '24 07:11 gbling

@gbling It's all in the source article: https://www.deepflow.io/blog/zh/053-high-performance-decoding-of-http2-compressed-headers-using-ebpf-kprobe/ Versions before 6.4 do not support this feature at all.

Nov 12 '24 02:11 Fancyki1

@Fancyki1 Just to double-check: does the same thing happen with HTTP/1.1?

Nov 13 '24 07:11 gbling

@gbling For HTTP/1.1 you can implement this by parsing with a Wasm plugin; this feature is not needed.

Nov 13 '24 08:11 Fancyki1

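@Fancyki1 Here is the situation: when testing distributed tracing, we correlate spans through a custom http_log_x_request_id, and our internal calls all use HTTP/1.1, so some traces are incomplete. I want to clarify whether this feature only applies to HTTP2/gRPC or to HTTP/1.1 as well.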

Nov 13 '24 08:11 gbling

@gbling Please read the documentation more carefully; it is all written there:

    ## Configuration to extract the customized header fields of HTTP, HTTP2, GRPC protocol etc
    #extra-log-fields:
    ## for example:
    ## http:
    ## - field-name: "user-agent"
    ## - field-name: "cookie"
    #  http: []
    #  http2: []

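With a version >v6.4, configuring http enables the feature for HTTP/1.1, and HTTP/1.1 does not have the HTTP2 index-table problem of disordered or incomplete collection, so you can use it directly. Also, be clear about what you want to achieve: if it is distributed tracing, this feature has nothing to do with it; if you want to use it to check whether every request in a trace carries http_log_x_request_id, it can help with troubleshooting.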

Nov 13 '24 08:11 Fancyki1