x-pack/metricbeat/module/sql: Problem with handling types properly
There was a bug report indicating that although the source had correct values, after ingestion using metricbeat's SQL module, the types were mishandled, causing issues.
For example, the source looks like:
"m": {
"schema": "APP",
"country": "EU",
"orig_company": 9999,
"error_msg": "Purchase Order 5504174335 has been closed.",
"tran_log_id": 18459944,
"criticality": 3,
"orig_facility": 1043316,
"dest_facility": "BD07",
"object_id": 5504174335,
"orig_process_date_time": "30.05.2024 22:30",
"sequence_number": 18462360,
"except_dtl": "",
"application": "EEM",
"incoming_object_status": "2 Object exists with no error",
"error_type": 2,
"foo_int": 1234567890,
"msg_type": "PurchaseOrder",
"error_code": 1530014,
"prod_cluster": "WMPEEMAPPPEU",
"foo_stegy": "1.23456789E9",
"foo_string": "1.23456789E9",
"direction": "I",
"timestamp": "2024-05-30T22:30:00.000Z"
},
But, the fields section in Kibana represented the object_id as a floating point number.
"m.object_id": [
"5.504174335E9"
],
The field is defined as keyword in the mappings (copied from sql.metrics.numeric.object_id):
"object_id": {
"type": "keyword"
},
Not just this,
For configuration like this:
- module: sql
metricsets:
- query
period: 10s
hosts: ["user=<REMOVED> password=<REMOVED> dbname=<REMOVED> sslmode=disable"]
driver: "postgres"
sql_query: "select '0054321'::varchar(20) as strCol, 12345 as intCol"
sql_response_format: table
where 2 columns are expected i.e., the first column with value 0054321 (of VARCHAR type) and the second column with 12345 as a number.
But, after running metricbeat, see the following:
Query:
$ curl -X POST "localhost:9200/metricbeat-8.14.1/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"match": {
"metricset.name": "query"
}
}
}
Response:
"sql" : {
"driver" : "postgres",
"query" : "select '0054321'::varchar(20) as strCol, 12345 as intCol",
"metrics" : {
"numeric" : {
"strcol" : 54321,
"intcol" : 12345
}
}
},
which is incorrect i.e., leading zeroes of VARCHAR are lost. To retain the leading zeroes in strcol should've been ideally string.
Root cause:
The root problem exists here. If you see []byte and default cases, it handles them as a string and then tries to parse it as float. If it can, it becomes a float else it remains a string.
For better understanding, you can also take a look at the unit tests (test inputs) here.
If you notice in the unit tests, case int, unit, etc. is handled and float64 is expected and not int, unit which is not correct. Similarly more problems exist.
To fix this behavior, change is necessitated to handle these types properly. Also, add cases like:
s := fmt.Sprint(val)
if len(s) > 1 && s[0] == '0' && s[1] != '.' {
// Preserve string with leading zeros
return s
}
i.e., any string leading with 0 (not immediately followed by a dot) should be a string. This should handle types like VARCHAR, TEXT, etc.
Also, types like int, uint, etc. should remain as they are not converted to float.
And then, we have to update the code. Here all those types should be handled properly. For example, numeric types in ES should have those types and not just float64 as it has support for more.
Related internal issue: https://github.com/elastic/sdh-beats/issues/4860