openobserve
Limit the depth of JSON parsing
Which OpenObserve functionalities are relevant/related to the feature request?
log search
Description
Thank you for developing such a great product. Thank you Coldstar for your patience and help.
Our log content contains many nested structures. We use zap to print logs, so each log entry is output as JSON. There are a lot of these structures, different structures share the same field names, and the nesting depth is generally 3 to 4 levels. Sample data:
```
./gw.log.20230712:21080:{"level":"debug","ts":"2023-07-12 10:00:15","caller":"iotmqtt/mqtt_sl651.go:24","msg":"","service":"gw","body":{"station_sn":"0061997204","category":"48H","fn":52,"fn34":{"rain_sample":[0,0,0,0,0,0,0,0,0,0,0,0],"rain_sample_timestamp":1689127200,"rain_accumu_timestamp":1689127200,"water_sample":[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35],"water_sample_timestamp":1689127200,"volage_sample":11.96,"volage_sample_timestamp":1689127200}}}
./gw.log.20230712:21082:{"level":"debug","ts":"2023-07-12 10:00:15","caller":"iotmqtt/mqtt_sl651.go:226","msg":"handleFN34","service":"gw","stationSN":"0061997204","data":{"rain_sample":[0,0,0,0,0,0,0,0,0,0,0,0],"rain_sample_timestamp":1689127200,"rain_accumu_timestamp":1689127200,"water_sample":[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35],"water_sample_timestamp":1689127200,"volage_sample":11.96,"volage_sample_timestamp":1689127200}}
./gw.log.20230712:21105:{"level":"debug","ts":"2023-07-12 10:00:49","caller":"iotmqtt/mqtt_sl651.go:24","msg":"","service":"gw","body":{"station_sn":"0012340091","category":"48H","fn":52,"fn34":{"rain_sample":[0.1,0.2,0,0,0.1,0,0.1,0,0,0,0.2,0.1],"rain_sample_timestamp":1689127200,"rain_accumu":128,"rain_accumu_timestamp":1689127200,"water_sample":[256.82,256.04,256.71,256.34,257.85,256,256.87,257.94,257.83,256.1,256.69,257.5],"water_sample_timestamp":1689127200,"volage_sample":13.6,"volage_sample_timestamp":1689127200}}}
./gw.log.20230712:21107:{"level":"debug","ts":"2023-07-12 10:00:49","caller":"iotmqtt/mqtt_sl651.go:226","msg":"handleFN34","service":"gw","stationSN":"0012340091","data":{"rain_sample":[0.1,0.2,0,0,0.1,0,0.1,0,0,0,0.2,0.1],"rain_sample_timestamp":1689127200,"rain_accumu":128,"rain_accumu_timestamp":1689127200,"water_sample":[256.82,256.04,256.71,256.34,257.85,256,256.87,257.94,257.83,256.1,256.69,257.5],"water_sample_timestamp":1689127200,"volage_sample":13.6,"volage_sample_timestamp":1689127200}}
./gw.log.20230712:21124:{"level":"debug","ts":"2023-07-12 10:00:58","caller":"iotmqtt/mqtt_sl651.go:24","msg":"","service":"gw","body":{"station_sn":"0012340091","category":"48H","fn":52,"fn34":{"rain_sample":[0,0.2,0.1,0,0,0.2,0,0,0.2,0,0.2,0.1],"rain_sample_timestamp":1689127200,"rain_accumu":128,"rain_accumu_timestamp":1689127200,"water_sample":[256.09,258.31,256.33,257.64,258.27,256.95,257.43,256.84,258.49,256.89,256.12,257.33],"water_sample_timestamp":1689127200,"volage_sample":13.09,"volage_sample_timestamp":1689127200}}}
./gw.log.20230712:21126:{"level":"debug","ts":"2023-07-12 10:00:58","caller":"iotmqtt/mqtt_sl651.go:226","msg":"handleFN34","service":"gw","stationSN":"0012340091","data":{"rain_sample":[0,0.2,0.1,0,0,0.2,0,0,0.2,0,0.2,0.1],"rain_sample_timestamp":1689127200,"rain_accumu":128,"rain_accumu_timestamp":1689127200,"water_sample":[256.09,258.31,256.33,257.64,258.27,256.95,257.43,256.84,258.49,256.89,256.12,257.33],"water_sample_timestamp":1689127200,"volage_sample":13.09,"volage_sample_timestamp":1689127200}}
./gw.log.20230712:21145:{"level":"debug","ts":"2023-07-12 10:01:44","caller":"iotmqtt/mqtt_sl651.go:24","msg":"","service":"gw","body":{"station_sn":"0061997329","category":"48H","fn":52,"fn34":{"rain_sample":[0,0,0,0,0,0,0,0,0,0,0,0],"rain_sample_timestamp":1689127200,"rain_accumu_timestamp":1689127200,"water_sample":[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35],"water_sample_timestamp":1689127200,"volage_sample":12.12,"volage_sample_timestamp":1689127200}}}
./gw.log.20230712:21147:{"level":"debug","ts":"2023-07-12 10:01:44","caller":"iotmqtt/mqtt_sl651.go:226","msg":"handleFN34","service":"gw","stationSN":"0061997329","data":{"rain_sample":[0,0,0,0,0,0,0,0,0,0,0,0],"rain_sample_timestamp":1689127200,"rain_accumu_timestamp":1689127200,"water_sample":[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35],"water_sample_timestamp":1689127200,"volage_sample":12.12,"volage_sample_timestamp":1689127200}}
```
When OpenObserve processes these logs, it unpacks the entire JSON and generates a large number of indexed fields, and overall search speed becomes very slow.
Proposed solution
It is recommended to add a parameter that limits the depth of JSON parsing: either parse everything, or parse only the first level.
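A minimal sketch of what such a depth limit could look like (Python used purely for illustration; the `max_depth` parameter name is hypothetical, not an existing OpenObserve setting — anything deeper than the limit is kept as a single JSON-string field):

```python
import json

def flatten(obj, max_depth=1, depth=0, prefix=""):
    """Flatten nested dicts up to max_depth levels.
    Anything deeper is kept as a single JSON-string field.
    Key parts are joined with '_', mirroring OpenObserve's flattened names."""
    out = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            if depth < max_depth:
                out.update(flatten(value, max_depth, depth + 1, name))
            else:
                out[name] = json.dumps(value, separators=(",", ":"))
        else:
            out[name] = value
    return out

record = {"station_sn": "0061997204", "fn": 52,
          "fn34": {"volage_sample": 11.96}}

# max_depth=0: only the first level is parsed; "fn34" stays one string field
print(flatten(record, max_depth=0))
# max_depth=1: "fn34" is flattened into "fn34_volage_sample"
print(flatten(record, max_depth=1))
```

With `max_depth=0` the nested `fn34` object stays opaque, so deep structures would no longer fan out into many indexed columns.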
Alternatives considered
For the time being, we can only serialize the nested part of the log to a string first, so that zinc recognizes it as a string instead of JSON.
Sample data:
```json
{
  "station_sn": "0061997204",
  "category": "48H",
  "fn": 52,
  "fn34": {
    "rain_sample": [0,0,0,0,0,0,0,0,0,0,0,0],
    "rain_sample_timestamp": 1689127200,
    "rain_accumu_timestamp": 1689127200,
    "water_sample": [655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35],
    "water_sample_timestamp": 1689127200,
    "volage_sample": 11.96,
    "volage_sample_timestamp": 1689127200
  }
}
```
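The workaround amounts to serializing the nested struct to a string before it is logged. A sketch (Python for illustration; the real service logs via zap in Go, and the field values here are taken from the sample above):

```python
import json

# Build the event as usual, then serialize the nested "fn34" struct to a
# JSON string before logging, so the ingester sees one opaque string field
# instead of a nested object to flatten.
event = {
    "station_sn": "0061997204",
    "category": "48H",
    "fn": 52,
    "fn34": {
        "rain_sample_timestamp": 1689127200,
        "volage_sample": 11.96,
    },
}
event["fn34"] = json.dumps(event["fn34"], separators=(",", ":"))
print(type(event["fn34"]).__name__)  # str: ingested as a single value
```

The downside is that every consumer of the log must now parse `fn34` back out of the string to use its fields.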
This can be handled using https://github.com/openobserve/openobserve/issues/1081
We have already made some changes for this issue: now we only flatten objects and skip arrays. The result looks like this:
Original data:
```json
{
  "station_sn": "0061997204",
  "category": "48H",
  "fn": 52,
  "fn34": {
    "rain_sample": [0,0,0,0,0,0,0,0,0,0,0,0],
    "rain_sample_timestamp": 1689127200,
    "rain_accumu_timestamp": 1689127200,
    "water_sample": [655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35],
    "water_sample_timestamp": 1689127200,
    "volage_sample": 11.96,
    "volage_sample_timestamp": 1689127200
  }
}
```
Ingested data in OpenObserve:
```json
{
  "_timestamp": 1691129227148152,
  "category": "48H",
  "fn": 52,
  "fn34_rain_accumu_timestamp": 1689127200,
  "fn34_rain_sample": "[0,0,0,0,0,0,0,0,0,0,0,0]",
  "fn34_rain_sample_timestamp": 1689127200,
  "fn34_volage_sample": 11.96,
  "fn34_volage_sample_timestamp": 1689127200,
  "fn34_water_sample": "[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35]",
  "fn34_water_sample_timestamp": 1689127200,
  "station_sn": "0061997204"
}
```
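The transformation above can be sketched as follows (Python for illustration; OpenObserve's actual implementation is in Rust — this only mirrors the flatten-objects, stringify-arrays rule visible in the before/after example):

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested objects into '_'-joined keys; arrays are not
    exploded per index but kept as single JSON-string values."""
    out = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, name))
        elif isinstance(value, list):
            out[name] = json.dumps(value, separators=(",", ":"))
        else:
            out[name] = value
    return out

original = {
    "station_sn": "0061997204",
    "fn": 52,
    "fn34": {
        "rain_sample": [0, 0, 0],
        "volage_sample": 11.96,
    },
}
print(flatten(original))
```

Note that objects still flatten to any depth here; only arrays are kept opaque, which already removes the per-element field explosion from samples like `rain_sample`.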
Details in #1323.
Wow, such great news @hengfeiyang! 🥳
Can we use DataFusion SQL scalar functions on these new arrays, like array_has?
@IOITI in the backend, the array will be stored as a string. For example, the record:
```json
{"field1":"I am a message", "field2": [1,2,3,4]}
```
will be stored as two fields:
- field1: "I am a message"
- field2: "[1,2,3,4]", just a string value. It can't be searched as an array, but it can be searched as a string.
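Concretely, matching against the stored array is plain substring matching (illustrative Python; the SQL `LIKE` analogy in the comment is my assumption about how a string search would be phrased, not a confirmed OpenObserve recipe):

```python
import json

# field2 is stored as the string "[1,2,3,4]", so array operators such as
# array_has cannot be applied; only string matching works, e.g. the SQL
# equivalent of: WHERE field2 LIKE '%2,3%'
stored_field2 = json.dumps([1, 2, 3, 4], separators=(",", ":"))
print(stored_field2)           # [1,2,3,4]
print("2,3" in stored_field2)  # True: substring search on the string works
```

A client that needs real array semantics can always `json.loads` the string back into a list after retrieval.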