benchmark-wrapper icon indicating copy to clipboard operation
benchmark-wrapper copied to clipboard

Add CLI Option to Purge Empty Fields in ES

Open learnitall opened this issue 3 years ago • 0 comments

Currently we don't do any sort of checks to see if a field is non-empty before shipping off to ES. This can lead to a lot of empty fields being sent out, depending on the use case and the benchmark. For instance, here is a document from a Uperf CI test ran by ripsaw, where most of the fields are populated through environment variables:

{
    "_index" : "ripsaw-uperf-results-000002",
    "_type" : "_doc",
    "_id" : "297e5bb466870ad7f90916e68f60c440a43be6dbe0707ab6f3c4d7e30007e807",
    "_score" : 7.654378,
    "_source" : {
        "workload" : "uperf",
        "uuid" : "79690e49-479e-593a-8a51-0a1ef032de88",
        "user" : "ripsaw",
        "cluster_name" : "myk8scluster",
        "hostnetwork" : "True",
        "iteration" : 2,
        "remote_ip" : "10.0.133.30",
        "client_ips" : "10.0.173.62 10.130.0.1 ",
        "uperf_ts" : "2021-06-01T22:32:39.594000",
        "service_ip" : "False",
        "bytes" : 519864320,
        "norm_byte" : 258605056,
        "ops" : 1015360,
        "norm_ops" : 505088,
        "norm_ltcy" : 2.3778479412250935,
        "kind" : "pod",
        "client_node" : "ip-10-0-173-62.us-west-2.compute.internal",
        "server_node" : "unknown",
        "num_pairs" : "1",
        "multus_client" : "",
        "networkpolicy" : "",
        "density" : "1",
        "nodes_in_iter" : "1",
        "step_size" : "",
        "colocate" : "False",
        "density_range" : [ ],
        "node_range" : [ ],
        "pod_id" : "0",
        "test_type" : "stream",
        "protocol" : "udp",
        "message_size" : 512,
        "read_message_size" : 512,
        "num_threads" : 2,
        "duration" : 3,
        "run_id" : "NA"
    }
}

And here is a document exported from just running the command run_snafu --tool uperf --user ryan --uuid 1234 --proto tcp --remoteip localhost -w iperf.xml --resourcetype container -s 1 --verbose:

{
    "_index": "snafu-uperf-results",
    "_op_type": "create",
    "_source": {
        "test_type": "",
        "protocol": "tcp",
        "message_size": null,
        "read_message_size": null,
        "num_threads": 1,
        "duration": 31,
        "kind": "container",
        "hostnetwork": "False",
        "remote_ip": "localhost",
        "client_ips": "",
        "service_ip": "False",
        "client_node": "",
        "server_node": "",
        "num_pairs": "",
        "multus_client": "",
        "networkpolicy": "",
        "density": "",
        "nodes_in_iter": "",
        "step_size": "",
        "colocate": "",
        "density_range": "",
        "node_range": "",
        "pod_id": null,
        "uperf_ts": "2021-06-28T14:42:37.066000",
        "bytes": 48083435520,
        "norm_byte": 1505771520,
        "ops": 5869560,
        "norm_ops": 183810,
        "norm_ltcy": 6.534098878427453,
        "iteration": 1,
        "user": "ryan",
        "uuid": "1234",
        "workload": "uperf",
        "run_id": "NA"
    },
    "_id": "ae6d9dfc7083e94c569d1999c2eb2ae1dce4a77fc5c7052c128103783f9acc70",
    "run_id": "NA"
}

I think it would be cool to add in a CLI option called --no-empty-fields or something, that would remove any field from exported documents which is null or an empty string. This way teams only get the fields and the data that they care about, rather than also getting the extra fields which we use as a team.

learnitall avatar Jun 28 '21 14:06 learnitall