
[FEATURE] Add dynamic Observability Log Category based template creation

YANG-DB opened this issue on Feb 17, 2023 · 0 comments

Is your feature request related to a problem? Logs serve as a primary source of information for understanding and debugging complex systems. Logs provide a record of events, errors, and performance data that helps developers and administrators identify and resolve issues, monitor system behavior, and improve reliability.

Logs can also be used to gain insight into user behavior and facilitate auditing and compliance. By analyzing logs, one can detect patterns, anomalies, and correlations that can inform decisions and facilitate problem-solving. Logs help to ensure the visibility, reliability, and stability of systems.

Log structuring in Observability is not a simple task:

Variety of Log Types: Logs come in many different types, each with its own data structures, fields, and formats, which makes it challenging to design a single schema that accommodates them all.

Data Volume: Observability logs can generate a vast amount of data, and the volume can grow exponentially as applications scale. Structuring all of this data into a well-defined schema is challenging, especially when the schema must remain flexible enough to accommodate new log types as they arise.

Changing Data: Data within logs can change over time. For example, new fields might be added, or existing fields might be modified or removed. Creating a schema that can keep up with these changes can be difficult, especially when dealing with a large volume of data.

Unstructured Data: In some cases, observability logs may contain unstructured data, which is challenging to structure into a well-defined schema. Unstructured data can include free-form text or binary data, which may not fit well into a predefined schema.

Contextual Data: Observability logs often contain contextual data, which can be challenging to fit into a schema. For example, logs may include metadata about the application, the environment, or the user, which can be difficult to structure in a meaningful way.

We need an effective way to add structure to logs, and the approach must be incremental to accommodate the above concerns.

What solution would you like? According to the ECS event categories, we can distinguish logs into different groups.

For example, consider the following categories:

web: This category is for events related to web traffic. A concrete example for this category is an Nginx web server access log, which might have a log message like this:

{"@timestamp": "2023-02-17T12:34:56.789Z", "event": {"category": "web"}, "url": "https://example.com/index.html", "status_code": 200, "client_ip": "1.2.3.4", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"}

file: This category is for events related to file activity. A concrete example for this category is a log from a file integrity monitoring tool, which might have a log message like this:

{"@timestamp": "2023-02-17T12:34:56.789Z", "event": {"category": "file"}, "file_path": "/etc/passwd", "file_hash": "sha256:abcd1234", "action": "modified", "user": "root"}

network: This category is for events related to network activity. A concrete example for this category is a firewall log, which might have a log message like this:

{"@timestamp": "2023-02-17T12:34:56.789Z", "event": {"category": "network"}, "src_ip": "1.2.3.4", "dst_ip": "5.6.7.8", "src_port": 12345, "dst_port": 80, "protocol": "TCP"}

process: This category is for events related to process activity. A concrete example for this category is a process monitoring log, which might have a log message like this:

{"@timestamp": "2023-02-17T12:34:56.789Z", "event": {"category": "process"}, "process_name": "ssh", "pid": 1234, "parent_pid": 5678, "user": "alice"}

This feature would allow specifying a category for a log integration:

name: Nginx
categories: [web]

The category determines the specific index template components that will be used for that specific resource:

The main log index template is the container for these categories and includes only the metadata fields common to all log types (the log index template).

This log template contains a section that describes the components it is composed of:

  "composed_of": [
    "http_template",
    "communication_template"
  ],
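
As an illustration, a component template such as http_template could be registered as sketched below; the field names and mapping body are assumptions for this example, not an actual template definition:

# illustrative sketch - field names are assumptions
PUT _component_template/http_template
{
  "template": {
    "mappings": {
      "properties": {
        "http": {
          "properties": {
            "url": { "type": "keyword" },
            "status_code": { "type": "integer" },
            "user_agent": { "type": "text" }
          }
        }
      }
    }
  }
}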

This componentization allows the composition of different categories depending on the integration resource - in our case, nginx has a web categorization.
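
One possible shape for the category-to-component-template mapping is sketched below; only the web entry comes from this proposal, and the remaining template names are hypothetical:

{
  "web":     ["http_template", "communication_template"],
  "network": ["communication_template"],
  "file":    ["file_template"],
  "process": ["process_template"]
}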

This allows dynamically creating indices based on the category classification, composing the different log index templates into one that exactly and concisely describes the ingested log format.

Let's examine the following flow:

  1. The Nginx integration specifies the following configuration:
{
  "name": "nginx",
  "version": {
        "integ": "0.1.0",
        "schema": "1.0.0",
        "resource": "^1.23.0",
   }
  "description": "Nginx HTTP server collector",
  "identification": "instrumentationScope.attributes.identification",
  "categories": [
    "web",
  ],
  "collection":[
    {
       "logs": [{
                    "info": "access logs",
                    "input_type":"logfile",
                    "dataset":"nginx.access",
                    "namespace":"prod",
                    "labels" :["nginx","access"],
                    "schema": "./schema/logs/access.json"
                },
                {
                    "info": "error logs",
                    "input_type":"logfile",
                    "labels" :["nginx","error"],
                    "dataset":"nginx.error",
                    "namespace":"prod",
                    "schema": "./schema/logs/error.json"
                }]
    },
    {
        "metrics": [{
                    "info": "status metrics",
                    "input_type":"metrics",
                    "dataset":"nginx.status",
                    "labels" :["nginx","status"],
                    "schema": "./schema/metrics/status.json"
                }]
    }
  ],
  "repo": {
    "github": ".../"
  }
}

As shown above, it has a web category and expects the target indices for this ingestion to be:

  • sso_logs-nginx.access-prod
  • sso_logs-nginx.error-prod
  2. The integration installation process verifies that these indices exist; in case they don't, it creates a template combined from the general log index template plus the specific templates associated with the web category:
["http_template", "communication_template"]

After this template is created, it will match the index pattern sso_logs-nginx*-*. The following indices, sharing the appropriate mapping, will then be created (a sketch of the composed template follows the list):

  • sso_logs-nginx.access-prod
  • sso_logs-nginx.error-prod
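
A minimal sketch of the composed index template that the installation process might create; the template name and priority are assumptions, while the index pattern and composed_of list come from the flow above:

# template name and priority are illustrative assumptions
PUT _index_template/sso_logs-nginx
{
  "index_patterns": ["sso_logs-nginx*-*"],
  "composed_of": ["http_template", "communication_template"],
  "priority": 200
}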

This capability allows classifying an index under a specific list of categories, and thus allows queries to use category-based filters rather than index-based filters.
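
For example, assuming the category is carried in the ECS event.category field as in the samples above, a single query could span all log indices and filter by category instead of naming specific indices (a sketch, not a committed API):

GET sso_logs-*/_search
{
  "query": {
    "term": { "event.category": "web" }
  }
}

Any index whose template was composed with the web category would match, regardless of its dataset or namespace.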

What alternatives have you considered?

Do you have any additional context?
