
[Idea] Log Generator Source

Open · graytaylor0 opened this issue 3 years ago · 2 comments

Is your feature request related to a problem? Please describe.
As a Data Prepper user who wants to quickly test Data Prepper log analytics with my own logs, I find it tedious to have to configure an http or file source just to send logs through Data Prepper.

As a developer of a new processor, it would be nice to be able to test the end-to-end functionality of my processor without the http or file source.

Describe the solution you'd like
A source that generates logs in either custom or common formats, which can be used to test pipelines, test new processors, or demo new processors.

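Here is an example configuration that emits one log from log_lines every 5 seconds until 20 logs have been generated. Because ordered is set to true here, the logs are emitted in the order they are listed; with the default (false), a random log would be chosen each time.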

source:
  log_generator:
    interval: 5
    # If omitted, this could default to generating an unlimited number of logs
    count: 20 
    log_type:
      custom:
        # ordered defaults to false, which picks a random log from log_lines each time
        # setting ordered to true cycles through the logs in the order they appear in `log_lines`
        ordered: true
        log_lines: 
           - 'This is my test log which will get the message key added since it is not json'
           - '{"log": "This is a json string which will get converted to an Event"}'
           - '{"key1": "value1", "key2": "value2"}'              
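To make the ordered/random behavior concrete, here is a minimal Java sketch of how a custom log type might pick its next line from log_lines. The class name CustomLogLines and its constructor are hypothetical (nothing like this exists in Data Prepper yet); count and interval handling are left to the caller.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch: chooses the next entry from log_lines according to `ordered`.
final class CustomLogLines {
    private final List<String> logLines;
    private final boolean ordered;
    private final Random random;
    private int nextIndex = 0; // only used when ordered == true

    CustomLogLines(List<String> logLines, boolean ordered, Random random) {
        if (logLines.isEmpty()) {
            throw new IllegalArgumentException("log_lines must not be empty");
        }
        this.logLines = List.copyOf(logLines);
        this.ordered = ordered;
        this.random = random;
    }

    /** Round-robin through log_lines when ordered, otherwise pick uniformly at random. */
    String next() {
        if (ordered) {
            String line = logLines.get(nextIndex);
            nextIndex = (nextIndex + 1) % logLines.size(); // wrap around to the first line
            return line;
        }
        return logLines.get(random.nextInt(logLines.size()));
    }
}
```

With log_lines of "a", "b", "c" and ordered set to true, five calls to next() return a, b, c, a, b.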

In addition to custom logs, the log_generator could support premade log types. Here is an example configuration that generates random logs in the Apache Common Log Format. This idea can be extended to many other common log types (syslog, S3, etc.).

source:
  log_generator:
    log_type:
      apache_clf:
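A rough idea of what an apache_clf generator could produce, sketched in Java. The line layout follows the Apache Common Log Format (%h %l %u %t "%r" %>s %b); the class name ApacheClfGenerator and the sample hosts, methods, paths, and status codes are all made up for illustration.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Locale;
import java.util.Random;

// Hypothetical sketch of an apache_clf log type; all sample values are invented.
final class ApacheClfGenerator {
    // CLF timestamp, e.g. 10/Oct/2000:13:55:36 -0700
    private static final DateTimeFormatter CLF_TIME =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
    private static final List<String> METHODS = List.of("GET", "POST", "PUT", "DELETE");
    private static final List<String> PATHS = List.of("/", "/index.html", "/api/items", "/images/logo.png");
    private static final List<Integer> STATUSES = List.of(200, 201, 301, 404, 500);

    private final Random random;

    ApacheClfGenerator(Random random) {
        this.random = random;
    }

    /** Builds one random line: host, identd, user, [time], "request", status, bytes. */
    String next(ZonedDateTime timestamp) {
        String host = "10.0." + random.nextInt(256) + "." + random.nextInt(256);
        String request = pick(METHODS) + " " + pick(PATHS) + " HTTP/1.1";
        int status = pick(STATUSES);
        int bytes = 100 + random.nextInt(5000);
        // "-" for the identd (%l) and user (%u) fields, as when they are unavailable
        return String.format(Locale.ROOT, "%s - - [%s] \"%s\" %d %d",
                host, CLF_TIME.format(timestamp), request, status, bytes);
    }

    private <T> T pick(List<T> values) {
        return values.get(random.nextInt(values.size()));
    }
}
```

A real implementation would probably take the timestamp from a clock at emit time; passing it in here just keeps the sketch easy to test.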

graytaylor0, Feb 04 '22 20:02

I like this idea. There are other log formats that customers may want to experiment with. We could have one log_generator source that is configured for different formats, or just have each generator implement its own Source.
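For the single-source option, one possible shape is a small plug-in interface plus a registry keyed by the name under log_type ("custom", "apache_clf", ...). LogFormat and LogFormatRegistry are hypothetical names, not existing Data Prepper APIs; this is only a sketch of the idea.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical plug-in point: a log format only knows how to produce its next line.
interface LogFormat {
    String nextLine();
}

// Hypothetical registry a single log_generator source could consult:
// the key under log_type selects which LogFormat to build.
final class LogFormatRegistry {
    private final Map<String, Supplier<LogFormat>> factories = new HashMap<>();

    void register(String logTypeName, Supplier<LogFormat> factory) {
        factories.put(logTypeName, factory);
    }

    LogFormat create(String logTypeName) {
        Supplier<LogFormat> factory = factories.get(logTypeName);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown log_type: " + logTypeName);
        }
        return factory.get();
    }
}
```

With this layout, adding a format means writing one LogFormat and one register call; with one Source per format, each would instead have to bring (or share) its own scheduling code.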

dlvenable, Feb 05 '22 22:02

Yes, that is also an option. While there are only a few obvious log formats to support (Apache, syslog, etc.), I think putting all of them into a single log_generator source would make it easier to add more log types in the future (potentially a very large number). It also makes the code that waits for an interval between each log easier to reuse (for example, we wouldn't have to create another GeneratorSource interface just to share this code).
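The reusable part could be as small as one loop that any log type plugs into via a Supplier. IntervalEmitter and its run method are hypothetical; a null count stands in for the "unlimited by default" behavior suggested in the original config comment, and a real source would write to the Data Prepper buffer rather than an arbitrary Consumer.

```java
import java.time.Duration;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper: the interval/count loop written once, shared by every log type.
final class IntervalEmitter {
    private IntervalEmitter() {}

    /**
     * Calls generator, waiting `interval` between logs, and hands each line to sink.
     * A null count means "keep going until the thread is interrupted".
     * Returns the number of lines emitted.
     */
    static long run(Supplier<String> generator, Duration interval, Long count,
                    Consumer<String> sink) throws InterruptedException {
        long emitted = 0;
        while (count == null || emitted < count) {
            if (Thread.currentThread().isInterrupted()) {
                throw new InterruptedException("log generation interrupted");
            }
            if (emitted > 0) {
                Thread.sleep(interval.toMillis()); // wait between logs, not before the first
            }
            sink.accept(generator.get());
            emitted++;
        }
        return emitted;
    }
}
```

For the first example config, this would be called with Duration.ofSeconds(5) and a count of 20L, with the generator being whichever log type was configured.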

graytaylor0, Feb 07 '22 17:02