data-prepper
[Idea] Log Generator Source
Is your feature request related to a problem? Please describe.
As a user of Data Prepper who wants to test Data Prepper log analytics with my own logs quickly, it is a pain to have to configure an `http` or `file` source to send logs through Data Prepper.
As a developer of a new processor, it would be nice if I could test the end-to-end functionality of my processor without the `http` or `file` source.
Describe the solution you'd like
A source that generates logs in either custom or common formats, which can be used to test pipelines, test new processors, or demo new processors.
Here is an example configuration that emits one log from `log_lines` every 5 seconds (cycling through them in order, since `ordered` is `true`) and sends it through Data Prepper until 20 logs have been generated.
```yaml
source:
  log_generator:
    interval: 5
    # If omitted, count could default to generating an infinite number of logs
    count: 20
    log_type:
      custom:
        # ordered defaults to false, which chooses a random log from log_lines
        # ordered: true cycles through the logs in the order they appear in `log_lines`
        ordered: true
        log_lines:
          - 'This is my test log which will get the message key added since it is not json'
          - '{"log": "This is a json string which will get converted to an Event"}'
          - '{"key1": "value1", "key2": "value2"}'
```
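The `ordered` and `count` settings in the `custom` log_type could behave roughly as follows. This is a minimal Java sketch under stated assumptions: `CustomLogType`, `nextLog`, and `take` are hypothetical names, not Data Prepper APIs. It only covers choosing lines. It leaves out waiting for `interval` and converting JSON lines into Event fields (that conversion would need a JSON library).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the proposed `custom` log_type; class and method
// names are illustrative assumptions, not part of Data Prepper.
public class CustomLogType {
    private final List<String> logLines;
    private final boolean ordered;
    private final Random random;
    private int nextIndex = 0;

    public CustomLogType(List<String> logLines, boolean ordered, Random random) {
        if (logLines.isEmpty()) {
            throw new IllegalArgumentException("log_lines must not be empty");
        }
        this.logLines = List.copyOf(logLines);
        this.ordered = ordered;
        this.random = random;
    }

    /** ordered=true cycles through log_lines in order; ordered=false picks one at random. */
    public String nextLog() {
        if (ordered) {
            String line = logLines.get(nextIndex);
            nextIndex = (nextIndex + 1) % logLines.size(); // wrap around to the first line
            return line;
        }
        return logLines.get(random.nextInt(logLines.size()));
    }

    /** Produces the next `count` logs, mirroring the `count` setting (no waiting between logs). */
    public List<String> take(int count) {
        List<String> logs = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            logs.add(nextLog());
        }
        return logs;
    }

    public static void main(String[] args) {
        CustomLogType custom = new CustomLogType(List.of(
            "This is my test log which will get the message key added since it is not json",
            "{\"log\": \"This is a json string which will get converted to an Event\"}",
            "{\"key1\": \"value1\", \"key2\": \"value2\"}"), true, new Random());
        // With ordered: true, the 4th log wraps back around to the first line.
        custom.take(4).forEach(System.out::println);
    }
}
```

Passing in the `Random` makes the random mode reproducible in tests when a fixed seed is used.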
In addition to custom logs, the `log_generator` source could support pre-made log types. Here is an example configuration that generates random logs in the Apache Common Log Format. This idea can be extended to many other common log types (syslog, S3, etc.).
```yaml
source:
  log_generator:
    log_type:
      apache_clf:
```
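An `apache_clf` log type would need to build lines in the Common Log Format, `host ident authuser [date] "request" status bytes`. Below is a minimal Java sketch of that. The class name, the value pools (hosts, users, paths, status codes), and the `HTTP/1.1` protocol string are illustrative assumptions, not part of the proposal.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Locale;
import java.util.Random;

// Hypothetical sketch of an `apache_clf` log_type producing random lines in the
// Apache Common Log Format: host ident authuser [date] "request" status bytes
public class ApacheClfGenerator {
    // CLF timestamps look like 10/Oct/2000:13:55:36 -0700; Locale.US keeps month names in English.
    private static final DateTimeFormatter CLF_TIME =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.US);

    // Illustrative value pools (assumptions); a real generator might make these configurable.
    private static final List<String> METHODS = List.of("GET", "POST", "PUT", "DELETE");
    private static final List<String> PATHS = List.of("/", "/index.html", "/images/logo.png", "/api/items");
    private static final List<Integer> STATUSES = List.of(200, 201, 301, 404, 500);
    private static final List<String> USERS = List.of("-", "frank", "alice");

    private final Random random;

    public ApacheClfGenerator(Random random) {
        this.random = random;
    }

    private <T> T pick(List<T> values) {
        return values.get(random.nextInt(values.size()));
    }

    /** Builds one CLF line; the ident field is always "-", as is common in real logs. */
    public String nextLog(ZonedDateTime timestamp) {
        String host = "192.168." + random.nextInt(256) + "." + random.nextInt(256);
        int bytes = 100 + random.nextInt(5000);
        return String.format(Locale.ROOT, "%s - %s [%s] \"%s %s HTTP/1.1\" %d %d",
            host, pick(USERS), CLF_TIME.format(timestamp),
            pick(METHODS), pick(PATHS), pick(STATUSES), bytes);
    }

    public static void main(String[] args) {
        ApacheClfGenerator generator = new ApacheClfGenerator(new Random());
        for (int i = 0; i < 3; i++) {
            System.out.println(generator.nextLog(ZonedDateTime.now(ZoneOffset.UTC)));
        }
    }
}
```

Taking the timestamp as a parameter keeps the generator deterministic under test. A source could simply pass the current time.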
I like this idea. There are other log formats that customers may want to experiment with. We could have one `log_generator` source that is configured for different formats, or just have each generator implement its own `Source`.
Yes, that is also an option. While there are only a couple of obvious log formats to support (Apache, syslog, etc.), I do think putting all of them into a single `log_generator` source would make it easier to add more log types in the future (potentially a very large number). It also makes the code that waits for an interval between logs easier to reuse (for example, we wouldn't have to create another interface such as `GeneratorSource` just to reuse that code).
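One way to picture the single-source design: the `log_generator` source owns the interval/count loop once, and each log type only knows how to produce its next line. This is a minimal Java sketch under stated assumptions. `LogGeneratorSource`, `LogType`, the name-based registry, and the `counter` log type are all hypothetical, and a `Consumer<String>` stands in for Data Prepper's buffer.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: one log_generator source shares the waiting logic,
// while each log_type only implements nextLog(). Not Data Prepper's API.
public class LogGeneratorSource {
    /** One implementation per format (custom, apache_clf, syslog, ...). */
    public interface LogType {
        String nextLog();
    }

    // Toy log type used for illustration: emits numbered synthetic logs.
    private static LogType counter() {
        int[] n = {0};
        return () -> "synthetic log " + (++n[0]);
    }

    // Hypothetical registry keyed by the name used under `log_type` in the pipeline YAML.
    // Supporting a new format means adding an entry here, not writing a new Source.
    private static final Map<String, Supplier<LogType>> REGISTRY =
        Map.of("counter", LogGeneratorSource::counter);

    private final LogType logType;
    private final Duration interval;
    private final long count;

    public LogGeneratorSource(String logTypeName, Duration interval, long count) {
        Supplier<LogType> factory = REGISTRY.get(logTypeName);
        if (factory == null) {
            throw new IllegalArgumentException("unknown log_type: " + logTypeName);
        }
        this.logType = factory.get();
        this.interval = interval;
        this.count = count;
    }

    /** The interval/count loop lives here once and is reused by every LogType. */
    public void start(Consumer<String> buffer) {
        for (long i = 0; i < count; i++) {
            if (i > 0) {
                try {
                    Thread.sleep(interval.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop
                    return;
                }
            }
            buffer.accept(logType.nextLog());
        }
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        new LogGeneratorSource("counter", Duration.ofMillis(10), 3).start(received::add);
        received.forEach(System.out::println);
    }
}
```

The per-format `Source` alternative would instead repeat (or abstract into a shared base) the loop in `start` for each generator.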