logstash-filter-translate

Enhance multi-field lookup enrichment

Open acchen97 opened this issue 8 years ago • 9 comments

Translate supports JSON, CSV, and YAML file lookups. Each of these formats supports some type of multi-field lookup: for JSON and YAML it's hierarchical, and in CSV, a lookup on a key could reference multiple values in the row.

Currently, these lookups are possible, but they result in a complex object in the "destination" or self-defined field. We should allow these multi-field lookups to simply add new top-level fields for enriching the event.
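To make the current behaviour concrete, here is a minimal sketch of flattening the nested result by hand. The field names, dictionary path, and dictionary contents are hypothetical, and the `field`/`destination` option names are the ones used in this thread (newer plugin releases may name them differently):

```logstash
filter {
  # Assumed dictionary (users.yml, hypothetical): each key maps to a nested value, e.g.
  #   jdoe: { full_name: "John Doe", department: "Sales" }
  translate {
    field           => "username"
    destination     => "user_info"   # receives the whole nested object
    dictionary_path => "/etc/logstash/dict/users.yml"
  }

  # Manually promote the nested values to top-level fields.
  mutate {
    rename => {
      "[user_info][full_name]"  => "full_name"
      "[user_info][department]" => "department"
    }
  }

  # Drop the now-empty container object.
  mutate {
    remove_field => ["user_info"]
  }
}
```

The request in this issue is essentially for translate to do the renaming step itself.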

acchen97 avatar Mar 18 '17 04:03 acchen97

As a workaround, or perhaps a solution, you can achieve what you describe today by using multiple translate filters.
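A rough sketch of that workaround, assuming one flat dictionary file per enrichment field (all paths and field names here are made up for illustration):

```logstash
filter {
  # One translate filter per enriched field; each reads its own
  # key => value dictionary keyed on the same source field.
  translate {
    field           => "username"
    destination     => "full_name"
    dictionary_path => "/etc/logstash/dict/user_full_name.yml"
  }
  translate {
    field           => "username"
    destination     => "department"
    dictionary_path => "/etc/logstash/dict/user_department.yml"
  }
}
```

Each filter loads its own dictionary, so the data has to be split across files (or duplicated) for this to work.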


jordansissel avatar Mar 18 '17 16:03 jordansissel

I agree. Basically, what I need is this -

  {
    translate {
      dictionary_path => '/some/field/path/to/lookup/as/reference (JSON|YAML|CSV)'
      fields => ['event_field_1', 'event_field_2']
      destination => ['new_event_field_1_replaced', 'new_event_field_2_replaced']
    }
  }

The objective is to use the same reference file to replace multiple fields with values.

If I use multiple translate filters it will re-load the same file multiple times(?) Kindly confirm.

shreyasrk avatar May 19 '17 09:05 shreyasrk

+1

Chandanvatsa avatar Jan 06 '18 18:01 Chandanvatsa

If I use multiple translate filters it will re-load the same file multiple times(?) Kindly confirm.

Yes. What is your concern with this?

jordansissel avatar Jan 08 '18 08:01 jordansissel

+1 I need this.

coregear avatar Jan 30 '18 03:01 coregear

This would be a nice new feature for data enrichment!

For example, for username data enrichment using a CSV/JSON file, you would be able to add full name, department, office, etc, at the same time with just one call to translate filter.

alesnav avatar Apr 25 '18 22:04 alesnav

It seems like the requested feature links multiple source fields to destination fields. It would be tricky to validate a one-to-one mapping of field array elements to destination array elements. We could consider a new hash setting, mapping.

  translate {
    mapping {
      [f1] => [d1]
      [f2] => [d2]
    }
    ...
  }

This, however, would mean that the dictionary holds keys and values from multiple domains. I would argue that separate translate filters per domain is a cleaner approach.

On the other hand I can see scenarios where an event has several field values in the same domain, e.g. src_ip/dest_ip or from_id/to_id.

guyboertje avatar Jul 19 '18 01:07 guyboertje

As regards the original proposal of having multi-valued translations added to the root of an event, the problem lies with the fallback setting. It is a string.

The question is how to accommodate a multi-field lookup value with a string fallback. If a no-match fallback substitution occurs, the same field holds an object on a match and a string otherwise, so there will be an ES mapping conflict.

My advice would be to use a CSV dictionary followed by a Dissect filter. The lookup value and the fallback value should have the same structure, so that the Dissect filter can be applied regardless of match or no match.
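A sketch of how that combination might look, assuming a two-column CSV whose value column packs several attributes into one delimited string. The file name, field names, and the `;` delimiter are illustrative choices, not anything prescribed by the plugin:

```logstash
filter {
  # Assumed dictionary row (users.csv, hypothetical):
  #   jdoe,John Doe;Sales;Berlin
  translate {
    field           => "username"
    destination     => "user_info"
    dictionary_path => "/etc/logstash/dict/users.csv"
    # Same shape as a real lookup value, so dissect works on no match too.
    fallback        => "unknown;unknown;unknown"
  }

  # Split the delimited string into top-level fields.
  dissect {
    mapping => {
      "user_info" => "%{full_name};%{department};%{office}"
    }
  }

  # Remove the intermediate string once it has been split.
  mutate {
    remove_field => ["user_info"]
  }
}
```

Because the fallback has the same three-part structure, every event ends up with the same string-typed fields, which avoids the mapping conflict described above.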

guyboertje avatar Jul 19 '18 01:07 guyboertje

I have created PR #67, which adds support for iterate_on, a new setting that handles fields with an array of values (strings).

With this, one can achieve multiple field translations. First build a field holding an array of values, say ips, by using add_field => { "[ips][0]" => "%{src_ip}" "[ips][1]" => "%{dest_ip}" }. Then iterate_on ips, and you will have a translated array. Finally, use add_field again: add_field => { "[src_name]" => "%{[translated][0]}" "[dest_name]" => "%{[translated][1]}" }
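The translate and read-back steps could be put together roughly as follows. This sketch assumes the event already carries an array field `ips` (built as described above), uses a made-up dictionary path, and uses the `field`/`destination` option names from this thread; setting `iterate_on` and `field` to the same array field is my reading of how the new setting is meant to be used:

```logstash
filter {
  # Translate every element of the ips array with a single dictionary.
  translate {
    iterate_on      => "ips"
    field           => "ips"
    destination     => "translated"   # array of translated values, same order as ips
    dictionary_path => "/etc/logstash/dict/ip_names.yml"
    fallback        => "unknown"
  }

  # Copy the translated elements back out into named top-level fields.
  mutate {
    add_field => {
      "[src_name]"  => "%{[translated][0]}"
      "[dest_name]" => "%{[translated][1]}"
    }
  }
}
```

With this approach the dictionary is read by one filter only, which addresses the earlier concern about loading the same file several times.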

guyboertje avatar Jul 23 '18 21:07 guyboertje