Is your feature request related to a problem? OpenSearch users want an easy way to enrich the data they have stored in OpenSearch and external data sources using content from an OpenSearch index. This is common in security analytics scenarios where one wants to enrich their IP reputation lists, vulnerability databases, or threat feeds.

What solution would you like? Do a lookup of a field/value, from another log group and use that to convert to user friendly name/error code.

The lookup feature should support the OpenSearch indexes as data sources for lookup tables
Users should be able to perform a static lookup using the OpenSearch index and external data sources based on the Spark integration
Users should be able to perform a static lookup using a user generated CSV such as an org unit mapping or GeoIP from MaxMind and OpenSearch index
Users should be able to perform a static lookup using a user generated CSV such as an org unit mapping or GeoIP from MaxMind and an external data source
Admins can use Index State Management to control how long the reference lookup index is available
Include helpful error messages

*** Out of Scope ***

Defining new lookup data sources beyond listed types
Automatic field mapping between events and lookups

What alternatives have you considered? Performing joins using SQL

Do you have any additional context? None.

May 03 '24 17:05 brijos

Catch All Triage - 1 2 3 4 5 6

Jun 24 '24 16:06 dblock

PPL Lookup Design Proposal

As implemented in PR 2698 the proposed design (and so far implemented) syntax is:

Design

The lookup command can be implemented as a simple search for documents in the lookup index. For every row in a search result, a search with the given match fields are performed. If a single document is found, the fields and values of the lookup document are copied to the current row of the search result. If no document is found a no-op is performed. If multiple documents are found, an error is thrown. The implementation is mainly done in core/src/main/java/org/opensearch/sql/analysis/Analyzer.java and opensearch/src/main/java/org/opensearch/sql/opensearch/storage/OpenSearchIndex.java. The "lookup search" is performed as a term and a match query. So both cases are catched: When a field is not analyzed or analyzed with respect to it mapping.

The Spark PPL Lookup command implementation is done in separate PR in the opensearch-spark repo: PR 407. Here we can (and need) to implement it as a join.

Syntax

lookup <lookup index> <lookup field> [AS <local lookup field>] [<lookup field> [AS <local lookup field>]]… [appendonly=true|false] [<source field> [AS <local field>]]...

lookup is the name of the lookup operation and it is supposed to be changed to something else. Normally we would use lookup but this seems to already used otherwise in the AST.

<lookup index> is the name of the lookup index (mandatory).

Then we need at least one <lookup field> which is a field in the lookup index used to match to a local field (in the current search) to get the lookup document. When there is no lookup document we just do nothing, if there is more than one we fail with an error.

If more than one <lookup field> is provided, all of them must match (we do a term and a match query for the field value as of now)

If the field has a different name in the current search result use <local lookup field> to map it.

appendonly is false by default abnd inidicates if the values we copy over to the search result from the lookup documemnt should overwrite existing values. If appendonly is true we do not overwrite existing values.

<source field> are the fields that should be copied. If no such fields are given all fields are copied. If the field should have a different name than in the lookup document use [AS <local field>

Examples:

{"query":"source=logins | lookup users uid AS id appendonly=true"}

{"query":"source=logins | lookup users uid,name phone,department AS thedepartment"}

Jul 16 '24 11:07 salyh

+1 for this feature

Sep 05 '24 12:09 Gokul-Radhakrishnan

PPL Lookup Command Design

As implemented in https://github.com/opensearch-project/opensearch-spark/pull/686 the proposed design doc:

Overview

Lookup command enriches your search data by adding or replacing data from a lookup index (dimension table). You can extend fields of an index with values from a dimension table, append or replace values when lookup condition is matched. As an alternative of Join command, lookup command is more suitable for enriching the source data with a static dataset.

Syntax of Lookup Command

SEARCH source=<sourceIndex>
| <other piped command>
| LOOKUP <lookupIndex> (<lookupMappingField> [AS <sourceMappingField>])...
    [(REPLACE | APPEND) (<inputField> [AS <outputField>])...]
| <other piped command>

lookupIndex

Required
Description: the name of lookup index (dimension table)

lookupMappingField

Required
Description: A mapping key in <lookupIndex>, analogy to a join key from right table. You can specify multiple <lookupMappingField> with comma-delimited.

sourceMappingField

Optional
Default: <lookupMappingField>
Description: A mapping key from source query, analogy to a join key from left side. If you don't specify any <sourceMappingField>, its default value is <lookupMappingField>.

inputField

Optional
Default: All fields of <lookupIndex> where matched values are applied to result output if no field is specified.
Description: A field in <lookupIndex> where matched values are applied to result output. You can specify multiple <inputField> with comma-delimited. If you don't specify any <inputField>, all fields of <lookupIndex> where matched values are applied to result output.

outputField

Optional
Default: <inputField>
Description: A field of output. You can specify multiple <outputField>. If you specify <outputField> with an existing field name in source query, its values will be replaced or appended by matched values from <inputField>. If the field specified in <outputField> is a new field, an extended new field will be applied to the results.

REPLACE | APPEND

Optional
Default: REPLACE
Description: If you specify REPLACE, matched values in <lookupIndex> field overwrite the values in result. If you specify APPEND, matched values in <lookupIndex> field only append to the missing values in result.

Usage

source = table1 | lookup table2 id
source = table1 | lookup table2 id, name
source = table1 | lookup table2 id as cid, name
source = table1 | lookup table2 id as cid, name replace dept as department
source = table1 | lookup table2 id as cid, name replace dept as department, city as location
source = table1 | lookup table2 id as cid, name append dept as department
source = table1 | lookup table2 id as cid, name append dept as department, city as location

Examples

SEARCH source=<sourceIndex>
| WHERE orderType = 'Cancelled'
| LOOKUP account_list, mkt_id AS mkt_code REPLACE amount, account_name AS name
| STATS count(mkt_code), avg(amount) BY name

SEARCH source=<sourceIndex>
| DEDUP market_id
| EVAL category=replace(category, "-", ".")
| EVAL category=ltrim(category, "dvp.")
| LOOKUP bounce_category category AS category APPEND classification

SEARCH source=<sourceIndex>
| LOOKUP bounce_category category

Sep 24 '24 14:09 LantaoJin

@YANG-DB guess we can close this one because it seems done in https://github.com/opensearch-project/opensearch-spark/pull/686 ?

Oct 08 '24 14:10 salyh

@YANG-DB guess we can close this one because it seems done in opensearch-project/opensearch-spark#686 ?

We are keeping this open since this issue refers to the PPL OpenSearch engine

Oct 08 '24 17:10 YANG-DB

sql
sql copied to clipboard

[FEATURE] PPL LOOKUP Functionality

PPL Lookup Design Proposal

Design

Syntax

PPL Lookup Command Design

Overview

Syntax of Lookup Command

Usage

Examples

sql sql copied to clipboard

[FEATURE] PPL LOOKUP Functionality

PPL Lookup Design Proposal

Design

Syntax

PPL Lookup Command Design

Overview

Syntax of Lookup Command

Usage

Examples

sql
sql copied to clipboard