`if .. in []` doesn't match for single-element arrays

Open praseodym opened this issue 7 years ago • 13 comments

A Logstash conditional such as `if "a" in ["a", "b"]` matches, but `if "a" in ["a"]` does not, which does not make sense. I'd consider this to be a bug.

Example pipeline:

input { generator { count => 1 } }

filter {
  if "a" in ["a"] {
    mutate { add_tag => "1" }
  }

  if "a" in ["a", "b"] {
    mutate { add_tag => "2" }
  }
}

output { stdout {} }

Output:

{
      "@version" => "1",
          "tags" => [
        [0] "2"
    ],
          "host" => "hostname",
    "@timestamp" => 2018-08-22T20:07:25.211Z,
      "sequence" => 0,
       "message" => "Hello world!"
}

I would expect to see both "1" and "2" as tags.

praseodym, Aug 22 '18 20:08

I believe that the only way to fix this is to introduce a new, unambiguous syntax for arrays.

There is currently an ambiguity in the pipeline syntax. It is resolved by treating any sequence in a conditional clause that starts and ends with square brackets as a Field Reference, so the sequence `if "a" in ["a"]` is currently interpreted as:

"if the string a is in the values found on the event at field reference ["a"]".
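
To make the ambiguity concrete, here is a minimal annotated sketch of how the two conditionals from the original report would be read under the rule described above. The comments only restate that explanation; they are not taken from the parser source.

```
filter {
  # Single bracketed item: read as the field reference ["a"], so the test
  # becomes "is the string a among the values stored in that field?".
  if "a" in ["a"] {
    mutate { add_tag => "1" }
  }

  # Two items: read as an array literal, so this is a plain membership test.
  if "a" in ["a", "b"] {
    mutate { add_tag => "2" }
  }
}
```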


I've run into this before, and every attempt I have made to find a fix that works within the existing pipeline syntax ends up breaking configs used in the real world:

  • (a) prohibit quotes in field references
    • breaks any config that legitimately uses quotes in field references
    • does not fix single-element integer-array use-case
  • (b) favour arrays over field references when parsing
    • breaks: field reference [foo] is interpreted as a single-element array containing the string foo
  • (c) favour non-bareword arrays over field references when parsing
    • breaks: integer arrays have the same problem as (b); field reference [1] is interpreted as a single-element array containing the integer 1

As a workaround, I have seen people prepend their "haystack" array with a value known not to match the "needle" search key:

if "a" in ["___", "a"] {
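
Applied to the pipeline from the original report, the workaround might look like the sketch below. The "___" placeholder is the one from the line above, and it is only safe under the assumption that the needle can never take that value.

```
input { generator { count => 1 } }

filter {
  # The extra "___" element turns the haystack into a two-element array,
  # so it is no longer a single bracketed item.
  if "a" in ["___", "a"] {
    mutate { add_tag => "1" }
  }

  if "a" in ["a", "b"] {
    mutate { add_tag => "2" }
  }
}

output { stdout {} }
```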

yaauie, Aug 27 '18 20:08

I see, at least I now understand what's going on. With the stricter field reference parser in Logstash 6.4 there could be an opportunity to prohibit quoted field references when they are not needed. Then again, as long as the field reference and array syntax are so similar, there will probably be more cases like this one.

praseodym, Aug 27 '18 20:08

Appears to be the same as 5591 to me.

TheVastyDeep, Feb 25 '19 19:02

This also causes :exception=>#<RuntimeError: Invalid FieldReference: []> (I assume from FieldReference.java) if you use a one-item array as the "path" for the file input plugin.

Edit: Never mind, I think this was something else; it seems to be happening again. Probably unrelated? Edit 2: Looks like it is related, I just had to clear out the queued data.

Zeal0us, Apr 08 '19 18:04

At the very least, this should be documented here. It's not unreasonable to expect `in` to match single-element arrays, and it is very confusing and difficult to debug when it does not.

JohnLyman, Jul 26 '19 17:07

Just wanted to let y'all know that this has wasted at least 45 minutes of my life :neutral_face:

TheNetworkIsDown, Sep 04 '19 14:09

Chiming in, this issue has wasted a solid week of time for my team.

seadub, Jul 23 '20 20:07

Oh dear, this wasted 2 hours for me, what a shame.

lhzw, Nov 17 '20 08:11

I just want to mention, for future generations coming here, that this year (2021) another rover, named Perseverance, landed on Mars. This mission also has a very cool tech demo of a Mars helicopter.

And this issue still causes headaches to poor earthlings :-D

pin007, Mar 15 '21 18:03

As a workaround, I have seen people prepend their "haystack" array with a value known not to match the "needle" search key:

if "a" in ["___", "a"] {

@yaauie The strategy I followed in:

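  // Builds the Logstash filter section that drops events from organizations that are not monitored.
  def generateFilterSection(configuration: Configuration): String = {
    // Prepend the first entry to the full list, so that a non-empty list
    // always renders as an array with at least two elements.
    val monitoredOrganizations = (configuration.apiMonitoringEntries.headOption ++ configuration.apiMonitoringEntries)
      .map(
        entry => s""""${entry.organizationId}""""
      )
      .mkString(",")

    // With nothing to monitor, drop every event; otherwise drop only the
    // events whose organization is not in the rendered array.
    val dropSelected = if (configuration.apiMonitoringEntries.isEmpty) {
      s"""
         |  drop { }
         |""".stripMargin
    } else {
      s"""
         |  if [${LogstashFields.organizationId}] not in [$monitoredOrganizations] {
         |    drop { }
         |  }
         |""".stripMargin
    }

    s"""
       |filter {$dropSelected}
       |""".stripMargin
  }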

was just to duplicate the unique element. The advantage is that it is logically impossible to generate a collision between the haystack and the domain of the needle.
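
For illustration, with a single monitored organization the generated filter section would look roughly like the sketch below. The field name `organization_id` and the value `org-1` are hypothetical stand-ins: the actual value of `LogstashFields.organizationId` and the real organization IDs are not shown in this thread.

```
filter {
  # "org-1" appears twice because the first entry is prepended to the list,
  # so the haystack is never a single-element array.
  if [organization_id] not in ["org-1","org-1"] {
    drop { }
  }
}
```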

pfcoperez, Nov 17 '22 22:11

Still an issue

Tarasovych, Feb 17 '23 07:02

Still present in version 8.7.1, and absolutely unexpected.

andreas-v-nb, Jul 05 '23 11:07

Still an issue even after 6 years, what a shame!

tibyke, Aug 30 '24 11:08