[Meta] Logstash to Logstash [Bootstrap]

Open yaauie opened this issue 1 year ago • 2 comments

User Needs:

  • As a user with distributed Logstash needs, I want a supported, straightforward, and stable way to send events from one Logstash pipeline to another Logstash pipeline that exists either in a separate process on the same host or in a different data center, optionally secured with TLS/SSL.
  • As a maintainer of Logstash-to-Logstash pipelines, I do not want to be concerned with how the events are emitted, transmitted, or encoded, so long as the upstream Logstash has a TCP/HTTP route to the downstream Logstash.

See also: https://github.com/elastic/logstash/issues/14477

Task: API Boundary Bootstrap

Introduce a Logstash "integration" plugin that contains an output plugin and an input plugin whose minimal configuration allows transmission of events from one Logstash pipeline to another using a host, port, and optional standard SSL settings.

For example, transmitting events over plaintext would be as simple as defining an output in the upstream pipeline targeting a host and port:

output {
  logstash {
    host => "ls.mycorpnet.internal"
    port => "9820"
  }
}

-- upstream pipeline on ls.mycorpnet.external

And an accompanying input in the downstream pipeline that binds to the same port:

input {
  logstash {
    port => "9820"
  }
}

-- downstream pipeline on ls.mycorpnet.internal

Details: Security

SSL/TLS communications must be configured using standardized SSL settings.

Details: Open-to-extend

The configuration of these plugins is intended to be an API boundary so that:

  • pipeline authors can use them without needing to consider the implementation details of protocol and/or encoding
  • plugin/logstash maintainers can evolve the implementation details without impacting in-the-wild configurations, including adding protocols and/or encoding mechanisms, along with negotiation to ensure the "best available" protocol and/or encoding is used when the sending and receiving Logstash pipelines have differing versions of the plugin.

In the initial release of this plugin set, batches of events should be transmitted using newline-delimited JSON over HTTP(s) using POST /events requests, but these are internal implementation details. These details are not exposed to the user via plugin configuration options.

The behavior of the initial release of a logstash input plugin is explicitly undefined when receiving bare TCP or TLS TCP connections, when receiving HTTP(s) requests with verbs other than POST or to paths other than /events, or with content-types other than application/x-ndjson.
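To make the wire format described above concrete, here is a minimal Python sketch of what a client-side request could look like: a batch serialized as newline-delimited JSON and wrapped in a `POST /events` request with the `application/x-ndjson` content type. This is an illustration only, not the plugin's implementation; the function names `encode_batch` and `build_request` are hypothetical, and the request is constructed but never sent.

```python
import json
import urllib.request


def encode_batch(events):
    """Serialize a batch as newline-delimited JSON: one compact object per line."""
    lines = (json.dumps(event, separators=(",", ":")) + "\n" for event in events)
    return "".join(lines).encode("utf-8")


def build_request(host, port, events, use_tls=False):
    """Build (but do not send) a POST /events request carrying the batch.

    Hypothetical helper: only the verb, path, and content type come from the
    issue text; everything else is an assumption made for illustration.
    """
    scheme = "https" if use_tls else "http"
    return urllib.request.Request(
        url=f"{scheme}://{host}:{port}/events",
        data=encode_batch(events),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


if __name__ == "__main__":
    batch = [{"message": "hello"}, {"message": "world"}]
    request = build_request("ls.mycorpnet.internal", 9820, batch)
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"), end="")
```

Because these details sit behind the API boundary, a pipeline author would never write anything like this; the sketch only shows what the two plugins could exchange in the initial release.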

Bootstrap Requirements

  • an output supports a single host/port pair
  • http(s) with standard-form SSL identity and trust configuration (a workable subset is acceptable initially)
  • events sent via http(s) using POST /events and a Content-Type: application/x-ndjson header
  • events received have zero implicit enrichment from the input plugin

Initial Extensions

  • Startup Safety: an output that cannot connect to a downstream input should prevent the pipeline from starting (possibly after some reasonable delay)
  • Shutdown Safety: an output that is being shut down and is unable to deliver an in-flight batch in a timely manner should NACK the batch so that pipelines with PQ can re-attempt.
  • Load-balancing: an output should be configurable with hosts => ["12.34.56.78:1234", "12.34.56.79:1234"] or hosts => ["12.34.56.78", "12.34.56.79"] port => 1234 (alternatives to host => "12.34.56.78" port => 1234) and should distribute events to the downstream hosts (naive whole-batch round-robin is ok at first, but then we could make it better)
  • SSL: fill gaps in identity and trust configuration.
  • Metadata: add support for propagating @metadata
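
The load-balancing item above mentions two ways of listing hosts and a naive whole-batch round-robin. A minimal Python sketch of that idea follows, under the assumption that each batch goes in full to the next host in rotation. The names `normalize_hosts` and `RoundRobinBalancer` are hypothetical and are not taken from the plugin; IPv6 literals and failure handling are deliberately left out.

```python
from itertools import cycle


def normalize_hosts(hosts, port=None):
    """Turn "host:port" strings, or bare hosts plus a shared port, into (host, port) pairs."""
    pairs = []
    for entry in hosts:
        host, sep, explicit_port = entry.rpartition(":")
        if sep:
            # Entry carried its own port, e.g. "12.34.56.78:1234".
            pairs.append((host, int(explicit_port)))
        elif port is not None:
            # Bare host; fall back to the shared port setting.
            pairs.append((entry, int(port)))
        else:
            raise ValueError(f"no port given for host {entry!r}")
    return pairs


class RoundRobinBalancer:
    """Assign each whole batch to the next downstream target in rotation."""

    def __init__(self, hosts, port=None):
        targets = normalize_hosts(hosts, port)
        if not targets:
            raise ValueError("at least one host is required")
        self._targets = cycle(targets)

    def assign(self, batch):
        """Return (target, batch); the batch is never split across hosts."""
        return next(self._targets), batch


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["12.34.56.78", "12.34.56.79"], port=1234)
    for batch in (["e1", "e2"], ["e3"], ["e4", "e5"]):
        target, events = balancer.assign(batch)
        print(target, events)
```

Whole-batch rotation keeps the sketch simple; a later refinement could weigh batch size or skip hosts that are currently unreachable.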

Implementation Plan

### Bootstrap
- [x] https://github.com/logstash-plugins/logstash-integration-logstash/pull/2
- [ ] https://github.com/logstash-plugins/logstash-integration-logstash/issues/6
- [ ] https://github.com/logstash-plugins/logstash-integration-logstash/issues/4
- [ ] https://github.com/logstash-plugins/logstash-integration-logstash/issues/12
- [ ] https://github.com/logstash-plugins/logstash-integration-logstash/issues/1
- [ ] https://github.com/elastic/logstash/issues/15170
- [x] Release the `logstash-integration-logstash` plugin
- [ ] https://github.com/elastic/logstash/issues/15335
### Advanced Features
- [ ] https://github.com/logstash-plugins/logstash-integration-logstash/issues/11
- [ ] https://github.com/logstash-plugins/logstash-integration-logstash/issues/5
- [ ] https://github.com/elastic/logstash/issues/15498
### Testing
- [ ] https://github.com/elastic/logstash/issues/15496
- [ ] https://github.com/elastic/ingest-dev/issues/2574

yaauie · Jul 10 '23 22:07

Related docs issues:

  • elastic/logstash#15169
  • elastic/logstash#15213

karenzone · Aug 11 '23 15:08

A time-to-live (TTL) setting may be a useful output option when the downstream Logstash instances are behind a load balancer that terminates connections every X minutes. When this happens, it can cause duplicates in Elasticsearch when Logstash retries the terminated request.

mbudge · Nov 15 '23 01:11

Closing this issue as the work is now complete. If you would like to file an enhancement request on the matter, please open a new issue.

roaksoax · May 05 '24 17:05