Logstash gem environment improvements

Open donoghuc opened this issue 6 months ago • 5 comments

This issue captures improvements for managing the vendored Ruby/gem environment shipped with Logstash artifacts. The scope is to optimize artifact generation with a clear strategy for gem inclusion (with emphasis on a policy for handling default gems) and consistent testing.

Gem duplication in artifacts: JRuby ships with a set of “default” gems, similar to MRI Ruby. In our build process, newer gems are shipped that shadow the default ones. Occasionally those defaults (for a JRuby targeting an “older” MRI Ruby, in our case 3.1) will have a CVE. When a CVE is reported (by Snyk) we update our Gemfile to ship a newer version. This results in duplicate versions (the default shipped with JRuby and the updated version managed with bundler). Gem duplication around default gems has caused some paper cuts (ambiguous gem spec warnings, jar-dependencies loading errors, CVE scanner issues). See https://github.com/elastic/logstash/issues/17732 for discussion.

I would like to explore the following:

  • How to reduce/eliminate duplication (figure out how to make bundler use default gems if possible, explicitly pin all default gems in our Gemfile, other options?)
  • Build some automation around understanding default gems shipped with JRuby vs. our vendored gems
  • How do we guarantee we don't regress when we have a security exception for a default gem? (For example, our solution when there is a CVE in a default gem is to ship a newer one. The default still exists in artifacts and is ignored, seemingly even if a newer version is removed.)
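
As a starting point for the automation mentioned above, here is a minimal sketch of an inventory check. The helper name `shadowed_default_gems` is hypothetical and not part of Logstash, and the default-gem versions in the sample data are invented placeholders. It assumes both inputs have already been reduced to plain `name => version` pairs without platform suffixes.

```ruby
require "rubygems" # for Gem::Version comparisons

# Hypothetical helper (not part of Logstash): report every default gem that is
# also present in the bundled set, and whether the bundled copy is newer.
# Both arguments map gem name => plain version string (no platform suffix).
def shadowed_default_gems(default_gems, bundled_gems)
  default_gems.each_with_object([]) do |(name, default_version), report|
    bundled_version = bundled_gems[name]
    next if bundled_version.nil?

    report << {
      name: name,
      default: default_version,
      bundled: bundled_version,
      bundled_newer: Gem::Version.new(bundled_version) > Gem::Version.new(default_version)
    }
  end
end

# Placeholder inputs: the default versions are invented for illustration only.
defaults = { "jar-dependencies" => "0.4.1", "psych" => "5.2.2", "json" => "2.7.0" }
bundled  = { "jar-dependencies" => "0.5.5", "psych" => "5.2.6", "redis" => "4.8.1" }

shadowed = shadowed_default_gems(defaults, bundled)
shadowed.each do |entry|
  puts "#{entry[:name]}: default #{entry[:default]}, bundled #{entry[:bundled]}"
end
```

In a real build, the default set could probably be gathered with `Gem::Specification.select(&:default_gem?)` and the bundled set from the parsed Gemfile.lock. An entry where `bundled_newer` is false would be a candidate regression for a security exception.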

Gem env for testing: The set of gems our tests run against should be consistent with those we ship in vendored artifacts. We recently missed an issue with jar-dependencies activation due to this discrepancy. To illustrate this, compare the Gemfile.lock files produced by the different gradle tasks, prepared with:

 git checkout upstream/main
 ./gradlew clean
 ./gradlew rubyTests
 cp Gemfile.lock Gemfile.lock.rubyTests
 ./gradlew clean bootstrap assemble installDefaultGems
 cp Gemfile.lock Gemfile.lock.installDefaultGems
 diff Gemfile.lock.rubyTests Gemfile.lock.installDefaultGems

See below for the diff, but the TL;DR is that we run the ruby tests against a MUCH smaller set of gems that does not actually represent the entire set shipped with LS packages.
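
For a summary that is easier to scan than a raw diff, the two lockfiles can also be compared by gem name. The following is a rough sketch, not an existing Logstash task: `locked_gems` and `compare_locks` are hypothetical helpers, and the parsing relies on the assumption that resolved specs in a Gemfile.lock are indented by exactly four spaces while their dependencies are indented by six. The sample entries are taken from the diff in this issue.

```ruby
# Extract name => version pairs for the resolved specs in a Gemfile.lock.
# Assumption: spec lines have exactly four leading spaces, dependency lines six.
def locked_gems(lockfile_text)
  lockfile_text.each_line.each_with_object({}) do |line, gems|
    match = /\A {4}(\S+) \(([^)]+)\)\z/.match(line.chomp)
    gems[match[1]] = match[2] if match
  end
end

# Summarize how the gem set used for tests differs from the shipped gem set.
def compare_locks(tests_text, artifact_text)
  tests = locked_gems(tests_text)
  artifact = locked_gems(artifact_text)
  shared = tests.keys & artifact.keys
  {
    only_in_tests: (tests.keys - artifact.keys).sort,
    only_in_artifact: (artifact.keys - tests.keys).sort,
    version_differs: shared.reject { |name| tests[name] == artifact[name] }.sort
  }
end

tests_lock = <<~LOCK
  GEM
    specs:
      public_suffix (6.0.2)
      webrick (1.9.1)
LOCK

artifact_lock = <<~LOCK
  GEM
    specs:
      psych (5.2.6-java)
        date
        jar-dependencies (>= 0.1.7)
      public_suffix (5.1.1)
      redis (4.8.1)
LOCK

lock_diff = compare_locks(tests_lock, artifact_lock)
puts lock_diff
```

A check like this could run in CI against the two lockfiles produced by the gradle tasks above and fail when `only_in_artifact` is non-empty.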

Example gem env diff
➜  logstash git:(ddd519cc89) ✗ diff Gemfile.lock.rubyTests Gemfile.lock.installDefaultGems
39a40,78
>     atomic (1.1.101-java)
>     avl_tree (1.2.1)
>       atomic (~> 1.1)
>     avro (1.10.2)
>       multi_json (~> 1)
>     aws-eventstream (1.4.0)
>     aws-partitions (1.1126.0)
>     aws-sdk-cloudfront (1.119.0)
>       aws-sdk-core (~> 3, >= 3.225.0)
>       aws-sigv4 (~> 1.5)
>     aws-sdk-cloudwatch (1.116.0)
>       aws-sdk-core (~> 3, >= 3.225.0)
>       aws-sigv4 (~> 1.5)
>     aws-sdk-core (3.226.2)
>       aws-eventstream (~> 1, >= 1.3.0)
>       aws-partitions (~> 1, >= 1.992.0)
>       aws-sigv4 (~> 1.9)
>       base64
>       jmespath (~> 1, >= 1.6.1)
>       logger
>     aws-sdk-kms (1.106.0)
>       aws-sdk-core (~> 3, >= 3.225.0)
>       aws-sigv4 (~> 1.5)
>     aws-sdk-resourcegroups (1.83.0)
>       aws-sdk-core (~> 3, >= 3.225.0)
>       aws-sigv4 (~> 1.5)
>     aws-sdk-s3 (1.192.0)
>       aws-sdk-core (~> 3, >= 3.225.0)
>       aws-sdk-kms (~> 1)
>       aws-sigv4 (~> 1.5)
>     aws-sdk-sns (1.100.0)
>       aws-sdk-core (~> 3, >= 3.225.0)
>       aws-sigv4 (~> 1.5)
>     aws-sdk-sqs (1.96.0)
>       aws-sdk-core (~> 3, >= 3.225.0)
>       aws-sigv4 (~> 1.5)
>     aws-sigv4 (1.12.1)
>       aws-eventstream (~> 1, >= 1.0.2)
>     back_pressure (1.0.0)
45a85,86
>     bindata (2.5.1)
>     buftok (0.2.0)
60a102
>     dalli (3.2.8)
63a106
>     domain_name (0.6.20240107)
67a111
>     edn (1.1.1)
75a120,122
>     equalizer (0.0.11)
>     et-orbi (1.2.11)
>       tzinfo
94a142,145
>     fugit (1.11.1)
>       et-orbi (~> 1, >= 1.2.11)
>       raabro (~> 1.4)
>     gelfd2 (0.4.1)
96a148,149
>     gene_pool (1.5.0)
>       concurrent-ruby (>= 1.0)
97a151,160
>     hitimes (1.3.1-java)
>     http (3.3.0)
>       addressable (~> 2.3)
>       http-cookie (~> 1.0)
>       http-form_data (~> 2.0)
>       http_parser.rb (~> 0.6.0)
>     http-cookie (1.0.8)
>       domain_name (~> 0.5)
>     http-form_data (2.3.0)
>     http_parser.rb (0.6.0-java)
100a164
>     jar-dependencies (0.5.5)
102a167,169
>     jls-lumberjack (0.0.26)
>       concurrent-ruby
>     jmespath (1.6.2)
103a171,173
>     jruby-jms (1.3.0-java)
>       gene_pool
>       semantic_logger
113a184,225
>     logstash-codec-avro (3.4.1-java)
>       avro (~> 1.10.2)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.3)
>       logstash-mixin-event_support (~> 1.0)
>       logstash-mixin-validator_support (~> 1.0)
>     logstash-codec-cef (6.2.8-java)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.3)
>       logstash-mixin-event_support (~> 1.0)
>     logstash-codec-collectd (3.1.0)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-event_support (~> 1.0)
>       logstash-mixin-validator_support (~> 1.0)
>     logstash-codec-dots (3.0.6)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>     logstash-codec-edn (3.1.0)
>       edn
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-event_support (~> 1.0)
>       logstash-mixin-validator_support (~> 1.0)
>     logstash-codec-edn_lines (3.1.0)
>       edn
>       logstash-codec-line
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-event_support (~> 1.0)
>       logstash-mixin-validator_support (~> 1.0)
>     logstash-codec-es_bulk (3.1.0)
>       logstash-codec-line
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.3)
>       logstash-mixin-event_support (~> 1.0)
>       logstash-mixin-validator_support (~> 1.0)
>     logstash-codec-fluent (3.4.3-java)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-event_support (~> 1.0)
>       logstash-mixin-validator_support (~> 1.0)
>       msgpack (~> 1.1)
>     logstash-codec-graphite (3.0.6)
>       logstash-codec-line
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-event_support (~> 1.0)
127a240,244
>     logstash-codec-msgpack (3.1.0-java)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-event_support (~> 1.0)
>       logstash-mixin-validator_support (~> 1.0)
>       msgpack (~> 1.1)
134a252,255
>     logstash-codec-netflow (4.3.2)
>       bindata (>= 1.5.0)
>       logstash-core-plugin-api (~> 2.0)
>       logstash-mixin-event_support (~> 1.0)
152a274,280
>     logstash-filter-aggregate (2.10.0)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>     logstash-filter-anonymize (3.0.7)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       murmurhash3 (= 0.1.6)
>     logstash-filter-cidr (3.1.3-java)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
155a284,287
>     logstash-filter-csv (3.1.1)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.3)
>       logstash-mixin-validator_support (~> 1.0)
157a290,297
>     logstash-filter-de_dot (1.1.0)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>     logstash-filter-dissect (1.2.5)
>       jar-dependencies
>       logstash-core-plugin-api (>= 2.1.1, <= 2.99)
>     logstash-filter-dns (3.2.0)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       lru_redux (~> 1.1.0)
158a299,304
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>     logstash-filter-elastic_integration (9.1.0-java)
>       logstash-core (>= 8.7.0)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>     logstash-filter-elasticsearch (4.2.0)
>       elasticsearch (>= 7.14.9, < 9)
159a306,313
>       logstash-mixin-ca_trusted_fingerprint_support (~> 1.0)
>       logstash-mixin-ecs_compatibility_support (~> 1.3)
>       logstash-mixin-validator_support (~> 1.0)
>       manticore (>= 0.7.1)
>     logstash-filter-fingerprint (3.4.4)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.2)
>       murmurhash3 (= 0.1.6)
163a318,329
>     logstash-filter-grok (4.4.3)
>       jls-grok (~> 0.11.3)
>       logstash-core (>= 5.6.0)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.0)
>       logstash-patterns-core (>= 4.3.0, < 5)
>       stud (~> 0.0.22)
>     logstash-filter-http (2.0.0)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.2)
>       logstash-mixin-http_client (>= 7.5.0, < 8.0.0)
>       logstash-mixin-validator_support (~> 1.0)
167a334,344
>     logstash-filter-kv (4.7.0)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.3)
>       logstash-mixin-validator_support (~> 1.0)
>     logstash-filter-memcached (1.2.0)
>       dalli (~> 3)
>       logstash-core-plugin-api (~> 2.0)
>     logstash-filter-metrics (4.0.7)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       metriks
>       thread_safe
169a347,348
>     logstash-filter-prune (3.0.4)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
171a351,377
>     logstash-filter-sleep (3.0.7)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>     logstash-filter-split (3.1.8)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>     logstash-filter-syslog_pri (3.2.1)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.3)
>     logstash-filter-throttle (4.0.4)
>       atomic
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       thread_safe
>     logstash-filter-translate (3.4.2)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-deprecation_logger_support (~> 1.0)
>       logstash-mixin-ecs_compatibility_support (~> 1.2)
>       logstash-mixin-scheduler (~> 1.0)
>       logstash-mixin-validator_support (~> 1.0)
>       psych (>= 5.1.0)
>     logstash-filter-truncate (1.0.6)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>     logstash-filter-urldecode (3.0.6)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>     logstash-filter-useragent (3.3.5-java)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.3)
>     logstash-filter-uuid (3.0.5)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
176a383,387
>     logstash-input-azure_event_hubs (1.5.1)
>       logstash-codec-json
>       logstash-codec-plain
>       logstash-core-plugin-api (~> 2.0)
>       stud (>= 0.0.22)
185a397,446
>     logstash-input-couchdb_changes (3.1.6)
>       json
>       logstash-codec-plain
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       stud (>= 0.0.22)
>     logstash-input-dead_letter_queue (2.0.1)
>       logstash-codec-plain
>       logstash-core (>= 8.4.0)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>     logstash-input-elastic_serverless_forwarder (2.0.0-java)
>       logstash-codec-json_lines
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-input-http (>= 3.7.2)
>       logstash-mixin-ecs_compatibility_support (~> 1.2)
>       logstash-mixin-normalize_config_support (~> 1.0)
>       logstash-mixin-plugin_factory_support
>     logstash-input-elasticsearch (5.2.0)
>       elasticsearch (>= 7.17.9, < 9)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ca_trusted_fingerprint_support (~> 1.0)
>       logstash-mixin-ecs_compatibility_support (~> 1.3)
>       logstash-mixin-event_support (~> 1.0)
>       logstash-mixin-normalize_config_support (~> 1.0)
>       logstash-mixin-scheduler (~> 1.0)
>       logstash-mixin-validator_support (~> 1.0)
>       manticore (>= 0.7.1)
>       tzinfo
>       tzinfo-data
>     logstash-input-exec (3.6.0)
>       logstash-codec-plain
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.3)
>       logstash-mixin-scheduler (~> 1.0)
>       stud (~> 0.0.22)
>     logstash-input-file (4.4.6)
>       addressable
>       concurrent-ruby (~> 1.0)
>       logstash-codec-multiline (~> 3.0)
>       logstash-codec-plain
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.3)
>     logstash-input-ganglia (3.1.4)
>       logstash-codec-plain
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       stud (~> 0.0.22)
>     logstash-input-gelf (3.3.2)
>       gelfd2 (= 0.4.1)
>       logstash-codec-plain
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       stud (>= 0.0.22, < 0.1.0)
186a448,483
>       logstash-codec-plain
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.3)
>     logstash-input-graphite (3.0.6)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-input-tcp
>     logstash-input-heartbeat (3.1.1)
>       logstash-codec-plain
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-deprecation_logger_support (~> 1.0)
>       logstash-mixin-ecs_compatibility_support (~> 1.2)
>       logstash-mixin-event_support (~> 1.0)
>       stud
>     logstash-input-http (4.1.2-java)
>       logstash-codec-plain
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.2)
>       logstash-mixin-normalize_config_support (~> 1.0)
>     logstash-input-http_poller (6.0.0)
>       logstash-codec-plain
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.3)
>       logstash-mixin-event_support (~> 1.0, >= 1.0.1)
>       logstash-mixin-http_client (>= 7.5.0, < 8.0.0)
>       logstash-mixin-scheduler (~> 1.0)
>       logstash-mixin-validator_support (~> 1.0)
>     logstash-input-jms (3.3.0-java)
>       jruby-jms (>= 1.2.0)
>       logstash-codec-json (~> 3.0)
>       logstash-codec-plain (~> 3.0)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.3)
>       logstash-mixin-event_support (~> 1.0)
>       logstash-mixin-validator_support (~> 1.0)
>       semantic_logger (< 4.0.0)
>     logstash-input-pipe (3.1.0)
189a487,491
>       stud (~> 0.0.22)
>     logstash-input-redis (3.7.1)
>       logstash-codec-json
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       redis (>= 4.0.1, < 5)
194a497,504
>     logstash-input-syslog (3.7.1)
>       concurrent-ruby
>       logstash-codec-plain
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-filter-date
>       logstash-filter-grok (>= 4.4.1)
>       logstash-mixin-ecs_compatibility_support (~> 1.2)
>       stud (>= 0.0.22, < 0.1.0)
204a515,590
>     logstash-input-twitter (4.1.1)
>       http-form_data (~> 2)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.3)
>       logstash-mixin-event_support (~> 1.0)
>       logstash-mixin-validator_support (~> 1.0)
>       public_suffix (> 4, < 6)
>       stud (>= 0.0.22, < 0.1)
>       twitter (= 6.2.0)
>     logstash-input-udp (3.5.0)
>       logstash-codec-plain
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.2)
>       stud (~> 0.0.22)
>     logstash-input-unix (3.1.2)
>       logstash-codec-line
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.3)
>     logstash-integration-aws (7.2.1-java)
>       aws-sdk-cloudfront
>       aws-sdk-cloudwatch
>       aws-sdk-core (~> 3)
>       aws-sdk-resourcegroups
>       aws-sdk-s3
>       aws-sdk-sns
>       aws-sdk-sqs
>       concurrent-ruby
>       logstash-codec-json
>       logstash-codec-plain
>       logstash-core-plugin-api (>= 2.1.12, <= 2.99)
>       rexml
>       rufus-scheduler (>= 3.0.9)
>       stud (~> 0.0.22)
>     logstash-integration-jdbc (5.6.0)
>       logstash-codec-plain
>       logstash-core (>= 6.5.0)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.3)
>       logstash-mixin-event_support (~> 1.0)
>       logstash-mixin-scheduler (~> 1.0)
>       logstash-mixin-validator_support (~> 1.0)
>       lru_redux
>       sequel (>= 5.74.0)
>       tzinfo
>       tzinfo-data
>     logstash-integration-kafka (11.6.3-java)
>       logstash-codec-json
>       logstash-codec-plain
>       logstash-core (>= 8.3.0)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-deprecation_logger_support (~> 1.0)
>       manticore (>= 0.5.4, < 1.0.0)
>       stud (>= 0.0.22, < 0.1.0)
>     logstash-integration-logstash (1.0.4-java)
>       logstash-codec-json_lines (~> 3.1)
>       logstash-core-plugin-api (>= 2.1.12, <= 2.99)
>       logstash-input-http (>= 3.7.0)
>       logstash-mixin-http_client (~> 7.3)
>       logstash-mixin-plugin_factory_support (~> 1.0)
>       logstash-mixin-validator_support (~> 1.1)
>       stud
>     logstash-integration-rabbitmq (7.4.0-java)
>       back_pressure (~> 1.0)
>       logstash-codec-json
>       logstash-core (>= 6.5.0)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       march_hare (~> 4.0)
>       stud (~> 0.0.22)
>     logstash-integration-snmp (4.0.6-java)
>       logstash-codec-plain
>       logstash-core (>= 6.5.0)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-ecs_compatibility_support (~> 1.3)
>       logstash-mixin-event_support (~> 1.0)
>       logstash-mixin-normalize_config_support (~> 1.0)
>       logstash-mixin-validator_support (~> 1.0)
212a599,603
>     logstash-mixin-http_client (7.5.0)
>       logstash-codec-plain
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-normalize_config_support (~> 1.0)
>       manticore (>= 0.8.0, < 1.0.0)
216a608,610
>     logstash-mixin-scheduler (1.0.1-java)
>       logstash-core (>= 7.16)
>       rufus-scheduler (>= 3.0.9)
218a613,615
>     logstash-output-csv (3.0.10)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-output-file
226a624,627
>     logstash-output-email (4.1.3)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       mail (~> 2.8)
>       mustache (>= 0.99.8)
230a632,643
>     logstash-output-graphite (3.1.6)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>     logstash-output-http (6.0.0)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       logstash-mixin-http_client (>= 7.5.0, < 8.0.0)
>     logstash-output-lumberjack (3.1.9)
>       jls-lumberjack (>= 0.0.26)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       stud
>     logstash-output-nagios (3.0.6)
>       logstash-codec-plain
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
233a647,654
>     logstash-output-pipe (3.0.6)
>       logstash-codec-plain
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>     logstash-output-redis (5.2.0)
>       logstash-core (>= 6.0)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       redis (~> 4)
>       stud
236a658,669
>     logstash-output-tcp (7.0.1)
>       jruby-openssl (>= 0.12.2)
>       logstash-codec-json
>       logstash-core (>= 8.1.0)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       stud
>     logstash-output-udp (3.2.0)
>       logstash-codec-json
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>     logstash-output-webhdfs (3.1.0-java)
>       logstash-core-plugin-api (>= 1.60, <= 2.99)
>       webhdfs
238a672,677
>     lru_redux (1.1.0)
>     mail (2.8.1)
>       mini_mime (>= 0.1.1)
>       net-imap
>       net-pop
>       net-smtp
240a680,682
>     march_hare (4.7.0-java)
>     memoizable (0.4.2)
>       thread_safe (~> 0.3, >= 0.3.1)
241a684,688
>     metriks (0.9.9.8)
>       atomic (~> 1.0)
>       avl_tree (~> 1.2.0)
>       hitimes (~> 1.1)
>     mini_mime (1.1.5)
242a690
>     msgpack (1.8.0-java)
243a692
>     multipart-post (2.4.1)
247a697
>     naught (1.1.0)
249a700,708
>     net-imap (0.5.9)
>       date
>       net-protocol
>     net-pop (0.1.2)
>       net-protocol
>     net-protocol (0.2.2)
>       timeout
>     net-smtp (0.5.1)
>       net-protocol
274c733,736
<     public_suffix (6.0.2)
---
>     psych (5.2.6-java)
>       date
>       jar-dependencies (>= 0.1.7)
>     public_suffix (5.1.1)
276a739
>     raabro (1.4.0)
289a753
>     redis (4.8.1)
296,297d759
<     rspec-collection_matchers (1.2.1)
<       rspec-expectations (>= 2.99.0.beta1)
324a787,788
>     rufus-scheduler (3.9.2)
>       fugit (~> 1.1, >= 1.11.1)
327a792,796
>     semantic_logger (3.4.1)
>       concurrent-ruby (~> 1.0)
>     sequel (5.94.0)
>       bigdecimal
>     simple_oauth (0.3.1)
350a820
>     timeout (0.4.3)
352a823,833
>     twitter (6.2.0)
>       addressable (~> 2.3)
>       buftok (~> 0.2.0)
>       equalizer (~> 0.0.11)
>       http (~> 3.0)
>       http-form_data (~> 2.0)
>       http_parser.rb (~> 0.6.0)
>       memoizable (~> 0.4.0)
>       multipart-post (~> 2.0)
>       naught (~> 1.0)
>       simple_oauth (~> 0.3.0)
360a842,843
>     webhdfs (0.11.0)
>       addressable
365d847
<     webrick (1.9.1)
376d857
<   cabin (~> 0.6)
380d860
<   elasticsearch
382c862
<   flores
---
>   flores (~> 0.0.8)
385a866,880
>   logstash-codec-avro
>   logstash-codec-cef
>   logstash-codec-collectd
>   logstash-codec-dots
>   logstash-codec-edn
>   logstash-codec-edn_lines
>   logstash-codec-es_bulk
>   logstash-codec-fluent
>   logstash-codec-graphite
>   logstash-codec-json
>   logstash-codec-json_lines
>   logstash-codec-line
>   logstash-codec-msgpack
>   logstash-codec-multiline
>   logstash-codec-netflow
386a882
>   logstash-codec-rubydebug
389c885,888
<   logstash-devutils
---
>   logstash-devutils (~> 2.6.0)
>   logstash-filter-aggregate
>   logstash-filter-anonymize
>   logstash-filter-cidr
390a890
>   logstash-filter-csv
391a892,894
>   logstash-filter-de_dot
>   logstash-filter-dissect
>   logstash-filter-dns
392a896,898
>   logstash-filter-elastic_integration
>   logstash-filter-elasticsearch
>   logstash-filter-fingerprint
393a900,901
>   logstash-filter-grok
>   logstash-filter-http
394a903,905
>   logstash-filter-kv
>   logstash-filter-memcached
>   logstash-filter-metrics
395a907
>   logstash-filter-prune
396a909,917
>   logstash-filter-sleep
>   logstash-filter-split
>   logstash-filter-syslog_pri
>   logstash-filter-throttle
>   logstash-filter-translate
>   logstash-filter-truncate
>   logstash-filter-urldecode
>   logstash-filter-useragent
>   logstash-filter-uuid
397a919
>   logstash-input-azure_event_hubs
398a921,928
>   logstash-input-couchdb_changes
>   logstash-input-dead_letter_queue
>   logstash-input-elastic_serverless_forwarder
>   logstash-input-elasticsearch
>   logstash-input-exec
>   logstash-input-file
>   logstash-input-ganglia
>   logstash-input-gelf
399a930,936
>   logstash-input-graphite
>   logstash-input-heartbeat
>   logstash-input-http
>   logstash-input-http_poller
>   logstash-input-jms
>   logstash-input-pipe
>   logstash-input-redis
400a938
>   logstash-input-syslog
401a940,949
>   logstash-input-twitter
>   logstash-input-udp
>   logstash-input-unix
>   logstash-integration-aws
>   logstash-integration-jdbc
>   logstash-integration-kafka
>   logstash-integration-logstash
>   logstash-integration-rabbitmq
>   logstash-integration-snmp
>   logstash-output-csv
402a951
>   logstash-output-email
403a953,956
>   logstash-output-graphite
>   logstash-output-http
>   logstash-output-lumberjack
>   logstash-output-nagios
404a958,959
>   logstash-output-pipe
>   logstash-output-redis
405a961,963
>   logstash-output-tcp
>   logstash-output-udp
>   logstash-output-webhdfs
415d972
<   rspec-collection_matchers
425,426c982
<   webmock
<   webrick
---
>   webmock (~> 3)

donoghuc, Jul 30 '25 18:07

Gem duplication findings:

Regarding gem duplication in artifacts: I can see that removing all gems from ./vendor/jruby/lib/ruby/gems/shared/gems and all gemspecs from ./vendor/jruby/lib/ruby/gems/shared/specifications/, while keeping only bundler and its gemspec, doesn't hinder Logstash from working at all. Gems will be read from the bundler directory.

We can also remove ./vendor/jruby/lib/ruby/stdlib/jopenssl.jar and the net-* gemspecs from vendor/jruby/lib/ruby/stdlib/net/*.gemspec and these will still be loaded from bundled paths.

This would remove a lot of duplication. What I don't know is if we can do it even before packaging, and how this influences testing. But definitely we can do it for the final artifacts, from what I tested.
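
To make the pruning idea concrete, here is a dry-run sketch that only lists what would be removed. `prunable_paths` is a hypothetical helper, and the gem names and versions in the demo are placeholders created in a temporary directory rather than a real vendor/jruby tree. It covers the two `shared` directories described above and nothing else.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical dry-run helper: list gem directories and gemspec files under a
# "shared" gems directory that do not belong to one of the gems in `keep`.
def prunable_paths(shared_dir, keep: ["bundler"])
  candidates = Dir.glob(File.join(shared_dir, "gems", "*")) +
               Dir.glob(File.join(shared_dir, "specifications", "*.gemspec"))
  candidates.reject do |path|
    base = File.basename(path)
    keep.any? { |name| base.match?(/\A#{Regexp.escape(name)}-\d/) }
  end.sort
end

# Demo against a throwaway directory tree with placeholder gems.
prunable = Dir.mktmpdir do |shared_dir|
  %w[bundler-2.0.0 example_gem-1.0.0].each do |gem_dir|
    FileUtils.mkdir_p(File.join(shared_dir, "gems", gem_dir))
  end
  FileUtils.mkdir_p(File.join(shared_dir, "specifications"))
  %w[bundler-2.0.0.gemspec example_gem-1.0.0.gemspec].each do |spec|
    FileUtils.touch(File.join(shared_dir, "specifications", spec))
  end
  prunable_paths(shared_dir).map { |path| path.delete_prefix("#{shared_dir}/") }
end
puts prunable
```

Keeping this as a listing step first makes it possible to review the candidates before any artifact build actually deletes them.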

jsvd, Jul 31 '25 10:07

Gem testing env improvement: https://github.com/elastic/logstash/pull/18330

donoghuc, Oct 21 '25 20:10

Thanks @jsvd. I wanted to start the investigation into teaching bundler how to not duplicate in the first place (we will likely still want to explore deduplicating gems we intentionally need newer versions of). Starting with getting the default gem location on the GEM_PATH for bundler:

diff --git a/lib/bootstrap/bundler.rb b/lib/bootstrap/bundler.rb
index f1c93cff45..05c508f8ac 100644
--- a/lib/bootstrap/bundler.rb
+++ b/lib/bootstrap/bundler.rb
@@ -84,7 +84,8 @@ module LogStash
       options[:without] = Array(options[:without])

       ::Gem.clear_paths
-      ENV['GEM_HOME'] = ENV['GEM_PATH'] = Environment.logstash_gem_home
+      ENV['GEM_HOME'] = Environment.logstash_gem_home
+      ENV['GEM_PATH'] = [Environment.logstash_gem_home, ::Gem.default_path].flatten.join(File::PATH_SEPARATOR)
       ::Gem.paths = ENV

       # set BUNDLE_GEMFILE ENV before requiring bundler to avoid bundler recurse and load unrelated Gemfile(s)
@@ -128,7 +129,8 @@ module LogStash
       options[:without] = Array(options[:without])
       options[:update] = Array(options[:update]) if options[:update]
       ::Gem.clear_paths
-      ENV['GEM_HOME'] = ENV['GEM_PATH'] = LogStash::Environment.logstash_gem_home
+      ENV['GEM_HOME'] = Environment.logstash_gem_home
+      ENV['GEM_PATH'] = [Environment.logstash_gem_home, ::Gem.default_path].flatten.join(File::PATH_SEPARATOR)
       ::Gem.paths = ENV
       # set BUNDLE_GEMFILE ENV before requiring bundler to avoid bundler recurse and load unrelated Gemfile(s).
       # in the context of calling Bundler::CLI this is not really required since Bundler::CLI will look at
@@ -170,6 +172,10 @@ module LogStash
                 "BUNDLE_GEMFILE" => LogStash::Environment::GEMFILE_PATH,
                 "BUNDLE_SILENCE_ROOT_WARNING" => "true",
                 "BUNDLE_WITHOUT" => options[:without].join(":")}) do
+        # Add this temporarily to see what's happening
+        puts "BUNDLE_DISABLE_SHARED_GEMS: #{ENV['BUNDLE_DISABLE_SHARED_GEMS'].inspect}"
+        puts "Bundler disable_shared_gems setting: #{::Bundler.settings[:disable_shared_gems].inspect}"
+        puts "Default path: #{ ::Gem.default_path.join(File::PATH_SEPARATOR)}"
         if !debug?
           # Will deal with transient network errors
           execute_bundler_with_retry(options)
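
The GEM_PATH composition used in the patch above can be tried in isolation. This is a small sketch with a hypothetical helper name, `gem_path_with_defaults`, and made-up example paths. It mirrors the `[...].flatten.join(File::PATH_SEPARATOR)` expression from the diff and takes the default paths as a parameter so the result does not depend on the local Ruby installation.

```ruby
require "rubygems" # provides Gem.default_path

# Hypothetical helper mirroring the expression in the patch: put the Logstash
# gem home first, followed by every default gem path RubyGems knows about.
def gem_path_with_defaults(logstash_gem_home, default_paths = Gem.default_path)
  [logstash_gem_home, default_paths].flatten.join(File::PATH_SEPARATOR)
end

# Made-up paths for illustration only.
example_gem_path = gem_path_with_defaults(
  "/opt/logstash/vendor/bundle",
  ["/defaults/a", "/defaults/b"]
)
puts example_gem_path
```

Because the Logstash gem home comes first, gems installed there would be found before the default locations when the same gem exists in both.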

I could see a world where we make bundler consider default gems; if we need a newer version, we explicitly pin it in the gemspec/Gemfile and then delete the duplicated ones afterwards.

Will continue looking!

donoghuc, Oct 21 '25 23:10

It looks like even when we configure bundler to see default/bundled gems, it will still duplicate. I think the best path forward at this point is to focus on deduplicating in the build script. We have precedent for this; I think we can expand on what we have now to programmatically do the deduplication. https://github.com/elastic/logstash/blob/b6af3151fafa427482cf29702c6e99fcd4fa9d41/rakelib/artifacts.rake#L85-L117
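
One possible shape for programmatic deduplication is sketched below. This is an assumption about the approach and not a description of what the existing rake task or any linked PR does. `parse_gemspec_name` and `stale_gemspecs` are hypothetical helpers that work on gemspec file names only, and the older versions in the sample list are placeholders.

```ruby
require "rubygems" # for Gem::Version

# Split "name-version[-platform].gemspec" into its parts.
# Returns nil for file names that do not follow that pattern.
def parse_gemspec_name(filename)
  base = File.basename(filename, ".gemspec")
  match = /\A(.+?)-(\d[\w.]*)(?:-(.+))?\z/.match(base)
  match && { file: filename, name: match[1], version: Gem::Version.new(match[2]) }
end

# Return the gemspec files that are shadowed by a newer version of the same gem.
def stale_gemspecs(filenames)
  parsed = filenames.map { |filename| parse_gemspec_name(filename) }.compact
  parsed.group_by { |spec| spec[:name] }.flat_map do |_name, specs|
    newest = specs.map { |spec| spec[:version] }.max
    specs.select { |spec| spec[:version] < newest }.map { |spec| spec[:file] }
  end.sort
end

# Placeholder file names: the older versions are invented for illustration.
stale = stale_gemspecs(%w[
  jar-dependencies-0.4.1.gemspec
  jar-dependencies-0.5.5.gemspec
  psych-5.2.2-java.gemspec
  psych-5.2.6-java.gemspec
  redis-4.8.1.gemspec
])
puts stale
```

A build step could feed this the gemspec names collected from both the default and the bundled specification directories and exclude the returned files from the artifact.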

donoghuc, Oct 22 '25 23:10

WIP PR for detecting and excluding dupes: https://github.com/elastic/logstash/pull/18340

donoghuc, Oct 23 '25 00:10

The first iteration of this has been merged into the main branch, targeting the 9.3 release: https://github.com/elastic/logstash/pull/18340

With this step we have a cleanup task that removes duplicate gemspecs. In the future we will pick that optimization up (tracked with https://github.com/elastic/logstash/issues/18485).

donoghuc, Dec 10 '25 18:12