
Fix system module with both filesets enabled

Open belimawr opened this issue 1 year ago • 1 comments

Proposed commit message

The system module did not define an ID at the root of the config, so the V2 input loader started only the first journald input it saw: both inputs ended up with the same identifier (type, ID and path). This is fixed by defining an ID at the root of the configuration templates.
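To make the failure mode concrete, here is a minimal sketch of how two inputs with the same (type, ID, path) identifier collapse into one. It is not the actual V2 loader code: `inputKey`, `inputConfig`, `startUnique` and the ID values are hypothetical names chosen for illustration, and the loader is assumed to skip any input whose identifier it has already seen.

```go
package main

import "fmt"

// inputKey holds the three fields the commit message lists as the
// identifier: type, ID and path. The struct itself is hypothetical.
type inputKey struct {
	Type string
	ID   string
	Path string
}

// inputConfig is a simplified, hypothetical stand-in for one rendered input.
type inputConfig struct {
	Fileset string
	Key     inputKey
}

// startUnique returns the filesets whose inputs would start, assuming the
// loader skips any input whose key it has already seen.
func startUnique(cfgs []inputConfig) []string {
	seen := make(map[inputKey]bool)
	var started []string
	for _, c := range cfgs {
		if seen[c.Key] {
			continue // duplicate identifier: this input never starts
		}
		seen[c.Key] = true
		started = append(started, c.Fileset)
	}
	return started
}

func main() {
	// Before the fix: no ID at the root, so both keys are identical.
	before := []inputConfig{
		{Fileset: "syslog", Key: inputKey{Type: "journald"}},
		{Fileset: "auth", Key: inputKey{Type: "journald"}},
	}
	// After the fix: each template defines its own ID (values are made up).
	after := []inputConfig{
		{Fileset: "syslog", Key: inputKey{Type: "journald", ID: "system-syslog"}},
		{Fileset: "auth", Key: inputKey{Type: "journald", ID: "system-auth"}},
	}
	fmt.Println(startUnique(before)) // [syslog]
	fmt.Println(startUnique(after))  // [syslog auth]
}
```

With distinct IDs the keys no longer collide, which matches the behaviour described above: both filesets get their own journald input.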

The journald input now also adds the input_id key to its loggers and a non-fatal error is now logged at debug level.

The system-logs input is now marked as experimental instead of stable.

Fix lint warnings by moving toJournalConfig to input_linux.go

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • ~~[ ] I have made corresponding changes to the documentation~~
  • ~~[ ] I have made corresponding change to the default configuration files~~
  • [x] I have added tests that prove my fix is effective or that my feature works
  • ~~[ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.~~

How to test this PR locally

  1. Run Filebeat with the following filebeat.yml and modules.d/system.yml (adjust credentials/addresses as necessary)

    filebeat.yml

    filebeat.inputs:
      - type: journald
        id: my-journald-normal-input
        tags:
           - journald-input
      - type: filestream
        id: my-filestream-id
        paths:
          - /tmp/flog.log
    
    filebeat.config.modules:
      path: ${path.config}/modules.d/*.yml
      reload.enabled: false
      reload.period: 1s
    
    setup.template:
      settings:
        index.number_of_shards: 1
    
    setup.kibana:
      host: "http://kibana:5601"
      username: admin
      password: testing
      ssl.verification_mode: none
    
    output.elasticsearch:
      hosts: ["http://elasticsearch:9200"]
      preset: latency
      protocol: "http"
    
      username: admin
      password: testing
      ssl.verification_mode: none
    
    modules.d/system.yml

    - module: system
      syslog:
        enabled: true
        var.use_journald: true
        input:
          tags:
            - from-journald
    
      auth:
        enabled: true
        var.use_journald: true
        var.tags:
          - from-journald
    
  2. Go to Discover in Kibana, filter by tags: from-journald

  3. Look at fileset.name from the events, make sure auth and syslog are there
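If you prefer to check step 3 outside Kibana, the sketch below counts events per `fileset.name` for a given tag. It assumes the events have been exported as newline-delimited JSON with nested `fileset.name` and `tags` fields; the export format, the `event` struct and the `filesetsWithTag` helper are assumptions made for illustration, not part of this PR.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// event models only the fields this check needs (hypothetical shape).
type event struct {
	Fileset struct {
		Name string `json:"name"`
	} `json:"fileset"`
	Tags []string `json:"tags"`
}

// filesetsWithTag counts events per fileset.name among the events that
// carry the given tag. Input is one JSON document per line.
func filesetsWithTag(ndjson, tag string) (map[string]int, error) {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(ndjson))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // tolerate blank lines in the export
		}
		var ev event
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return nil, fmt.Errorf("decode event: %w", err)
		}
		for _, t := range ev.Tags {
			if t == tag {
				counts[ev.Fileset.Name]++
				break
			}
		}
	}
	return counts, sc.Err()
}

func main() {
	// Made-up sample events, not real output from the setup above.
	sample := `{"tags":["from-journald"],"fileset":{"name":"syslog"}}
{"tags":["from-journald"],"fileset":{"name":"auth"}}
{"tags":["journald-input"]}`

	counts, err := filesetsWithTag(sample, "from-journald")
	if err != nil {
		panic(err)
	}
	for _, name := range []string{"auth", "syslog"} {
		fmt.Printf("%s: %d event(s)\n", name, counts[name])
	}
}
```

A zero count for either `auth` or `syslog` would suggest that only one of the two journald inputs started.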

Related issues

  • Closes https://github.com/elastic/beats/issues/41378

belimawr avatar Oct 22 '24 20:10 belimawr

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

elasticmachine avatar Oct 22 '24 20:10 elasticmachine

This pull request now has conflicts. Could you fix it? 🙏 To fix up this pull request, you can check it out locally. See the documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b 41378-fix-system-module-only-starts-one-input upstream/41378-fix-system-module-only-starts-one-input
git merge upstream/main
git push upstream 41378-fix-system-module-only-starts-one-input

mergify[bot] avatar Oct 23 '24 19:10 mergify[bot]

Some CI failures seem to be unrelated:

=== Failed
=== FAIL: filebeat/input/filestream/internal/task TestGroup_Go/workloads_wait_for_available_worker (0.20s)
    group_test.go:129: f2 started
    group_test.go:116: f1 started
    group_test.go:135: f2 done
    group_test.go:148: f3 started
    group_test.go:182: waiting the worker pool to finish all workloads
    group_test.go:118: f1 done
    group_test.go:150: f3 done
    group_test.go:185: worker pool to finished all workloads
    group_test.go:187:
        Error Trace:	C:/buildkite-agent/builds/bk-agent-prod-gcp-1729720802417888488/elastic/filebeat/filebeat/input/filestream/internal/task/group_test.go:187
        Error:      	Condition never satisfied
        Test:       	TestGroup_Go/workloads_wait_for_available_worker
        Messages:   	not all goroutines finished

=== FAIL: filebeat/input/filestream/internal/task TestGroup_Go (0.70s)

BK link: https://buildkite.com/elastic/filebeat/builds/10493#0192bb64-7d3a-4e8a-9879-e4528b7076ab/154-388

I tried to reproduce it on Windows without success. I'll just re-run it; if it fails again, I'll open a FlakyTest issue.

belimawr avatar Oct 24 '24 14:10 belimawr

@belimawr This test looks suspiciously like the other ones I fixed recently https://github.com/elastic/beats/pull/41230. I wouldn't be surprised if it is flaky.

mauri870 avatar Oct 24 '24 15:10 mauri870

> @belimawr This test looks suspiciously like the other ones I fixed recently #41230. I wouldn't be surprised if it is flaky.

I agree! However I didn't manage to reproduce it :/

That's why I posted the message here: it's a reminder of the error and links in case it happens again. ;)

belimawr avatar Oct 24 '24 16:10 belimawr

> I agree! However I didn't manage to reproduce it :/

I forgot to reply: I also tried to reproduce it with x/tools/cmd/stress, but had no luck.

mauri870 avatar Oct 28 '24 11:10 mauri870