beats icon indicating copy to clipboard operation
beats copied to clipboard

New metricbeat module: pgbouncer

Open manuelsaks opened this issue 1 year ago • 29 comments

  • WHAT: The module follows the standard design patterns of Metricbeat modules, ensuring consistency with other database modules like PostgreSQL. It implements the MetricSet interface to collect metrics from PgBouncer. The module interacts with PgBouncer using SQL queries to gather pool statistics and server performance data. The data is parsed and processed through Metricbeat's internal processing pipeline. It communicates with PgBouncer using the native Postgres protocol over TCP. It sends queries to fetch relevant metrics and processes the responses, converting them into Metricbeat-compatible events.
  • WHY: PGBouncer is widely used as a connection pooler for PostgreSQL, and monitoring its performance is crucial for maintaining the health of database-backed systems. By adding this module, Metricbeat provides users with out-of-the-box monitoring capabilities for PGBouncer, enabling them to track metrics such as connection pool utilization, server performance, and query rates. This integration helps users maintain optimal database performance, reduce latency, and prevent resource exhaustion in their systems. -->

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] I have made corresponding changes to the documentation
  • [x] I have made corresponding change to the default configuration files
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

It shouldn't impact users.

Author's Checklist

  • [ ] Code follows the established coding guidelines and style.
  • [ ] Proper error handling is in place.
  • [ ] Tests passing and the coverage > 80%.
  • [ ] Code builds as expected.
  • [ ] Data collection is efficient, and no unnecessary queries are run.
  • [ ] Documentation is updated to reflect the new module, including how to configure and use it.

How to test this PR locally

  1. Run the docker-compose stack and wait until you will be able to reach kibana on localhost:5601
docker-compose -f ./metricbeat/module/pgbouncer/docker-compose.yml up -d
  1. Run the metricbeat agent:
go run ./metricbeat/main.go -c ./metricbeat/module/pgbouncer/_meta/config_local.yml
  1. Go to kibana panel, create Data View with the test pattern and check logs in discover.

manuelsaks avatar Oct 08 '24 13:10 manuelsaks

💚 CLA has been signed

This pull request is now in conflicts. Could you fix it? 🙏 To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b module_pgbouncer upstream/module_pgbouncer
git merge upstream/main
git push upstream module_pgbouncer

mergify[bot] avatar Oct 08 '24 13:10 mergify[bot]

This pull request does not have a backport label. If this is a bug or security fix, could you label this PR @manuelsaks? 🙏. For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

mergify[bot] avatar Oct 08 '24 13:10 mergify[bot]

backport-8.x has been added to help with the transition to the new branch 8.x. If you don't need it please use backport-skip label and remove the backport-8.x label.

mergify[bot] avatar Oct 08 '24 13:10 mergify[bot]

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

elasticmachine avatar Oct 08 '24 14:10 elasticmachine

Thanks @manuelsaks for your contribution. As a first step before we are reviewing your work, could you please sign our CLA?

pierrehilbert avatar Oct 08 '24 14:10 pierrehilbert

@lalit-satapathy as this is related to PostgreSQL, would you mind assigning someone from your team for the reviewing part?

pierrehilbert avatar Oct 08 '24 14:10 pierrehilbert

Thanks @manuelsaks for your contribution. As a first step before we are reviewing your work, could you please sign our CLA?

done

manuelsaks avatar Oct 08 '24 14:10 manuelsaks

Thanks for the quick action. We now have a conflict with the go.mod and go.sum files so if you can fix them it would be perfect to allow tests to be running on your PR.

pierrehilbert avatar Oct 08 '24 14:10 pierrehilbert

/test

pierrehilbert avatar Oct 09 '24 05:10 pierrehilbert

run docs-build

pierrehilbert avatar Oct 09 '24 05:10 pierrehilbert

@manuelsaks it looks like your tests are failing, could you please fix them?

=== FAIL: metricbeat/module/pgbouncer TestQueryStats (5.66s)

pierrehilbert avatar Oct 09 '24 16:10 pierrehilbert

@manuelsaks it looks like your tests are failing, could you please fix them?

=== FAIL: metricbeat/module/pgbouncer TestQueryStats (5.66s)

I don't have an access to the buildkite logs. What is exactly wrong? Because it's working locally.

Running tool: C:\Program Files\Go\bin\go.exe test -timeout 30s -run ^TestQueryStats$ github.com/elastic/beats/v7/metricbeat/module/pgbouncer
ok  	github.com/elastic/beats/v7/metricbeat/module/pgbouncer	8.825s

manuelsaks avatar Oct 09 '24 17:10 manuelsaks

@manuelsaks Thanks for willing to look into the CI issue.

I spotted this error in the logs:

ERROR: for pgbouncer  Cannot start service pgbouncer: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/go/src/github.com/elastic/beats/metricbeat/module/pgbouncer/_meta/userlist.txt" to rootfs at "/etc/pgbouncer/userlist.txt": create mount destination for /etc/pgbouncer/userlist.txt mount: cannot mkdir in /var/lib/docker/overlay2/3883fa799c9c240081c07e9928532040d78e2836a7d53843d8f77634ca22bbc5/merged/etc/pgbouncer/userlist.txt: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type
Encountered errors while bringing up the project.

mauri870 avatar Oct 09 '24 19:10 mauri870

@manuelsaks Thanks for willing to look into the CI issue.

I spotted this error in the logs:

ERROR: for pgbouncer  Cannot start service pgbouncer: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/go/src/github.com/elastic/beats/metricbeat/module/pgbouncer/_meta/userlist.txt" to rootfs at "/etc/pgbouncer/userlist.txt": create mount destination for /etc/pgbouncer/userlist.txt mount: cannot mkdir in /var/lib/docker/overlay2/3883fa799c9c240081c07e9928532040d78e2836a7d53843d8f77634ca22bbc5/merged/etc/pgbouncer/userlist.txt: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type
Encountered errors while bringing up the project.

I successfully recreated the issue in my own temporary buildkite pipeline and confirmed that it should be resolved with the commit 5e8158d.

manuelsaks avatar Oct 10 '24 14:10 manuelsaks

/test

pierrehilbert avatar Oct 11 '24 07:10 pierrehilbert

run docs-build

pierrehilbert avatar Oct 11 '24 07:10 pierrehilbert

Can you provide me details/logs about those new errors?

manuelsaks avatar Oct 11 '24 08:10 manuelsaks

Every tests are failing on the windows runner for the same reason:

=== Failed
--
  | === FAIL: metricbeat/module/pgbouncer TestDBConnection (0.31s)
  | metricset_test.go:94: failed to start service 'pgbouncer': ps: failed to get container list: failed to get list of service names: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: kill container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: remove container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: failed to start service 'pgbouncer': ps: failed to get container list: failed to get list of service names: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: kill container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: remove container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: failed to start service 'pgbouncer': ps: failed to get container list: failed to get list of service names: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: kill container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: remove container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: getting host for pgbouncer: ps: failed to get container list: failed to get list of service names: exec: "docker-compose": executable file not found in %PATH%

pierrehilbert avatar Oct 11 '24 08:10 pierrehilbert

Every tests are failing on the windows runner for the same reason:

=== Failed
--
  | === FAIL: metricbeat/module/pgbouncer TestDBConnection (0.31s)
  | metricset_test.go:94: failed to start service 'pgbouncer': ps: failed to get container list: failed to get list of service names: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: kill container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: remove container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: failed to start service 'pgbouncer': ps: failed to get container list: failed to get list of service names: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: kill container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: remove container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: failed to start service 'pgbouncer': ps: failed to get container list: failed to get list of service names: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: kill container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: remove container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: getting host for pgbouncer: ps: failed to get container list: failed to get list of service names: exec: "docker-compose": executable file not found in %PATH%

I missed adding build tags, which caused the unit test pipelines to also run integration tests. This failed on Windows Server agents without docker-compose.

I've added the necessary build tags and successfully tested with commit https://github.com/elastic/beats/pull/41174/commits/fdd7d0f9b03b9c16a71c43d1934e4abe0b05834c on Windows Server 2022.

manuelsaks avatar Oct 11 '24 11:10 manuelsaks

/test

pierrehilbert avatar Oct 11 '24 13:10 pierrehilbert

I'm not sure if the error from the x-pack/metricbeat: Python Integration Tests pipeline is related to my changes.

https://buildkite.com/elastic/beats-xpack-metricbeat/builds/7926#01927bde-dd3c-4a49-ad9b-c828f1e5cfc6

I merged from main to have the latest changes.

manuelsaks avatar Oct 11 '24 15:10 manuelsaks

/test

pierrehilbert avatar Oct 11 '24 15:10 pierrehilbert

/test

pierrehilbert avatar Oct 13 '24 12:10 pierrehilbert

We are moving in the right direction, one last error:

__________________________________________________________________ Test.test_index_management __________________________________________________________________
--
  |  
  | self = <test_xpack_base.Test testMethod=test_index_management>
  |  
  | @unittest.skipUnless(INTEGRATION_TESTS, "integration test")
  | def test_index_management(self):
  | """
  | Test that the template can be loaded with `setup --index-management`
  | """
  | es = Elasticsearch([self.get_elasticsearch_url()])
  | self.render_config_template(
  | modules=[{
  | "name": "apache",
  | "metricsets": ["status"],
  | "hosts": ["localhost"],
  | }],
  | elasticsearch={"host": self.get_elasticsearch_url()},
  | )
  | exit_code = self.run_beat(extra_args=["setup", "--index-management", "-E", "setup.template.overwrite=true"])
  |  
  | >       assert exit_code == 0
  | E       assert 1 == 0
  |  
  | ../../metricbeat/tests/system/test_base.py:57: AssertionError

pierrehilbert avatar Oct 14 '24 07:10 pierrehilbert

/test

pierrehilbert avatar Oct 14 '24 10:10 pierrehilbert

The problem is:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "composable template [metricbeat-9.0.0] template after composition is invalid"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "composable template [metricbeat-9.0.0] template after composition is invalid",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "invalid composite mappings for [metricbeat-9.0.0]",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Limit of total fields [10000] has been exceeded"
      }
    }
  },
  "status": 400
}

This occurs when running the test_index_management function in beats/metricbeat/tests/system/test_base.py:

def test_index_management(self):
    """
    Test that the template can be loaded with `setup --index-management`
    """
    es = Elasticsearch([self.get_elasticsearch_url()])
    self.render_config_template(
        modules=[{
            "name": "apache",
            "metricsets": ["status"],
            "hosts": ["localhost"],
        }],
        elasticsearch={"host": self.get_elasticsearch_url()},
    )
    exit_code = self.run_beat(extra_args=["setup", "--index-management", "-E", "setup.template.overwrite=true"])

    assert exit_code == 0
    assert self.log_contains('Loaded index template')
    assert len(es.cat.templates(name='metricbeat-*', h='name')) > 0

If I increase the field limit from the default: 10000:

exit_code = self.run_beat(extra_args=["setup", "--index-management", "-E", "setup.template.overwrite=true"])

to 10090:

exit_code = self.run_beat(extra_args=["setup", "--index-management", "-E", "setup.template.overwrite=true", "-E", "setup.template.settings.index.mapping.total_fields.limit=10090"])

the test passes.

Please advise what should I do now.

manuelsaks avatar Oct 14 '24 10:10 manuelsaks

/test

pierrehilbert avatar Oct 21 '24 13:10 pierrehilbert

/test

pierrehilbert avatar Oct 22 '24 13:10 pierrehilbert

/test

pierrehilbert avatar Nov 06 '24 16:11 pierrehilbert