Update to go 1.22.4.
Proposed commit message
Update to the latest version of Golang, 1.22.4.
Checklist
- [x] My code follows the style guidelines of this project
- [x] I have commented my code, particularly in hard-to-understand areas
- ~~[ ] I have made corresponding changes to the documentation~~
- ~~[ ] I have made corresponding change to the default configuration files~~
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
This pull request is now in conflicts. Could you fix it? 🙏 To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/
git fetch upstream
git checkout -b go-1.22 upstream/go-1.22
git merge upstream/main
git push upstream go-1.22
CI seems unable to find the mage command for this change. I see that the CI job is installing the Golang version because it's not present on the VM base image; might that be the issue and why mage is not present?
@elastic/ingest-eng-prod can someone please assist here with the Golang version update?
Indeed, the images fall back to installing the missing Go version if it's not prebundled, as in PRs like this one, but we also need to install mage too. Raising a fix soon.
A new Windows 2022 image was built recently (see PR). I clicked retry on the heartbeat Windows 2022 job, which seems to be moving forward (mage got installed: https://buildkite.com/elastic/heartbeat/builds/6086#019078c4-30ea-4ba4-ac7d-30e353db62c0/108-109)
This has been successful, so I've since rebuilt all custom Windows Beats images to support mage installation as a fallback and clicked retry on all failed jobs ^^
I believe that Go 1.22 adjusts how it represents a regex as a string. See the commit for the changes I made to the tests; I don't see any real issues with these changes. https://github.com/elastic/beats/pull/40082/commits/88964ad9dbf905f055cf31cdf5485746367824ce
The remaining error is related to x-pack/metricbeat, specifically the mssql module, and only that one. It's not clear what about Go 1.22 would cause an error like that.
I'm starting to wonder if it's a timing issue: Go 1.22 may start faster, so the ping connection is attempted before the container is ready.
The latest commit removes the db.Ping from NewConnection to see if that fixes the issue. https://github.com/elastic/beats/pull/40082/commits/221c519ec42c7e089723c4504958458dd4281947
I am worried that NewConnection is not the correct place for db.Ping. It would prevent the metricset from being created while the database is down, even though the database could come back online later, at which point the metricset would work just fine. As it currently stands, I don't know if Metricbeat would keep trying to create the metricset.
This needs to be backported to 7.17 so that it can continue to get security updates once Go 1.21 goes out of support. 8.14 can stay on 1.21.x, as the 8.14.x series should end by the time Go 1.23 is released.
For this test it looks like it is trying to connect to the mssql instance started via a Docker container, but I don't see a healthcheck in the docker-compose file. Is it possible that mssql just isn't ready yet, and the Go version change has altered timing such that we are noticing it now?
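If a compose-level healthcheck really is missing, a hypothetical sketch of one might look like the following. The service name, password, and sqlcmd path are assumptions for illustration, not taken from the actual compose file:

```yaml
# Hypothetical compose fragment: gate dependents on the server actually
# accepting logins, not merely on the container process having started.
services:
  mssql:
    image: mcr.microsoft.com/mssql/server:2017-GA-ubuntu
    environment:
      ACCEPT_EULA: "Y"
      SA_PASSWORD: "Example_Passw0rd"   # placeholder
    healthcheck:
      test: ["CMD", "/opt/mssql-tools/bin/sqlcmd", "-S", "localhost",
             "-U", "sa", "-P", "Example_Passw0rd", "-Q", "SELECT 1"]
      interval: 10s
      retries: 10
```

A faster-starting test binary would then simply wait on the healthcheck instead of racing the server's startup.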
That is probably the issue, but it's weird that it is only happening now. I am unable to find a change in the Go 1.22 release notes that would explain it. My initial thought is speed, i.e. faster startup time with Go 1.22; otherwise, why do we get this error with Go 1.22 but not Go 1.21?
This pull request is now in conflicts. Could you fix it? 🙏 To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/
git fetch upstream
git checkout -b go-1.22 upstream/go-1.22
git merge upstream/main
git push upstream go-1.22
I have updated this PR to 1.22.5.
I also uncommented the code in the mssql module, as removing it didn't seem to fix the issue. I also noticed that the mssql Docker image does have a healthcheck defined: https://github.com/elastic/beats/blob/main/x-pack/metricbeat/module/mssql/_meta/Dockerfile#L9
So it's still unclear what is causing this to fail. Still digging...
Hey @blakerouse, just thinking out loud here; the errors you are seeing with mssql remind me of TLS incompatibilities. I think the default TLS version was bumped in Go 1.22. Could we disable encryption on mssql? If you want to test this suspicion, I think you should add encrypt=false to the mssql connection URL.
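To illustrate the suggestion: the go-mssqldb driver accepts an `encrypt` connection parameter (true/false/disable), and setting it to `disable` turns TLS off entirely, sidestepping any handshake incompatibility. This sketch only builds the DSN string; the host and credentials are placeholders, and whether disabling encryption is acceptable for the test is a separate question.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildDSN assembles a sqlserver:// connection URL. With
// disableEncryption set, the go-mssqldb "encrypt" parameter is added so
// the client never attempts a TLS handshake with the server.
func buildDSN(host, user, pass string, disableEncryption bool) string {
	q := url.Values{}
	if disableEncryption {
		q.Set("encrypt", "disable")
	}
	u := &url.URL{
		Scheme:   "sqlserver",
		User:     url.UserPassword(user, pass),
		Host:     host,
		RawQuery: q.Encode(),
	}
	return u.String()
}

func main() {
	// Placeholder credentials for illustration only.
	fmt.Println(buildDSN("localhost:1433", "sa", "secret", true))
}
```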
Hey @shmsr, with this PR we're trying to update Golang to 1.22.5 in the https://github.com/elastic/beats repository. As you can see from the past couple of comments and build failures, the mssql Metricbeat module's python integration tests are failing. We're not quite sure why the Golang upgrade might be affecting this module; any chance you might be able to take a look? Thanks.
Sure, @ycombinator!
@blakerouse @ycombinator x-pack/metricbeat *.py integration test is now green 🎉
I'll post the RCA in the morning.
RCA for why the x-pack/metricbeat SQL and MSSQL module integration tests were failing.
After debugging for a while and carefully examining the TLS error "TLS Handshake failed: Cannot read handshake packet: EOF," I initially believed it was related to https://github.com/Microsoft/msphpsql/issues/252 / https://learn.microsoft.com/en-gb/archive/blogs/sql_server_team/installing-sql-server-2017-for-linux-on-ubuntu-18-04-lts / https://stackoverflow.com/questions/57363561/golang-connection-to-sql-server-error-tls-handshake-failed-cannot-read-hands. However, after consulting this resource, I attempted changing the image and found that this resolved the issue. Nevertheless, it remains unclear why the problem occurs for Go 1.22.x but not for Go 1.21.x or lower. Additionally, the 2017-GA-ubuntu tag of the image has not been updated recently, remaining untouched for years.
Checked the database/sql changes in the Go stdlib, but there were no relevant changes between Go 1.21.x and Go 1.22.x in the related calls that could have caused this issue.
Going down the rabbit hole, noted the following things:
From the crypto/tls section of the Go 1.22 release notes:
- By default, the minimum version offered by crypto/tls servers is now TLS 1.2 if not specified with Config.MinVersion, matching the behavior of crypto/tls clients. This change can be reverted with the tls10server=1 GODEBUG setting.
- By default, cipher suites without ECDHE support are no longer offered by either clients or servers during pre-TLS 1.3 handshakes. This change can be reverted with the tlsrsakex=1 GODEBUG setting.
So, the database client compiled with Go 1.22.x supports a minimum of TLS 1.2 and only supports cipher suites with ECDHE support (i.e., RSA-only cipher suites not supported) for pre-TLS 1.3 handshakes.
Also, see this.
So, running the containers for mcr.microsoft.com/mssql/server:2017-GA-ubuntu and mcr.microsoft.com/mssql/server:2017-CU31-GDR2-ubuntu-18.04, I noted this:
- mcr.microsoft.com/mssql/server:2017-GA-ubuntu has Ubuntu 16.04 bins and libs.
- mcr.microsoft.com/mssql/server:2017-CU31-GDR2-ubuntu-18.04 has Ubuntu 18.04 bins and libs.
- For mcr.microsoft.com/mssql/server:2017-GA-ubuntu:
$ openssl version
OpenSSL 1.0.2g 1 Mar 2016
$ openssl ciphers -v | awk '{print $2}' | sort | uniq
SSLv3
TLSv1.2
Ok, good for now. TLS v1.2, checked.
Let's see the ciphers supported for the process listening on localhost:1433:
$ nmap --script ssl-enum-ciphers -p 1433 localhost
Starting Nmap 7.80 ( https://nmap.org/ ) at 2024-07-14 21:23 UTC
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000090s latency).
PORT STATE SERVICE
1433/tcp open ms-sql-s
| ssl-enum-ciphers:
| TLSv1.0:
| ciphers:
| TLS_RSA_WITH_AES_128_CBC_SHA (rsa 2048) - A
| TLS_RSA_WITH_AES_256_CBC_SHA (rsa 2048) - A
| compressors:
| NULL
| cipher preference: client
| warnings:
| Forward Secrecy not supported by any cipher
| TLSv1.1:
| ciphers:
| TLS_RSA_WITH_AES_128_CBC_SHA (rsa 2048) - A
| TLS_RSA_WITH_AES_256_CBC_SHA (rsa 2048) - A
| compressors:
| NULL
| cipher preference: client
| warnings:
| Forward Secrecy not supported by any cipher
| TLSv1.2:
| ciphers:
| TLS_RSA_WITH_AES_128_CBC_SHA (rsa 2048) - A
| TLS_RSA_WITH_AES_128_CBC_SHA256 (rsa 2048) - A
| TLS_RSA_WITH_AES_128_GCM_SHA256 (rsa 2048) - A
| TLS_RSA_WITH_AES_256_CBC_SHA (rsa 2048) - A
| TLS_RSA_WITH_AES_256_CBC_SHA256 (rsa 2048) - A
| TLS_RSA_WITH_AES_256_GCM_SHA384 (rsa 2048) - A
| compressors:
| NULL
| cipher preference: client
| warnings:
| Forward Secrecy not supported by any cipher
|_ least strength: A
Nmap done: 1 IP address (1 host up) scanned in 0.16 seconds
See the ciphers? All TLS_RSA without ECDHE support.
- For mcr.microsoft.com/mssql/server:2017-CU31-GDR2-ubuntu-18.04:
$ openssl version
OpenSSL 1.1.1 11 Sep 2018
$ openssl ciphers -v | awk '{print $2}' | sort | uniq
SSLv3
TLSv1
TLSv1.2
TLSv1.3
Let's see the ciphers supported for the process listening on localhost:1433:
$ nmap --script ssl-enum-ciphers -p 1433 localhost
Starting Nmap 7.80 ( https://nmap.org/ ) at 2024-07-14 21:26 UTC
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000090s latency).
PORT STATE SERVICE
1433/tcp open ms-sql-s
| ssl-enum-ciphers:
| TLSv1.0:
| ciphers:
| TLS_RSA_WITH_AES_256_CBC_SHA (rsa 2048) - A
| TLS_RSA_WITH_AES_128_CBC_SHA (rsa 2048) - A
| compressors:
| NULL
| cipher preference: server
| warnings:
| Forward Secrecy not supported by any cipher
| TLSv1.1:
| ciphers:
| TLS_RSA_WITH_AES_256_CBC_SHA (rsa 2048) - A
| TLS_RSA_WITH_AES_128_CBC_SHA (rsa 2048) - A
| compressors:
| NULL
| cipher preference: server
| warnings:
| Forward Secrecy not supported by any cipher
| TLSv1.2:
| ciphers:
| TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (ecdh_x25519) - A
| TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (ecdh_x25519) - A
| TLS_RSA_WITH_AES_256_GCM_SHA384 (rsa 2048) - A
| TLS_RSA_WITH_AES_128_GCM_SHA256 (rsa 2048) - A
| TLS_RSA_WITH_AES_256_CBC_SHA256 (rsa 2048) - A
| TLS_RSA_WITH_AES_128_CBC_SHA256 (rsa 2048) - A
| TLS_RSA_WITH_AES_256_CBC_SHA (rsa 2048) - A
| TLS_RSA_WITH_AES_128_CBC_SHA (rsa 2048) - A
| compressors:
| NULL
| cipher preference: server
|_ least strength: A
Nmap done: 1 IP address (1 host up) scanned in 0.17 seconds
Okay, this supports ciphers with ECDHE support.
From this, you can understand why the tests were failing with the test binary compiled using Go 1.22.x.
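The failure mode can be summarized as an empty cipher-suite intersection: the Go 1.22 client only offers ECDHE suites for TLS 1.2, and the 2017-GA-ubuntu server only offers RSA-kex suites, so the handshake has nothing to agree on and dies with the EOF error. A small sketch of that intersection, using representative suites from the nmap scans above:

```go
package main

import "fmt"

// overlap returns the cipher suites present in both lists. An empty
// overlap means the TLS handshake cannot complete, which the client
// observes as "Cannot read handshake packet: EOF".
func overlap(client, server []string) []string {
	set := map[string]bool{}
	for _, c := range client {
		set[c] = true
	}
	var out []string
	for _, s := range server {
		if set[s] {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	// ECDHE-only TLS 1.2 suites a default Go 1.22 client offers (subset).
	client := []string{
		"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
		"TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
	}
	// TLS 1.2 suites from the 2017-GA-ubuntu scan: all RSA key exchange.
	gaServer := []string{
		"TLS_RSA_WITH_AES_128_GCM_SHA256",
		"TLS_RSA_WITH_AES_256_GCM_SHA384",
	}
	// The CU31 image additionally offers the ECDHE suites.
	cu31Server := append([]string{
		"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
		"TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
	}, gaServer...)

	fmt.Println(len(overlap(client, gaServer)))   // 0: handshake fails
	fmt.Println(len(overlap(client, cu31Server))) // 2: handshake succeeds
}
```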
cc: @blakerouse @ycombinator
Also, for the regex changes in this PR, I think this is the related change in Go 1.22: https://github.com/golang/go/commit/98c9f271d67b501ecf2ce995539abd2cdc81d505
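That change makes regexp/syntax render parsed patterns more compactly, so tests comparing the rendered string to a literal break across Go versions. A version-stable alternative, sketched below with hypothetical helper names, is to assert that the rendered pattern accepts and rejects the same inputs as the original rather than comparing strings:

```go
package main

import (
	"fmt"
	"regexp"
	"regexp/syntax"
)

// render parses a pattern and prints the syntax tree back as a string.
// The exact output can differ between Go 1.21 and Go 1.22, so exact
// string comparisons against it are brittle.
func render(pattern string) string {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		panic(err)
	}
	return re.String()
}

// sameLanguage checks that two patterns accept and reject the same
// sample inputs: a behavioral equivalence check that survives changes
// in how the syntax tree is rendered.
func sameLanguage(a, b string, samples []string) bool {
	ra, rb := regexp.MustCompile(a), regexp.MustCompile(b)
	for _, s := range samples {
		if ra.MatchString(s) != rb.MatchString(s) {
			return false
		}
	}
	return true
}

func main() {
	p := "a|b|c"
	// Whatever form render(p) takes on this Go version, it should match
	// exactly the same inputs as the original pattern.
	fmt.Println(sameLanguage(p, render(p), []string{"a", "b", "c", "d", ""}))
}
```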
/test
Note: we chatted with @oakrizan about the failing x-pack/metricbeat AWS tests, and at least as of 4 days ago they used to be successful: https://buildkite.com/elastic/beats-xpack-metricbeat/builds/5078#0190a2f4-ee9d-4b99-8311-924166b10128
So it's unclear whether they are affected by this PR, or something else that's changed during the last few days.
UPDATE: They aren't related to this PR; see https://github.com/elastic/beats/issues/40242
Additionally, there appears to be general flakiness with macOS arm64 CI tests; usually the four automatic Buildkite retries help, but lately it seems they don't. The issue is tracked in https://github.com/elastic/beats/issues/40237
@shmsr Thank you so much for the help and detailed analysis on this issue, I really appreciate it.
I have updated the PR with the latest main; let's see if it fully passes.
@pierrehilbert @cmacknz after chatting with @blakerouse we do need to backport this to 8.15 in order to align go versions in Agent and Beats. Shouldn't we bypass requirements?
I'm fine to force merge this and backport to 8.15, we just need the new conflict to be resolved first.
I have fixed the conflict; once we get green again, let's force merge this.