beats icon indicating copy to clipboard operation
beats copied to clipboard

Update to go 1.22.4.

Open blakerouse opened this issue 1 year ago • 5 comments

Proposed commit message

Update to the latest version of golang 1.22.4.

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • ~~[ ] I have made corresponding changes to the documentation~~
  • ~~[ ] I have made corresponding change to the default configuration files~~
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [x] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

blakerouse avatar Jul 02 '24 22:07 blakerouse

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

elasticmachine avatar Jul 02 '24 22:07 elasticmachine

This pull request is now in conflicts. Could you fix it? 🙏 To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b go-1.22 upstream/go-1.22
git merge upstream/main
git push upstream go-1.22

mergify[bot] avatar Jul 02 '24 22:07 mergify[bot]

CI seems unable to find the mage command for this change. I see that the CI job is installing golang version because its not present on the VM base image, that might be the issue and why mage is not present?

blakerouse avatar Jul 03 '24 01:07 blakerouse

@elastic/ingest-eng-prod can someone please assist here with the golang version update?

jlind23 avatar Jul 03 '24 04:07 jlind23

CI seems unable to find the mage command for this change. I see that the CI job is installing golang version because its not present on the VM base image, that might be the issue and why mage is not present?

Indeed, the images falls back to installing the missing go, if not prebundled in cases of PR like this one, but we also need to install mage too. Raising a fix soon.

dliappis avatar Jul 03 '24 12:07 dliappis

CI seems unable to find the mage command for this change. I see that the CI job is installing golang version because its not present on the VM base image, that might be the issue and why mage is not present?

Indeed, the images falls back to installing the missing go, if not prebundled in cases of PR like this one, but we also need to install mage too. Raising a fix soon.

A new Windows 2022 image got built recently (see pr). I clicked retry to the heartbeat win 2022 job which seems to be moving forwards (mage got installed: https://buildkite.com/elastic/heartbeat/builds/6086#019078c4-30ea-4ba4-ac7d-30e353db62c0/108-109)

dliappis avatar Jul 03 '24 13:07 dliappis

CI seems unable to find the mage command for this change. I see that the CI job is installing golang version because its not present on the VM base image, that might be the issue and why mage is not present?

Indeed, the images falls back to installing the missing go, if not prebundled in cases of PR like this one, but we also need to install mage too. Raising a fix soon.

A new Windows 2022 image got built recently (see pr). I clicked retry to the heartbeat win 2022 job which seems to be moving forwards (mage got installed: https://buildkite.com/elastic/heartbeat/builds/6086#019078c4-30ea-4ba4-ac7d-30e353db62c0/108-109)

This has been successful, so I've since rebuilt all Windows custom Beats images to support mage installation as a fallback and clicked retry on all failed jobs ^^

dliappis avatar Jul 03 '24 14:07 dliappis

I believe that go 1.22 adjusts how its represents a regex into a string. See commit for the changes I made to the tests, I don't see any really issues with these changes in the test. https://github.com/elastic/beats/pull/40082/commits/88964ad9dbf905f055cf31cdf5485746367824ce

blakerouse avatar Jul 03 '24 15:07 blakerouse

Remaining error is related to x-pack/metricbeat, specifically with mssql and only that one module. Not clear what is going on to cause an error like that because of go 1.22.

blakerouse avatar Jul 03 '24 19:07 blakerouse

Starting to wonder if its a timing issue, and now go 1.22 is faster to start causing the container to not be ready before it tries to perform the ping connection.

blakerouse avatar Jul 03 '24 19:07 blakerouse

Latest commit removes the db.Ping from the NewConnection to see if that fixes the issue. https://github.com/elastic/beats/pull/40082/commits/221c519ec42c7e089723c4504958458dd4281947

I am worried that the db.Ping in the NewConnection is not the correct place for this. I would worry that it would prevent the metricset from being created in the case that the database is down, but then it could come back online and then the metricset would start working just fine. As it currently is I don't know if metricbeat would keep trying to create the metricset.

blakerouse avatar Jul 03 '24 19:07 blakerouse

This needs to be backported to 7.17 so that it can continue to get security updates once Go 1.21 goes out of support. 8.14 can stay on 1.21.x, the 8.14.x series should end by the time Go 1.23 is released.

cmacknz avatar Jul 03 '24 20:07 cmacknz

Remaining error is related to x-pack/metricbeat, specifically with mssql and only that one module. Not clear what is going on to cause an error like that because of go 1.22.

For this test it looks like it is trying to connect to the mssql that is started via a docker container, but I don't see a healthcheck in the docker compose file. Is it possible that mssql just isn't ready yet and the go version change has changed timing such that we are noticing it now?

leehinman avatar Jul 05 '24 15:07 leehinman

Remaining error is related to x-pack/metricbeat, specifically with mssql and only that one module. Not clear what is going on to cause an error like that because of go 1.22.

For this test it looks like it is trying to connect to the mssql that is started via a docker container, but I don't see a healthcheck in the docker compose file. Is it possible that mssql just isn't ready yet and the go version change has changed timing such that we are noticing it now?

That is probably the issue, but weird that it is only happening now. I am unable to find a change in the go 1.22 release notes that would explain it. My initial thought is possibly speed, faster start up time with go 1.22? Otherwise why do we get this error in go 1.22, but not go 1.21.

blakerouse avatar Jul 05 '24 16:07 blakerouse

This pull request is now in conflicts. Could you fix it? 🙏 To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b go-1.22 upstream/go-1.22
git merge upstream/main
git push upstream go-1.22

mergify[bot] avatar Jul 05 '24 18:07 mergify[bot]

I have updated this PR to 1.22.5.

I also un-commented the code out from the mssql module, as that didn't seem to fix the issue. I also noticed that the mssql docker image does have a healthcheck defined https://github.com/elastic/beats/blob/main/x-pack/metricbeat/module/mssql/_meta/Dockerfile#L9

So still un-clear what is causing this to fail. Still digging...

blakerouse avatar Jul 08 '24 19:07 blakerouse

I have updated this PR to 1.22.5.

I also un-commented the code out from the mssql module, as that didn't seem to fix the issue. I also noticed that the mssql docker image does have a healthcheck defined https://github.com/elastic/beats/blob/main/x-pack/metricbeat/module/mssql/_meta/Dockerfile#L9

So still un-clear what is causing this to fail. Still digging...

hey @blakerouse just thinking out loud here; the errors you are seeing with mssql remind me of TLS incompatibilities, I think go 1.22 default TLS was bumped. Could we disable encryption on mssql?! If you wanna test this suspicion out I think you should add encrypt=false in the mssql connection url

pkoutsovasilis avatar Jul 08 '24 19:07 pkoutsovasilis

Hey @shmsr, with this PR we're trying to update Golang to 1.22.5 in the https://github.com/elastic/beats repository. As you can see from the past couple of comments and build failures, the mssql Metricbeat module's python integration tests are failing. We're not quite sure why the Golang upgrade might be affecting this module; any chance you might be able to take a look? Thanks.

ycombinator avatar Jul 12 '24 17:07 ycombinator

Sure, @ycombinator!

shmsr avatar Jul 12 '24 21:07 shmsr

@blakerouse @ycombinator x-pack/metricbeat *.py integration test is now green 🎉

I'll post the RCA in the morning.

shmsr avatar Jul 14 '24 21:07 shmsr

RCA for why x-pack/metricbeat's modules - SQL and MSSQL integration tests were failing.

After debugging for a while and carefully examining the TLS error "TLS Handshake failed: Cannot read handshake packet: EOF," I initially believed it was related to https://github.com/Microsoft/msphpsql/issues/252 / https://learn.microsoft.com/en-gb/archive/blogs/sql_server_team/installing-sql-server-2017-for-linux-on-ubuntu-18-04-lts / https://stackoverflow.com/questions/57363561/golang-connection-to-sql-server-error-tls-handshake-failed-cannot-read-hands. However, after consulting this resource, I attempted changing the image and found that this resolved the issue. Nevertheless, it remains unclear why the problem occurs for Go 1.22.x but not for Go 1.21.x or lower. Additionally, the 2017-GA-ubuntu tag of the image has not been updated recently, remaining untouched for years.

Checked the database/sql changes in the Go stdlib, but there were no relevant changes between Go 1.21.x and Go 1.22.x in the related calls that could have caused this issue.

Going down the rabbit hole, noted the following things:

Follow this section (crypto/tls) of the release notes of Go 1.22:

By default, the minimum version offered by crypto/tls servers is now TLS 1.2 if not specified with config.MinimumVersion, matching the behavior of crypto/tls clients. This change can be reverted with the tls10server=1 GODEBUG setting.

By default, cipher suites without ECDHE support are no longer offered by either clients or servers during pre-TLS 1.3 handshakes. This change can be reverted with the tlsrsakex=1 GODEBUG setting.

So, the database client compiled with Go 1.22.x supports a minimum of TLS 1.2 and only supports cipher suites with ECDHE support (i.e., RSA-only cipher suites not supported) for pre-TLS 1.3 handshakes.

Also, see this.


So, running the containers for mcr.microsoft.com/mssql/server:2017-GA-ubuntu and mcr.microsoft.com/mssql/server:2017-CU31-GDR2-ubuntu-18.04, I noted this:

mcr.microsoft.com/mssql/server:2017-GA-ubuntu has ubuntu 16.04 bin and libs. mcr.microsoft.com/mssql/server:2017-CU31-GDR2-ubuntu-18.04 has ubuntu 18.04 bin and libs.

  • For mcr.microsoft.com/mssql/server:2017-GA-ubuntu:
$ openssl version
OpenSSL 1.0.2g  1 Mar 2016

$ openssl ciphers -v | awk '{print $2}' | sort | uniq
SSLv3
TLSv1.2

Ok, good for now. TLS v1.2, checked.

Let's see the ciphers supported for the process listening on localhost:1433:

$ nmap --script ssl-enum-ciphers -p 1433 localhost
Starting Nmap 7.80 ( https://nmap.org/ ) at 2024-07-14 21:23 UTC
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000090s latency).

PORT     STATE SERVICE
1433/tcp open  ms-sql-s
| ssl-enum-ciphers:
|   TLSv1.0:
|     ciphers:
|       TLS_RSA_WITH_AES_128_CBC_SHA (rsa 2048) - A
|       TLS_RSA_WITH_AES_256_CBC_SHA (rsa 2048) - A
|     compressors:
|       NULL
|     cipher preference: client
|     warnings:
|       Forward Secrecy not supported by any cipher
|   TLSv1.1:
|     ciphers:
|       TLS_RSA_WITH_AES_128_CBC_SHA (rsa 2048) - A
|       TLS_RSA_WITH_AES_256_CBC_SHA (rsa 2048) - A
|     compressors:
|       NULL
|     cipher preference: client
|     warnings:
|       Forward Secrecy not supported by any cipher
|   TLSv1.2:
|     ciphers:
|       TLS_RSA_WITH_AES_128_CBC_SHA (rsa 2048) - A
|       TLS_RSA_WITH_AES_128_CBC_SHA256 (rsa 2048) - A
|       TLS_RSA_WITH_AES_128_GCM_SHA256 (rsa 2048) - A
|       TLS_RSA_WITH_AES_256_CBC_SHA (rsa 2048) - A
|       TLS_RSA_WITH_AES_256_CBC_SHA256 (rsa 2048) - A
|       TLS_RSA_WITH_AES_256_GCM_SHA384 (rsa 2048) - A
|     compressors:
|       NULL
|     cipher preference: client
|     warnings:
|       Forward Secrecy not supported by any cipher
|_  least strength: A

Nmap done: 1 IP address (1 host up) scanned in 0.16 seconds

See the ciphers? All TLS_RSA without ECDHE support.

  • For mcr.microsoft.com/mssql/server:2017-CU31-GDR2-ubuntu-18.04:
$ openssl version
OpenSSL 1.1.1  11 Sep 2018

$ openssl ciphers -v | awk '{print $2}' | sort | uniq
SSLv3
TLSv1
TLSv1.2
TLSv1.3

Let's see the ciphers supported for the process listening on localhost:1433:

$ nmap --script ssl-enum-ciphers -p 1433 localhost
Starting Nmap 7.80 ( https://nmap.org/ ) at 2024-07-14 21:26 UTC
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000090s latency).

PORT     STATE SERVICE
1433/tcp open  ms-sql-s
| ssl-enum-ciphers:
|   TLSv1.0:
|     ciphers:
|       TLS_RSA_WITH_AES_256_CBC_SHA (rsa 2048) - A
|       TLS_RSA_WITH_AES_128_CBC_SHA (rsa 2048) - A
|     compressors:
|       NULL
|     cipher preference: server
|     warnings:
|       Forward Secrecy not supported by any cipher
|   TLSv1.1:
|     ciphers:
|       TLS_RSA_WITH_AES_256_CBC_SHA (rsa 2048) - A
|       TLS_RSA_WITH_AES_128_CBC_SHA (rsa 2048) - A
|     compressors:
|       NULL
|     cipher preference: server
|     warnings:
|       Forward Secrecy not supported by any cipher
|   TLSv1.2:
|     ciphers:
|       TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (ecdh_x25519) - A
|       TLS_RSA_WITH_AES_256_GCM_SHA384 (rsa 2048) - A
|       TLS_RSA_WITH_AES_128_GCM_SHA256 (rsa 2048) - A
|       TLS_RSA_WITH_AES_256_CBC_SHA256 (rsa 2048) - A
|       TLS_RSA_WITH_AES_128_CBC_SHA256 (rsa 2048) - A
|       TLS_RSA_WITH_AES_256_CBC_SHA (rsa 2048) - A
|       TLS_RSA_WITH_AES_128_CBC_SHA (rsa 2048) - A
|     compressors:
|       NULL
|     cipher preference: server
|_  least strength: A

Nmap done: 1 IP address (1 host up) scanned in 0.17 seconds

Okay, this supports ciphers with ECDHE support.

From this, you can understand why the tests were failing with the test binary compiled using Go 1.22.x.

cc: @blakerouse @ycombinator

shmsr avatar Jul 15 '24 05:07 shmsr

Also for the regex changes in this PR, I think this is the related change in Go 1.22: https://github.com/golang/go/commit/98c9f271d67b501ecf2ce995539abd2cdc81d505

shmsr avatar Jul 15 '24 06:07 shmsr

/test

oakrizan avatar Jul 15 '24 08:07 oakrizan

Note we chatted with @oakrizan about the failing x-pack/metricbeat: AWS Tests and at least as of 4 days ago they used be successful: https://buildkite.com/elastic/beats-xpack-metricbeat/builds/5078#0190a2f4-ee9d-4b99-8311-924166b10128

So it's unclear whether they are affected by this PR, or something else that's changed during the last few days.

UPDATE: They aren't related to this PR; see https://github.com/elastic/beats/issues/40242

dliappis avatar Jul 15 '24 10:07 dliappis

Additionally there appears to be general flakiness with macOS arm64 CI tests; usually the four automatic buildkite retries help, but lately seems they don't; the issue is tracked in https://github.com/elastic/beats/issues/40237

dliappis avatar Jul 15 '24 10:07 dliappis

Thanks for your help and the detailed RCA, @shmsr. Back over to you @blakerouse.

ycombinator avatar Jul 15 '24 21:07 ycombinator

@shmsr Thank you so much for the help and detailed analysis on this issue, I really appreciate it.

I have updated the PR with latest main, lets see if it fully passes.

blakerouse avatar Jul 22 '24 19:07 blakerouse

@pierrehilbert @cmacknz after chatting with @blakerouse we do need to backport this to 8.15 in order to align go versions in Agent and Beats. Shouldn't we bypass requirements?

jlind23 avatar Jul 23 '24 14:07 jlind23

I'm fine to force merge this and backport to 8.15, we just need the new conflict to be resolved first.

cmacknz avatar Jul 23 '24 17:07 cmacknz

I have fixed the conflict, once we get green again lets force merge this.

blakerouse avatar Jul 23 '24 17:07 blakerouse