graylog2-server

Graylog 4.3 with OpenSearch 1.3.2: ES version check can't be disabled

cyberkryption opened this issue on Jun 14 '22 · 13 comments

I am installing the following on a CentOS 7 box:

OpenJDK-11 Opensearch-1.3.2 Graylog 4.3.1 via

Expected Behavior

The version check setting in server.conf should disable the Elasticsearch version check and so prevent the hostname verification issue, since the SSL verification appears to be part of the Elasticsearch version check. Does OpenSearch 1.3.2 work with Graylog 4.3.1 as per https://docs.graylog.org/docs/installing-opensearch?

Current Behavior

The version check still runs and prevents Graylog from starting.

Steps to Reproduce (for bugs)

  1. Install Centos 7
  2. Install OpenJDK-11
  3. Install OpenSearch via RPM.
  4. Set up certificates using https://opensearch.org/docs/latest/security-plugin/configuration/generate-certificates/
  5. Make sure the certificates are imported into the Java trust store, add the root certificates to CentOS, and update the root CAs.
  6. Configure opensearch.yml as shown at https://pastebin.com/vPMCkm9f so that OpenSearch reports the ES version rather than the OpenSearch version.
  7. Confirm with openssl s_client. Excerpt:
Acceptable client certificate CA names
/C=GB/ST=Jersey/L=St.Helier/O=DefenceLogic/OU=Security/CN=ROOT
Client Certificate Types: ECDSA sign, RSA sign, DSA sign
Requested Signature Algorithms: 0x07+0x08:0x08+0x08:ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:0x04+0x08:0x05+0x08:0x06+0x08:0x09+0x08:0x0A+0x08:0x0B+0x08:RSA+SHA256:RSA+SHA384:RSA+SHA512:DSA+SHA256:ECDSA+SHA224:RSA+SHA224:DSA+SHA224:ECDSA+SHA1:RSA+SHA1:DSA+SHA1
Shared Requested Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:DSA+SHA256:ECDSA+SHA224:RSA+SHA224:DSA+SHA224:ECDSA+SHA1:RSA+SHA1:DSA+SHA1
Peer signing digest: SHA512
Server Temp Key: ECDH, P-256, 256 bits
---
SSL handshake has read 1530 bytes and written 427 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES256-GCM-SHA384
    Session-ID: 7DBB1ED24854DB4F7959EB151500F48F5F825B7CE9ADEDC417526B57390C3F4C
    Session-ID-ctx:
    Master-Key: D5871EF87078B84697CC0D555AFFBB116D07057D927D9425220E89E37832C940C4FE5808909F0275ED5C77F45F8FB19B
    Key-Arg   : None
    Krb5 Principal: None
    PSK identity: None
    PSK identity hint: None
    Start Time: 1655231599
    Timeout   : 300 (sec)
    Verify return code: 0 (ok)
---

  8. Set up the Graylog server.conf as shown at https://pastebin.com/0SKJKydE
  9. Setting elasticsearch_disable_version_check = true or elasticsearch_disable_version_check = false has no effect.
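Note that the openssl s_client output above only shows that the certificate chain verified (Verify return code: 0); by default s_client does not check the certificate against the hostname, which is the check the Graylog error complains about. As a rough cross-check, here is a minimal Python sketch (not part of Graylog) that connects with both chain and hostname verification enabled and, via hostname_checks_common_name = False, refuses to fall back to the certificate's CN, approximating a client that only looks at subjectAltName. The function names and the CA file path in the usage line are hypothetical.

```python
import socket
import ssl


def strict_tls_context(cafile=None):
    """Client context that verifies the chain AND the hostname.

    hostname_checks_common_name = False stops the CN fallback, so only
    subjectAltName entries can satisfy the hostname check.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.check_hostname = True             # already the default here, made explicit
    ctx.verify_mode = ssl.CERT_REQUIRED   # reject untrusted chains
    ctx.hostname_checks_common_name = False
    return ctx


def check_server(host, port=9200, cafile=None, timeout=5.0):
    """Return the peer certificate dict; raises ssl.SSLCertVerificationError on failure."""
    ctx = strict_tls_context(cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # server_hostname drives both SNI and the hostname check
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

For example, check_server("opensearch.cyberkryption.local", 9200, "root-ca.pem") would be expected to raise ssl.SSLCertVerificationError for a certificate whose subjectAltName list is empty, even though the chain itself is trusted; the exception's verify_message gives the reason.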

Context

Excerpt from the server log:

2022-06-14T19:30:33.662+01:00 INFO  [cluster] Cluster created with settings {hosts=[localhost:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=5000}
2022-06-14T19:30:33.704+01:00 INFO  [cluster] Cluster description not yet available. Waiting for 30000 ms before timing out
2022-06-14T19:30:33.730+01:00 INFO  [connection] Opened connection [connectionId{localValue:1, serverValue:1}] to localhost:27017
2022-06-14T19:30:33.737+01:00 INFO  [cluster] Monitor thread successfully connected to server with description ServerDescription{address=localhost:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 2, 20]}, minWireVersion=0, maxWireVersion=8, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=2739143}
2022-06-14T19:30:33.749+01:00 INFO  [connection] Opened connection [connectionId{localValue:2, serverValue:2}] to localhost:27017
2022-06-14T19:30:33.775+01:00 INFO  [connection] Closed connection [connectionId{localValue:2, serverValue:2}] to localhost:27017 because the pool has been closed.
2022-06-14T19:30:33.777+01:00 INFO  [MongoDBPreflightCheck] Connected to MongoDB version 4.2.20
2022-06-14T19:30:34.029+01:00 ERROR [VersionProbe] Unable to retrieve version from Elasticsearch node: Hostname opensearch.cyberkryption.local not verified:
    certificate: sha256/KeB6DfLCNFq3561pPhy8Zc/+oU6pmSySnrPyzHbwfvQ=
    DN: CN=opensearch.cyberkryption.local, OU=Security, O=DefenceLogic, L=St.Helier, ST=Jersey, C=GB
    subjectAltNames: []. - Hostname opensearch.cyberkryption.local not verified:
    certificate: sha256/KeB6DfLCNFq3561pPhy8Zc/+oU6pmSySnrPyzHbwfvQ=
    DN: CN=opensearch.cyberkryption.local, OU=Security, O=DefenceLogic, L=St.Helier, ST=Jersey, C=GB
    subjectAltNames: [].

Your Environment

  • Graylog Version: 4.3.1
  • Java Version: OpenJDK-11
  • OpenSearch Version: 1.3.2
  • MongoDB Version: 4.2.20-1
  • Operating System: CentOS 7
  • Browser version: n/a

cyberkryption, Jun 14 '22

@cyberkryption thank you for your report, a couple of things:

  • setting OpenSearch to report the ES version is no longer necessary
  • since you have it configured now, try setting the Elasticsearch version in Graylog with elasticsearch_version = 7 and see if it makes a difference for you, because version probing should be skipped that way, too.
  • at first glance, elasticsearch_disable_version_check might actually be buggy
  • still, what makes you sure that the hostname check is not part of the SSL handshake itself, in which case it would persist even after disabling the version check?

I'll set up a test install and try to verify what you see, but this will take a little more time.

janheise, Jun 17 '22

I will try setting elasticsearch_version=7 in the Graylog server.conf.

I want to confirm whether the hostname check is part of the Elasticsearch version check. If it is not and the problem still persists, that would be strange, as the host OS reports that all certificates etc. are good.

I will reconfigure and report back.

cyberkryption, Jun 17 '22

OK, I rechecked.

I set elasticsearch_version=7 and elasticsearch_disable_version_check = true in server.conf. I also tried setting the hostname to the FQDN of the server using hostnamectl, but I still get the following in the Graylog server logs. hostnamectl output:

 hostnamectl
   Static hostname: opensearch.cyberkryption.local
         Icon name: computer-vm
           Chassis: vm
        Machine ID: e94d9047cfcb423aaa6f996b01e60de5
           Boot ID: 30ae6365c0e145f59de9a9ed81f4647f
    Virtualization: vmware
  Operating System: CentOS Linux 7 (Core)
       CPE OS Name: cpe:/o:centos:centos:7
            Kernel: Linux 3.10.0-1160.66.1.el7.x86_64
      Architecture: x86-64

2022-06-14T19:30:33.274+01:00 INFO  [CmdLineTool] Running with JVM arguments: -Xms1g -Xmx1g -XX:NewRatio=1 -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -Dlog4j.configurationFile=file:///etc/graylog/server/log4j2.xml -Djava.library.path=/usr/share/graylog-server/lib/sigar -Dgraylog2.installation_source=rpm
2022-06-14T19:30:33.662+01:00 INFO  [cluster] Cluster created with settings {hosts=[localhost:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=5000}
2022-06-14T19:30:33.704+01:00 INFO  [cluster] Cluster description not yet available. Waiting for 30000 ms before timing out
2022-06-14T19:30:33.730+01:00 INFO  [connection] Opened connection [connectionId{localValue:1, serverValue:1}] to localhost:27017
2022-06-14T19:30:33.737+01:00 INFO  [cluster] Monitor thread successfully connected to server with description ServerDescription{address=localhost:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 2, 20]}, minWireVersion=0, maxWireVersion=8, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=2739143}
2022-06-14T19:30:33.749+01:00 INFO  [connection] Opened connection [connectionId{localValue:2, serverValue:2}] to localhost:27017
2022-06-14T19:30:33.775+01:00 INFO  [connection] Closed connection [connectionId{localValue:2, serverValue:2}] to localhost:27017 because the pool has been closed.
2022-06-14T19:30:33.777+01:00 INFO  [MongoDBPreflightCheck] Connected to MongoDB version 4.2.20
2022-06-14T19:30:34.029+01:00 ERROR [VersionProbe] Unable to retrieve version from Elasticsearch node: Hostname opensearch.cyberkryption.local not verified:
    certificate: sha256/KeB6DfLCNFq3561pPhy8Zc/+oU6pmSySnrPyzHbwfvQ=
    DN: CN=opensearch.cyberkryption.local, OU=Security, O=DefenceLogic, L=St.Helier, ST=Jersey, C=GB
    subjectAltNames: []. - Hostname opensearch.cyberkryption.local not verified:
    certificate: sha256/KeB6DfLCNFq3561pPhy8Zc/+oU6pmSySnrPyzHbwfvQ=
    DN: CN=opensearch.cyberkryption.local, OU=Security, O=DefenceLogic, L=St.Helier, ST=Jersey, C=GB
    subjectAltNames: [].
2022-06-14T19:30:34.030+01:00 INFO  [VersionProbe] Elasticsearch is not available. Retry #1

My understanding is that ERROR [VersionProbe] indicates it has something to do with version checking?

That is why I went down the path of trying to disable the ES version check.

Any help appreciated.

cyberkryption, Jun 17 '22

Thanks for checking/confirming. I'll try to reproduce it.

janheise, Jun 17 '22

I can upload the exported VM for you to download and save you the configuration time, if you want.

cyberkryption, Jun 18 '22

Hello @cyberkryption,

First of all, I think the real issue here is that the Graylog server can't verify the certificate of the OpenSearch server. Are you sure that the Graylog process has a correctly configured Java truststore which contains the certificate used by OpenSearch?

As @janheise mentioned, even if you were able to skip the version check, you would be unable to communicate with your OS instance, because the SSL error blocks all communication with it. You would get a similar exception elsewhere. The version check only sends a simple HTTPS request to the configured OS address and tries to read the OpenSearch version from the JSON response. It's the same type of HTTP communication as all other requests between Graylog and OpenSearch.
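To make that concrete, the probe amounts to roughly the following. This is a minimal Python sketch, not Graylog's Java implementation; it assumes basic authentication (as in the curl example later in this thread) and a CA file you supply, and the function names are hypothetical. Because the request goes through ordinary TLS verification, a hostname mismatch surfaces here as urllib.error.URLError wrapping ssl.SSLCertVerificationError, just as it would for any other request to the node.

```python
import base64
import json
import ssl
import urllib.request


def parse_version(body):
    """Return version.number from the JSON document served at GET /."""
    return json.loads(body)["version"]["number"]


def probe_version(base_url, user, password, cafile=None, timeout=5.0):
    """One authenticated HTTPS GET to the node root, then read the version."""
    ctx = ssl.create_default_context(cafile=cafile)  # verifies chain and hostname
    req = urllib.request.Request(base_url.rstrip("/") + "/")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req, context=ctx, timeout=timeout) as resp:
        return parse_version(resp.read().decode("utf-8"))
```

With the response shown later in this thread, parse_version returns "7.10.2", the Elasticsearch-compatible version number OpenSearch was configured to report.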

If you want to test your setup without SSL between Graylog and OpenSearch, you can disable it with the plugins.security.ssl.http.enabled option in the OpenSearch configuration.

If you still want to try disabling the version check and see what happens: the error in the log does not come from the ESVersionCheckPeriodical, as we originally assumed, but from the SearchDbPreflightCheck, which verifies that a valid search instance is available. You can disable all pre-flight checks by setting skip_preflight_checks to true in the Graylog configuration. But again, I think you'll just see the very same SSL error from a different part of the application, as no communication with your OS instance will be possible.

Best regards, Tomas

todvora, Jun 24 '22

Hi Tomas,

An update: I have checked that the certificates are in the cacerts trust store. I imported them in .pem format as shown below.

[cyberkryption@opensearch certificates]$ sudo keytool -list -alias dlrootca -cacerts
Enter keystore password:
dlrootca, 24 Jun 2022, trustedCertEntry,
Certificate fingerprint (SHA-256): 94:56:A4:8B:DA:B0:AB:91:78:D0:6A:06:7D:0A:B2:20:D7:A1:B8:1B:E2:4D:1A:F0:5D:17:06:41:1E:75:A1:97
[cyberkryption@opensearch certificates]$  sudo keytool -list -alias opensearch -cacerts
Enter keystore password:
opensearch, 24 Jun 2022, trustedCertEntry,
Certificate fingerprint (SHA-256): A7:5F:C4:22:34:F5:A1:0B:6D:86:2F:9A:73:FF:1E:92:29:4F:80:00:42:EA:3B:11:B9:D0:E8:C1:86:05:7F:73
[cyberkryption@opensearch certificates]$
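One way to confirm that these trust store entries really correspond to the certificate files is to compute the same fingerprint from the PEM file and compare it with the keytool -list output. A minimal Python sketch, assuming the file holds a single certificate with nothing before the BEGIN line; the function name is hypothetical. Note that the sha256/... value in the Graylog error appears to be an OkHttp public-key pin (a base64 SHA-256 of the public key info), so it will not match this certificate fingerprint even for the same certificate.

```python
import hashlib
import ssl


def keytool_sha256_fingerprint(pem_text):
    """SHA-256 over the DER-encoded certificate, formatted like keytool -list."""
    der = ssl.PEM_cert_to_DER_cert(pem_text)  # strips the PEM armour, base64-decodes
    digest = hashlib.sha256(der).hexdigest().upper()
    # keytool prints the digest as colon-separated byte pairs
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

For example, pass it the contents of the PEM file you imported under the alias opensearch and compare the result with the "Certificate fingerprint (SHA-256)" line above.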

I started Graylog and the server still failed to boot past the ES check.

Next, I set skip_preflight_checks to true in the Graylog configuration file:

2022-06-24T12:06:20.691+01:00 INFO  [JerseyService] Started REST API at <opensearch.cyberkryption.local:9000>
2022-06-24T12:06:20.692+01:00 INFO  [ServiceManagerListener] Services are healthy
2022-06-24T12:06:20.692+01:00 INFO  [InputSetupService] Triggering launching persisted inputs, node transitioned from Uninitialized [LB:DEAD] to Running [LB:ALIVE]
2022-06-24T12:06:20.693+01:00 INFO  [ServerBootstrap] Services started, startup times in ms: {FailureHandlingService [RUNNING]=1, UserSessionTerminationService [RUNNING]=16, GracefulShutdownService [RUNNING]=22, InputSetupService [RUNNING]=28, MongoDBProcessingStatusRecorderService [RUNNING]=52, LocalKafkaMessageQueueWriter [RUNNING]=61, ConfigurationEtagService [RUNNING]=77, PrometheusExporter [RUNNING]=78, EtagService [RUNNING]=79, OutputSetupService [RUNNING]=79, BufferSynchronizerService [RUNNING]=80, JobSchedulerService [RUNNING]=80, StreamCacheService [RUNNING]=81, LocalKafkaMessageQueueReader [RUNNING]=82, UrlWhitelistService [RUNNING]=94, LocalKafkaJournal [RUNNING]=97, LookupTableService [RUNNING]=116, PeriodicalsService [RUNNING]=131, JerseyService [RUNNING]=1506}
2022-06-24T12:06:20.694+01:00 INFO  [ServerBootstrap] Graylog server up and running.

I sent a test message using the following:

echo '{"message":"Hello from the tcp stack","host":"cyberkryption023"}' | ncat 192.168.1.234 12201

It appears to be working
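For reference, a GELF message sent over TCP is a JSON object terminated by a null byte, and the GELF 1.1 specification lists version, host and short_message as required fields (the command above uses message and ends with a newline). A minimal Python sketch, assuming the input on port 12201 is a GELF TCP input; the helper names are hypothetical.

```python
import json
import socket


def gelf_tcp_frame(host, short_message, **extra):
    """Encode one GELF 1.1 message as a null-terminated TCP frame."""
    msg = {"version": "1.1", "host": host, "short_message": short_message}
    # GELF additional fields carry a leading underscore.
    msg.update({"_" + key: value for key, value in extra.items()})
    return json.dumps(msg).encode("utf-8") + b"\0"


def send_gelf_tcp(address, port, frame, timeout=5.0):
    """Open a TCP connection, send one frame and close the connection."""
    with socket.create_connection((address, port), timeout=timeout) as sock:
        sock.sendall(frame)
```

Usage, with the values from the command above: send_gelf_tcp("192.168.1.234", 12201, gelf_tcp_frame("cyberkryption023", "Hello from the tcp stack")).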


Can you point me to which checks are disabled by setting skip_preflight_checks to true?

cyberkryption, Jun 24 '22

I have this issue as well: I converted a Debian 10 Graylog 4.3 instance from elasticsearch-oss to OpenSearch, and now it will not start.

Update: I figured out my issue, at least in my case.

OpenSearch was binding itself to 127.0.1.1:9200 instead of the default 127.0.0.1:9200. I had to update the ES binding in /etc/graylog/server/server.conf:

elasticsearch_hosts = http://127.0.1.1:9200

luckman212, Jun 24 '22

@luckman212 Mine is binding to an IP address on my local network.

curl -XGET https://opensearch.cyberkryption.local:9200 -u 'admin:admin' --insecure
{
  "name" : "opensearch",
  "cluster_name" : "graylog",
  "cluster_uuid" : "F_m6D20qSRm9ttiF0ySuBw",
  "version" : {
    "number" : "7.10.2",
    "build_type" : "rpm",
    "build_hash" : "6febcf7b53ff189de767e460e905e9e5aeecc8cb",
    "build_date" : "2022-05-04T03:59:23.756957Z",
    "build_snapshot" : false,
    "lucene_version" : "8.10.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "The OpenSearch Project: https://opensearch.org/"
}
[cyberkryption@opensearch ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.234 opensearch.cyberkryption.local opensearch
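Two things have to line up here: the name used in elasticsearch_hosts must resolve to the address OpenSearch actually listens on (the 127.0.1.1 case above, which Debian typically assigns to the machine's own hostname in /etc/hosts), and it must match a subjectAltName in the certificate. For the first part, a minimal Python sketch that reads /etc/hosts-style text and returns the addresses mapped to a name; the function name is hypothetical, and it ignores DNS and resolver ordering, so treat it as a quick check only.

```python
def addresses_for(name, hosts_text):
    """Return every address mapped to `name` in /etc/hosts-style text, in file order."""
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if name.lower() in (n.lower() for n in names):  # hostnames are case-insensitive
            found.append(address)
    return found
```

With the /etc/hosts shown above, addresses_for("opensearch.cyberkryption.local", text) returns ["192.168.1.234"].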

cyberkryption, Jun 24 '22

There are currently 3 pre-flight checks:

  • disk journal verification (enough space, writable directory, sizing)
  • Elasticsearch / Opensearch availability and compatibility
  • Mongodb availability and compatibility

With skip_preflight_checks you enable or disable all of them at once. What surprises me is that you can actually communicate with the OS instance when you skip the checks. I also asked @mpfz0r to check your report; maybe he has some additional ideas.

todvora, Jun 27 '22

Hi @cyberkryption

I think the problem is that your certificate does not have any subjectAltNames configured. You can also see this in the logs:

    certificate: sha256/KeB6DfLCNFq3561pPhy8Zc/+oU6pmSySnrPyzHbwfvQ=
    DN: CN=opensearch.cyberkryption.local, OU=Security, O=DefenceLogic, L=St.Helier, ST=Jersey, C=GB
    subjectAltNames: []. - Hostname opensearch.cyberkryption.local not verified:

We are using OkHttp for the VersionProbe, which does not look at the certificate's CN: https://github.com/square/okhttp/issues/4966

If you disable the version check entirely, only the Elasticsearch client will be used. It seems the Elasticsearch client uses the Apache HTTP client, which still accepts a matching CN.

I'd suggest recreating the certificates with something like openssl -addext "subjectAltName = DNS:opensearch.cyberkryption.local".
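Two caveats, as far as I know: -addext is an option of openssl req that only exists from OpenSSL 1.1.1 onward (older than what CentOS 7 ships by default), and when the node certificate is signed by your own root CA with openssl x509 -req, extensions requested in the CSR are not copied into the certificate by default. A sketch of the alternative, assuming the CA-signing flow from the OpenSearch certificate guide linked in step 4 of the report: put the SAN in a small extension file and pass it with -extfile at signing time. The helper names and all file names are hypothetical.

```python
def san_extension_line(dns_names, ip_addresses=()):
    """One-line OpenSSL extension config carrying the subjectAltName entries."""
    entries = [f"DNS:{name}" for name in dns_names]
    entries += [f"IP:{ip}" for ip in ip_addresses]
    return "subjectAltName=" + ",".join(entries) + "\n"


def sign_command(csr, ca_cert, ca_key, out_cert, ext_file, days=730):
    """argv for signing a CSR with the root CA, adding the SAN from ext_file.

    'openssl x509 -req' does not copy CSR extensions by default, so the SAN
    is supplied again via -extfile.
    """
    return [
        "openssl", "x509", "-req",
        "-in", csr,
        "-CA", ca_cert, "-CAkey", ca_key, "-CAcreateserial",
        "-sha256", "-days", str(days),
        "-extfile", ext_file,
        "-out", out_cert,
    ]
```

For example, write san_extension_line(["opensearch.cyberkryption.local"]) to node.ext, run subprocess.run(sign_command("node.csr", "root-ca.pem", "root-ca-key.pem", "node.pem", "node.ext"), check=True), and check that openssl x509 -noout -text -in node.pem now lists an "X509v3 Subject Alternative Name" entry.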

mpfz0r, Jun 28 '22

I also filed a bug over at OpenSearch to see whether they can improve their documentation: https://github.com/opensearch-project/documentation-website/issues/730

mpfz0r, Jun 28 '22

@cyberkryption did you have a chance to recreate your certificates? Can we close this ticket?

mpfz0r, Jul 13 '22

I faced this issue as well with a fresh Graylog 4.3.5-1 installation using elasticsearch-oss 7.10.2 and Elasticsearch X-Pack 7.10.2. The same error messages were seen in the log as mentioned in https://github.com/Graylog2/graylog2-server/issues/12897#issuecomment-1159001173. In running environments with Graylog 4.2 I did not experience this issue after an upgrade.

I can confirm that setting skip_preflight_checks = true works as a workaround.

xtruthx, Aug 12 '22

Hi Marco,

Please close the ticket, as I won't have time to retest for a few weeks.

Cyberkryption

On Wed, 13 Jul 2022, 09:10 Marco Pfatschbacher, @.***> wrote:

@cyberkryption https://github.com/cyberkryption did you have a chance to recreate your certificates? Can we close this ticket?


cyberkryption, Aug 12 '22

@xtruthx could you show me the output of openssl x509 -text -in your-elastic-ssl-cert.pem ?

mpfz0r, Aug 16 '22

I'm considering this resolved. Please open a new ticket if that's not the case.

mpfz0r, Nov 08 '23