elasticsearch icon indicating copy to clipboard operation
elasticsearch copied to clipboard

The new 'capabilities' API does not appear to work with mixed cluster tests

Open craigtaverner opened this issue 9 months ago • 7 comments

Recently, in https://github.com/elastic/elasticsearch/pull/108464, we added support for capabilities in ES|QL csv-spec tests, and on top of that switched all existing csv-spec tests to use the capabilities in https://github.com/elastic/elasticsearch/pull/108570. However, under the hood this was still defining NodeFeature instances and exposing them as capabilities.

The idea is that we should stop adding NodeFeatures and instead only add capabilities. I attempted this in the union-types PR at https://github.com/elastic/elasticsearch/pull/107545, but the consequences were that all mixed cluster tests over the union-types.csv-spec failed with the error:

java.lang.IllegalArgumentException: Unknown feature esql.union_types:
  check the respective FeatureSpecification is provided both in module-info.java
  as well as in META-INF/services and verify the module is loaded during tests.

No-where in the code-base was the string esql.union_types, so I assume this is being constructed as a virtual feature from the capability union_types which was defined as:

public static final String UNION_TYPES = "union_types";

The above error is reproducible with:

./gradlew ':x-pack:plugin:esql:qa:server:mixed-cluster:v8.13.5#javaRestTest' -Dtests.class="org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT" -Dtests.method="test {union_types.MultiIndexIpString ASYNC}" -Dtests.seed=67348C5357C06456 -Dtests.bwc=true -Dtests.locale=hr -Dtests.timezone=Asia/Rangoon -Druntime.java=21

As an experiment, I reverted the union_types capability back to a NodeFeature and the tests pass. Since the goal of these capabilities is to not keep adding NodeFeatures, because they increase the cluster state, this would appear to indicate that this goal is not being achieved.

I did see a comment that the PR at https://github.com/elastic/elasticsearch/pull/108425 would improve mixed-cluster support, but that work was included in the above tests, and merged before the csv-spec PRs.

To see how I added union_types to the capabilities, instead of the NodeFeatures, take a look at the commit where I removed it again to get things working: https://github.com/elastic/elasticsearch/pull/107545/commits/7cd3104f5280f274330aed4e1a3c47cf773fd835

craigtaverner avatar May 15 '24 11:05 craigtaverner

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine avatar May 15 '24 11:05 elasticsearchmachine

This looks like an issue in ESRestTestFeatureService here. Rather than just returning false (and so disabling the test via EsqlSpecTestCase), it's actually throwing an exception.

@mosche this exception is thrown because the feature service is assuming that if you are checking for a feature, the feature should be registered somewhere, yes?

thecoop avatar May 15 '24 12:05 thecoop

Pinging @elastic/es-core-infra (Team:Core/Infra)

elasticsearchmachine avatar May 15 '24 12:05 elasticsearchmachine

@mosche this exception is thrown because the feature service is assuming that if you are checking for a feature, the feature should be registered somewhere, yes?

This fallback thing that the csv-spec tests do was so we don't have to support two kinds of syntax. If we want to keep the exception behavior we could tweak the fallback on our side.

nik9000 avatar May 15 '24 12:05 nik9000

@mosche this exception is thrown because the feature service is assuming that if you are checking for a feature, the feature should be registered somewhere, yes?

That assertion is pretty important, it protects against checking for misspelled features or cases where the feature is simply missing because the necessary module is not loaded. Without this we would silently skip the test and everything looks green and healthy.

mosche avatar May 15 '24 12:05 mosche

Agree - I think this here may be the only use case where it's valid to check for features that are not registered. If more come up as capabilities are used more, then we can re-evaluate.

@nik9000 could you rewrite the fallback so that it doesn't hit this exception?

thecoop avatar May 15 '24 12:05 thecoop

@nik9000 could you rewrite the fallback so that it doesn't hit this exception?

@ivancea has code that does pretty much this already: https://github.com/elastic/elasticsearch/pull/108574/files/81a949a2ad6d245707dec268ed133acc10511fcd#diff-e5229c7428590a22294db189f7845ae0fadefc37b2154c5da1f1dc85804bcaecR190

It's in a PR that's just blocked on this bug, so I think we'll just get it in.

nik9000 avatar May 15 '24 12:05 nik9000