ml-commons
[BUG] Tag Mismatch error on VisualizationsToolIT.testVisualizationFound Windows Test
What is the bug? https://github.com/opensearch-project/ml-commons/blob/85d0c9e2b8807162de9afe7c915801b75e486064/plugin/src/test/java/org/opensearch/ml/tools/VisualizationsToolIT.java#L59-L66 There is a retry enabled on the VisualizationsToolIT.testVisualizationFound test, but the retry does not help when the underlying problem is something other than what it was meant to cover. Here the problem appears to be an encryption issue. This might be the source of all of our flaky tests.
VisualizationsToolIT > testVisualizationFound FAILED
org.opensearch.client.ResponseException: method [POST], host [http://127.0.0.1:54529/], URI [/_plugins/_ml/agents/bLjaRJQB515KRnslfdUv/_execute], status line [HTTP/1.1 500 Internal Server Error]
{"status":500,"error":{"type":"AEADBadTagException","reason":"System Error","details":"Tag mismatch"}}
at app//org.opensearch.client.RestClient.convertResponse(RestClient.java:501)
at app//org.opensearch.client.RestClient.performRequest(RestClient.java:384)
at app//org.opensearch.client.RestClient.performRequest(RestClient.java:359)
at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:182)
at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:155)
at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:144)
at app//org.opensearch.ml.tools.VisualizationsToolIT.testVisualizationFound(VisualizationsToolIT.java:74)
java.lang.AssertionError: The response failed to meet condition after 5 attempts. Attempted to perform GET : /_plugins/_ml/models/arjaRJQB515KRnsleNWv
at org.junit.Assert.fail(Assert.java:89)
at org.opensearch.ml.tools.ToolIntegrationWithLLMTest.waitResponseMeetingCondition(ToolIntegrationWithLLMTest.java:103)
at org.opensearch.ml.tools.ToolIntegrationWithLLMTest.checkForModelUndeployedStatus(ToolIntegrationWithLLMTest.java:89)
at org.opensearch.ml.tools.ToolIntegrationWithLLMTest.deleteModel(ToolIntegrationWithLLMTest.java:74)
at
...
2> REPRODUCE WITH: gradlew ':opensearch-ml-plugin:integTest' --tests "org.opensearch.ml.tools.VisualizationsToolIT.testVisualizationFound" -Dtests.seed=AD7A0603B7C68274 -Dtests.security.manager=false -Dtests.locale=fr-GN -Dtests.timezone=America/Argentina/Buenos_Aires -Druntime.java=21
2> org.opensearch.client.ResponseException: method [POST], host [http://127.0.0.1:54529/], URI [/_plugins/_ml/agents/bLjaRJQB515KRnslfdUv/_execute], status line [HTTP/1.1 500 Internal Server Error]
{"status":500,"error":{"type":"AEADBadTagException","reason":"System Error","details":"Tag mismatch"}}
at app//org.opensearch.client.RestClient.convertResponse(RestClient.java:501)
at app//org.opensearch.client.RestClient.performRequest(RestClient.java:384)
at app//org.opensearch.client.RestClient.performRequest(RestClient.java:359)
at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:182)
at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:155)
at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:144)
at app//org.opensearch.ml.tools.VisualizationsToolIT.testVisualizationFound(VisualizationsToolIT.java:74)
java.lang.AssertionError: The response failed to meet condition after 5 attempts. Attempted to perform GET : /_plugins/_ml/models/arjaRJQB515KRnsleNWv
at org.junit.Assert.fail(Assert.java:89)
at org.opensearch.ml.tools.ToolIntegrationWithLLMTest.waitResponseMeetingCondition(ToolIntegrationWithLLMTest.java:103)
at org.opensearch.ml.tools.ToolIntegrationWithLLMTest.checkForModelUndeployedStatus(ToolIntegrationWithLLMTest.java:89)
at org.opensearch.ml.tools.ToolIntegrationWithLLMTest.deleteModel(ToolIntegrationWithLLMTest.java:74)
at
How can one reproduce the bug? This was discovered in a build failure.
What is the expected behavior? This test should pass or time out, but it should not hit this encryption issue.
Seeing VisualizationIT failing again, but with a different error: https://github.com/opensearch-project/ml-commons/pull/3353
Hmm, this is confusing. I would have thought the retry would help, but as you said, it didn't. It's clearly failing even when the number of retries matches the number of nodes. If only there were some way to dump all available info and configuration when this happens.
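One way to keep a retry from masking a different underlying failure is to retry only errors that are plausibly transient and to rethrow everything else immediately, with its full details. The following is a minimal sketch under that assumption. `RetrySketch`, `retry`, and `looksTransient` are hypothetical names, not part of ml-commons, and treating `AEADBadTagException` as non-transient is an illustrative guess rather than a confirmed diagnosis.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical helper for illustration; not part of ml-commons. */
final class RetrySketch {

    /**
     * Runs the action up to maxAttempts times. A failure is retried only when
     * isTransient accepts it; any other failure is rethrown at once so the
     * real cause is not buried under later retry noise.
     */
    static <T> T retry(int maxAttempts, Predicate<Exception> isTransient, Callable<T> action) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (!isTransient.test(e)) {
                    throw e; // fail fast: retrying cannot fix this
                }
                last = e; // remember the most recent transient failure
            }
        }
        throw new IllegalStateException("Gave up after " + maxAttempts + " attempts", last);
    }

    /** Assumption: an error mentioning AEADBadTagException will not resolve on retry. */
    static boolean looksTransient(Exception e) {
        return !String.valueOf(e.getMessage()).contains("AEADBadTagException");
    }
}
```

With a classifier like this, the encryption failure would surface on the first attempt with its own stack trace instead of being followed by an unrelated "failed to meet condition" assertion.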
Hey @Hailong-am do you mind taking a look? Thanks
Catch All Triage - 1, 2, 3
do you have the link or the logs for this failure?
Hey Hailong, we are trying to paste the stack traces together with the reproduction line. Thankfully, this build failure log hasn't expired. Can you take a look?
Adding the log here in txt format so it doesn't expire
6_Build and Test MLCommons Plugin on linux (21).txt
Here is another example of another build failure. Linking the txt file here as well to make sure it does not expire.
@Hailong-am did you get a chance to look at this? any update?
Looking at the attached logs:
[testVisualizationNotFound] The 6-th attempt on GET:/_plugins/_ml/models/UNT-S5QBXi7OW4I7mZRp . response: Response{requestLine=GET /_plugins/_ml/models/UNT-S5QBXi7OW4I7mZRp HTTP/1.1, host=http://[::1]:38269, response=HTTP/1.1 200 OK}
The Tag mismatch error happened during the model deploy phase, not in the get model API call, so I assume the Tag mismatch error is not the cause of the flaky test this time.
We may need to add some logs to see the actual response body for the get model API.
Suppressed: javax.crypto.AEADBadTagException: Tag mismatch
2025-01-09T16:56:16.8490050Z » at java.base/com.sun.crypto.provider.GaloisCounterMode$GCMDecrypt.doFinal(GaloisCounterMode.java:1545) ~[?:?]
2025-01-09T16:56:16.8491643Z » at java.base/com.sun.crypto.provider.GaloisCounterMode.engineDoFinal(GaloisCounterMode.java:417) ~[?:?]
2025-01-09T16:56:16.8492770Z » at java.base/javax.crypto.Cipher.doFinal(Cipher.java:2244) ~[?:?]
2025-01-09T16:56:16.8494108Z » at com.amazonaws.encryptionsdk.internal.JceKeyCipher.decryptKey(JceKeyCipher.java:129) ~[aws-encryption-sdk-java-2.4.1.jar:?]
2025-01-09T16:56:16.8495801Z » at com.amazonaws.encryptionsdk.jce.JceMasterKey.decryptDataKey(JceMasterKey.java:165) ~[aws-encryption-sdk-java-2.4.1.jar:?]
2025-01-09T16:56:16.8497882Z » at com.amazonaws.encryptionsdk.DefaultCryptoMaterialsManager.decryptMaterials(DefaultCryptoMaterialsManager.java:118) ~[aws-encryption-sdk-java-2.4.1.jar:?]
2025-01-09T16:56:16.8500076Z » at com.amazonaws.encryptionsdk.internal.DecryptionHandler.readHeaderFields(DecryptionHandler.java:621) ~[aws-encryption-sdk-java-2.4.1.jar:?]
2025-01-09T16:56:16.8502066Z » at com.amazonaws.encryptionsdk.internal.DecryptionHandler.<init>(DecryptionHandler.java:111) ~[aws-encryption-sdk-java-2.4.1.jar:?]
2025-01-09T16:56:16.8503830Z » at com.amazonaws.encryptionsdk.internal.DecryptionHandler.create(DecryptionHandler.java:302) ~[aws-encryption-sdk-java-2.4.1.jar:?]
2025-01-09T16:56:16.8505549Z » at com.amazonaws.encryptionsdk.AwsCrypto.decryptData(AwsCrypto.java:511) ~[aws-encryption-sdk-java-2.4.1.jar:?]
2025-01-09T16:56:16.8507014Z » at com.amazonaws.encryptionsdk.AwsCrypto.decryptData(AwsCrypto.java:502) ~[aws-encryption-sdk-java-2.4.1.jar:?]
2025-01-09T16:56:16.8508505Z » at com.amazonaws.encryptionsdk.AwsCrypto.decryptData(AwsCrypto.java:476) ~[aws-encryption-sdk-java-2.4.1.jar:?]
2025-01-09T16:56:16.8510156Z » at org.opensearch.ml.engine.encryptor.EncryptorImpl.decrypt(EncryptorImpl.java:97) ~[opensearch-ml-algorithms-2.19.0.0-SNAPSHOT.jar:?]
2025-01-09T16:56:16.8512230Z » at org.opensearch.ml.engine.algorithms.remote.RemoteModel.lambda$initModel$0(RemoteModel.java:104) ~[opensearch-ml-algorithms-2.19.0.0-SNAPSHOT.jar:?]
2025-01-09T16:56:16.8514467Z » at org.opensearch.ml.common.connector.HttpConnector.decrypt(HttpConnector.java:366) ~[opensearch-ml-common-2.19.0.0-SNAPSHOT.jar:?]
2025-01-09T16:56:16.8516333Z » at org.opensearch.ml.engine.algorithms.remote.RemoteModel.initModel(RemoteModel.java:104) [opensearch-ml-algorithms-2.19.0.0-SNAPSHOT.jar:?]
2025-01-09T16:56:16.8518000Z » at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:139) [opensearch-ml-algorithms-2.19.0.0-SNAPSHOT.jar:?]
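For context, the trace above shows the exception raised by the JDK's AES-GCM implementation while `JceMasterKey.decryptDataKey` unwraps a data key. GCM rejects any ciphertext whose authentication tag does not verify, and decrypting with a key other than the one used for encryption is one common way to trigger that. Whether that is what happens in this build is not established in this thread. The sketch below uses only standard JDK APIs to reproduce the exception type in isolation; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical demo class; unrelated to the ml-commons EncryptorImpl. */
final class GcmTagMismatchSketch {

    /** Generates a fresh random 128-bit AES key. */
    static SecretKey newKey() throws Exception {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        return gen.generateKey();
    }

    /** Encrypts with AES-GCM; the 128-bit tag is appended to the ciphertext. */
    static byte[] encrypt(SecretKey key, byte[] iv, byte[] plaintext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return cipher.doFinal(plaintext);
    }

    /** Decrypts with AES-GCM; doFinal verifies the tag before returning data. */
    static byte[] decrypt(SecretKey key, byte[] iv, byte[] ciphertext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return cipher.doFinal(ciphertext);
    }

    /** Returns true when decrypting with a different key is rejected by the tag check. */
    static boolean wrongKeyIsRejected() throws Exception {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        byte[] ciphertext = encrypt(newKey(), iv, "secret".getBytes(StandardCharsets.UTF_8));
        try {
            decrypt(newKey(), iv, ciphertext); // second, unrelated key
            return false;
        } catch (AEADBadTagException e) {
            return true;
        }
    }
}
```

If a key mismatch were the cause here, it would point at the nodes or test runs not agreeing on the master key, which is a different problem from the model status polling that the retry covers.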
Thanks for the update! Would you be willing to take that up? (adding logs)
Sure, I will do two things. First, add some logs to record the response body. Second, keep trying locally to see whether I can reproduce the error.
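A rough sketch of what logging the response body could look like in a polling helper is shown below. It is not the actual `waitResponseMeetingCondition` implementation, whose signature is not shown in this thread. `PollSketch`, `waitFor`, and the `Supplier<String>` standing in for the GET request are all assumptions made for illustration.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical polling helper; not the ml-commons test utility. */
final class PollSketch {

    /**
     * Calls fetchBody up to maxAttempts times, logging every body it sees.
     * On failure the last body is included in the assertion message, so the
     * build log shows what the API actually returned.
     */
    static String waitFor(String description, int maxAttempts, long pauseMillis,
                          Supplier<String> fetchBody, Predicate<String> condition) {
        String lastBody = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            lastBody = fetchBody.get();
            System.out.printf("Attempt %d on %s, response body: %s%n", attempt, description, lastBody);
            if (condition.test(lastBody)) {
                return lastBody;
            }
            try {
                Thread.sleep(pauseMillis); // give the cluster time to change state
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted while waiting on " + description, e);
            }
        }
        throw new AssertionError("The response failed to meet condition after " + maxAttempts
            + " attempts on " + description + ". Last response body: " + lastBody);
    }
}
```

Keeping the last body in the failure message means the evidence survives even if the per-attempt log lines are truncated or the raw build log expires.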
Thank you!
@Hailong-am Do you have any update on this? Can I assign this issue to you?
The logs have been added. Do we still face this issue? If not, we can close it and open a new one with the latest GitHub Actions run logs.
Closing this; please reopen if we still face the issue.