OpenSearch icon indicating copy to clipboard operation
OpenSearch copied to clipboard

[Tracing Instrumentation] Fix for null connection test cases

Open Gaganjuneja opened this issue 2 years ago • 18 comments

Description

Anomaly-detection has some test cases which passes null connection and those are getting NullPointerException which was earlier handled by the listener. Adding the null connection check in the SpanName builder.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • [ ] New functionality includes testing.
    • [ ] All tests pass
  • [ ] New functionality has been documented.
    • [ ] New functionality has javadoc added
  • [x] Commits are signed per the DCO using --signoff
  • [x] Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • [ ] GitHub issue/PR created in OpenSearch documentation repo for the required public documentation changes (#[Issue/PR number])

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Gaganjuneja avatar Oct 05 '23 17:10 Gaganjuneja

@reta, Please take a look, Its breaking the Anomaly detection plugin's build.

Gaganjuneja avatar Oct 05 '23 17:10 Gaganjuneja

@Gaganjuneja -- Do you think it makes sense to fix it this way, or should AD just not pass a null connection?

msfroh avatar Oct 05 '23 17:10 msfroh

@Gaganjuneja -- Do you think it makes sense to fix it this way, or should AD just not pass a null connection?

Ideally yes AD shouldn't pass the null connection but may be there are some test cases which may want to test if their listener::handleException gets called if the connection is null. I guess there is no harm putting up this additional check. What's your thought on this?

Gaganjuneja avatar Oct 05 '23 17:10 Gaganjuneja

@Gaganjuneja -- Do you think it makes sense to fix it this way, or should AD just not pass a null connection?

Ideally yes AD shouldn't pass the null connection but may be there are some test cases which may want to test if their listener::handleException gets called if the connection is null. I guess there is no harm putting up this additional check. What's your thought on this?

AD should anyways fix this test as its not working as expected.

Gaganjuneja avatar Oct 05 '23 17:10 Gaganjuneja

I just talked w/ @amitgalitz, and we have a fix for the AD test. Essentially, they had one test creating a mock TransportService, while every other test creates a "real" TransportService that does nothing.

We're able to fix that one test to behave just like all the others.

msfroh avatar Oct 05 '23 18:10 msfroh

I just talked w/ @amgalitz, and we have a fix for the AD test. Essentially, they had one test creating a mock TransportService, while every other test creates a "real" TransportService that does nothing.

We're able to fix that one test to behave just like all the others.

Thank you Michael for the help here! It indeed help fix the issue in AD.

I wonder though on core side if having a null connection check is only relevant in test cases or there could be other issues with previously having null connection get handled on the sendRequest call vs having it cause an exception now

amitgalitz avatar Oct 05 '23 18:10 amitgalitz

I don't object to the null-check, but on the other hand, part of me likes the idea that the possible NPE highlights other misconfigured things.

msfroh avatar Oct 05 '23 18:10 msfroh

I don't object to the null-check, but on the other hand, part of me likes the idea that the possible NPE highlights other misconfigured things.

:100:

reta avatar Oct 05 '23 18:10 reta

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/27151/
  • CommitID: 0d83a0dbd545a15ca7e24b6b67ebaa5b0fae4bf2 Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Oct 05 '23 18:10 github-actions[bot]

Compatibility status:

Checks if related components are compatible with change 17f2952

Incompatible components

Incompatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/performance-analyzer.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/sql.git]

github-actions[bot] avatar Oct 05 '23 18:10 github-actions[bot]

I don't object to the null-check, but on the other hand, part of me likes the idea that the possible NPE highlights other misconfigured things.

💯

Totally agreed, In case needed please feel free to merge this PR :)

Gaganjuneja avatar Oct 05 '23 18:10 Gaganjuneja

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/27152/
  • CommitID: 17f2952554988415399fea24aa7b6bea39076c4f Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Oct 05 '23 18:10 github-actions[bot]

This PR is stalled because it has been open for 30 days with no activity.

Hi @Gaganjuneja, Is this being worked upon? Feel free to reach out to maintainers for further reviews.

ticheng-aws avatar Jan 06 '24 00:01 ticheng-aws

This PR is stalled because it has been open for 30 days with no activity.

Looking into the code around this area, seems like connection is always expected to be non-null (Ref here). So probably adding a check Objects.nonNull(connection, "Input connection is expected to be non-null" ) in the createSpanName would help to catch such issues (mostly occurring in tests). Currently the PR is ignoring the null connection and expecting the lower layer to handle it or throw the error. Thoughts ?

sohami avatar Feb 14 '24 21:02 sohami

Looking into the code around this area, seems like connection is always expected to be non-null (Ref here).

The connection should never be null indeed

reta avatar Feb 14 '24 23:02 reta

Looking into the code around this area, seems like connection is always expected to be non-null (Ref here).

The connection should never be null indeed

+1, This was coming null from test cases.

Gaganjuneja avatar Feb 15 '24 02:02 Gaganjuneja

@Gaganjuneja what do you want to do with this?

If this can't be null maybe the right thing to do is an assert. Otherwise add a test case for the null scenario.

dblock avatar Feb 27 '24 17:02 dblock

I am closing this PR. This is not needed as fixed in the AD plugin.

Gaganjuneja avatar Mar 01 '24 02:03 Gaganjuneja

@Gaganjuneja I think this was a valid case though, being defensive in this code was a good idea. Consider finishing it?

dblock avatar Mar 01 '24 21:03 dblock

Sure, will cleanup the code. Thanks @dblock

Gaganjuneja avatar Mar 10 '24 05:03 Gaganjuneja

:x: Gradle check result for 17f2952554988415399fea24aa7b6bea39076c4f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Mar 10 '24 05:03 github-actions[bot]

This PR is stalled because it has been open for 30 days with no activity.

@Gaganjuneja Are you planning to continue on this PR and targeting any upcoming release? This has been stalled now, and if this is not planned to be continued, lets close this.

mgodwan avatar Apr 23 '24 08:04 mgodwan

This PR is stalled because it has been open for 30 days with no activity.